Robust estimation of the mAR index of high grossing films at the US box office, 1935 to 2005

Abstract: The modified autoregressive (mAR) index has been proposed as a description of the clustering of shots of similar duration in a motion picture. In this paper we derive robust estimates of the mAR index for high grossing films at the US box office using a rank-based autocorrelation function resistant to the influence of outliers, and compare these to estimates obtained using the classical, moment-based autocorrelation function. The results show that (1) the classical mAR index underestimates both the level of shot clustering in a film and the variation in style among the films in the sample; (2) there is a decline in shot clustering from 1935 to the 1950s followed by an increase from the 1960s to the 1980s and a levelling off thereafter, rather than the monotonic trend indicated by the classical index, and this is mirrored in the trend of the median shot lengths and interquartile range; and (3) the rank mAR index identifies differences between genres overlooked when using the classical index.


Introduction
Cutting, De Long, and Nothelfer (2010) proposed the modified autoregressive (mAR) index as a statistic of film style measuring the degree to which shots of similar duration cluster together in a motion picture. They calculate the mAR index as the intercept of the negative exponential function $1/(1 + \text{lag})^{\beta}$ fitted to the partial autocorrelation function out to lag 20 with a critical value based on the average number of shots in a motion picture from a sample. Applying this method to 150 high grossing films at the US box office released from 1935 to 2005, they identified a tendency for shots to become increasingly correlated in length with their neighbours over time and also noted variations in the degree of shot clustering between genres.
Though the mAR index can be a useful description of film style, there is good reason to doubt the validity of these conclusions. The mAR values reported by Cutting, De Long, and Nothelfer are derived from the classical, moment-based estimator of the auto-covariance function, which is well known to be non-resistant to the presence of outliers (Ma and Genton, 2000; Maronna, Martin, and Yohai, 2006). Typically, the distribution of shot lengths in a motion picture is positively skewed and contains a number of shots of atypically long duration that adversely affect the moments this function is calculated from (i.e. the mean and variance). Consequently, estimates of the mAR index determined in this way will not accurately describe the style of a film and will lead to incorrect conclusions about the nature of film style. Because these long takes are 'true' outliers representing the decisions of filmmakers about the arrangement of stylistic elements (staging, cinematography, editing, etc.), we are interested in using robust statistical methods that perform reliably in the presence of outliers and departures from the assumptions that underpin statistical methods (Maronna, Martin, and Yohai, 2006). This paper calculates robust estimates of the mAR index using a rank-based autocorrelation function (rmAR) and compares these values to the index based on the classical autocorrelation function (cmAR).

Classical and rank-based autocorrelation
The auto-covariance function describes the statistical dependence between the values taken by a stochastic process at two points in time. The classical, moment-based auto-covariance function of a weakly stationary time series $x = (X_1, \ldots, X_n)^T$ is defined as
$$\gamma(h, x) = \frac{1}{n} \sum_{i=1}^{n-h} (X_i - \bar{X})(X_{i+h} - \bar{X}), \tag{1}$$
where $\bar{X}$ is the mean and $h$ is the lag specifying the distance between the observations $X_i$ and $X_{i+h}$. The denominator in (1) is the total sample size ($n$), and so this function is biased and positive semi-definite. The auto-covariance function when $h = 0$ is equal to the variance, and standardising (1) by this value gives the autocorrelation function
$$\rho(h, x) = \frac{\gamma(h, x)}{\gamma(0, x)}. \tag{2}$$
The autocorrelation function ranges from -1 to 1, with negative autocorrelation at lag $h$ reflecting a tendency of observations to lie on opposite sides of the mean and positive autocorrelation a tendency for observations to lie on the same side of the mean. The partial autocorrelation function $\alpha(h, x)$ is the correlation between $X_i$ and $X_{i+h}$ with the linear dependence of the intervening lags removed, and can be calculated recursively using the Durbin-Levinson algorithm.
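As an illustration of the definitions above, the following minimal Python sketch computes the biased auto-covariance, the autocorrelation, and the partial autocorrelations by the Durbin-Levinson recursion. The function names are ours and are not taken from the original study, which used R.

```python
from typing import List, Sequence


def autocovariance(x: Sequence[float], h: int) -> float:
    """Biased (divide-by-n) sample auto-covariance at lag h, as in (1)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[i] - mean) * (x[i + h] - mean) for i in range(n - h)) / n


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelations rho(0), ..., rho(max_lag), as in (2)."""
    g0 = autocovariance(x, 0)
    return [autocovariance(x, h) / g0 for h in range(max_lag + 1)]


def durbin_levinson(rho: Sequence[float]) -> List[float]:
    """Partial autocorrelations alpha(0..H) from autocorrelations rho(0..H)."""
    pacf = [1.0]
    phi_prev: List[float] = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        pacf.append(phi_kk)
    return pacf
```

For a first-order autoregressive pattern such as `durbin_levinson([1.0, 0.5, 0.25, 0.125])`, the partial autocorrelations beyond lag 1 vanish, which is the behaviour the recursion is expected to show.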
The above functions are not resistant to the influence of outlying data points. The mean and the variance of a dataset have finite sample breakdown points of $1/n$ and unbounded influence functions, and can be arbitrarily bad estimates of location and dispersion in the presence of even a single outlier. Therefore, the above functions, being based on these statistics, are similarly affected by the presence of outliers. The presence of outliers in the upper tail of a shot length distribution inflates the mean so that the majority of observations will tend to lie on the same side of the mean irrespective of the underlying structure of the time series. Consequently, the autocorrelation function will tend to overestimate positive autocorrelation and underestimate negative autocorrelation. The presence of outliers also inflates the variance, introducing a bias of $\rho(h, x)$ toward zero that becomes stronger as the magnitude of the outlier increases, because outliers enter the denominator of (2) quadratically (see Maronna, Martin, and Yohai, 2006, pp. 250-252). Consequently, the presence of outliers leads to underestimation of the strength of autocorrelation between observations in a time series. The lack of robustness of the classical auto-covariance and its derived functions means that the information it carries about the structure of a time series can be destroyed by just a single outlier (Ma and Genton, 2000). Furthermore, if a time series contains more than one outlier, we may find spuriously large autocorrelation coefficients when $h$ is equal to the distance between outliers (Chatfield, 2004).
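The bias toward zero described above can be illustrated with a small simulation. The sketch below is not drawn from the film data: it assumes a simulated first-order autoregressive series (coefficient 0.6, fixed random seed) and adds a single hypothetical extreme value to show how the classical lag-1 autocorrelation collapses.

```python
import random


def lag1_autocorrelation(x):
    """Classical lag-1 autocorrelation (the 1/n factors in (1) cancel in (2))."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


rng = random.Random(1)  # fixed seed so the illustration is repeatable
series = [0.0]
for _ in range(499):  # simulated AR(1) process with coefficient 0.6
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))

contaminated = list(series)
contaminated[250] += 200.0  # one hypothetical, very long "take"

clean_r1 = lag1_autocorrelation(series)
outlier_r1 = lag1_autocorrelation(contaminated)
print(f"lag-1 autocorrelation, clean series: {clean_r1:.3f}")
print(f"lag-1 autocorrelation, one outlier:  {outlier_r1:.3f}")
```

The squared outlier dominates the denominator while contributing only linearly to the numerator, so the contaminated estimate is pulled close to zero even though 499 of the 500 observations are unchanged.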
Rank-based methods provide an obvious alternative to the classical functions (Ferguson, Genest and Hallin, 2000) and have been explored since Wolfowitz (1943). Although some information is lost when ranking data, rank autocorrelation functions have a number of attractive properties: they are distribution-free while also being as powerful as classical methods (and in many cases more powerful); they are robust, being relatively resistant to the influence of outliers and nonlinear distortions; and they are conceptually simple (Hallin and Puri, 1992). A rank-based approach to identifying serial dependency and periodicities in time series by Ahdesmaki, Lahdesmaki, Pearson, Huttunen and Yli-Harja (2005) calculates the rank autocorrelation function $\hat{\rho}_s(h, x)$ of a time series (equation 3) from $R_x(i)$, the ranks of $x_i$ in $S = \{x_t, t = 1, \ldots, n - h\}$, and $R'_x(i)$, the ranks of $x_{i+h}$ in $S' = \{x_{t+h}, t = 1, \ldots, n - h\}$. As a moving-window extension of Spearman's rank correlation statistic, $\hat{\rho}_s$ measures the monotonicity of the relationship between two observations and does not assume linearity. This function is biased and is directly comparable to the biased autocorrelation function based on (1), though it is not guaranteed to be positive semi-definite.
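A minimal sketch of a rank autocorrelation in this spirit is given below. It reads the statistic as Spearman's correlation between the window of the first $n - h$ observations and the same window shifted by $h$. The scaling by $(n - h)/n$ used for the biased version is our assumption, chosen to mirror the divide-by-$n$ convention of (1); it is not confirmed as the exact normalisation of (3), and the code is not the R implementation used in the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..m, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def rank_autocorrelation(x: Sequence[float], h: int, biased: bool = True) -> float:
    """Spearman correlation between x[0:n-h] and x[h:n], ranked separately."""
    n = len(x)
    m = n - h
    rho = pearson(average_ranks(x[:m]), average_ranks(x[h:]))
    # Assumed scaling: (n - h)/n mimics the divide-by-n convention of (1).
    return rho * m / n if biased else rho
```

Because only the ordering of the observations enters the calculation, replacing the largest value in a series with an arbitrarily larger one leaves the result unchanged, which is the resistance to outliers that motivates the rank-based approach.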

Methods
The data set used in this study comprises the same data used by Cutting, De Long, and Nothelfer accessed via the Cinemetrics database (http://www.cinemetrics.lv/index.php). However, we were unable to use all 150 films from the original study because the minimum shot length for nine films was given as 0.0 seconds and was less than 0.0s for seven films, presumably due to rounding or data entry errors. These films were excluded from the study to give a reduced sample size of 134 films.
We calculated the classical and rank autocorrelation functions for the linearly detrended shot lengths of films in the sample to $h = (n + 1)/2$ if $n$ is odd and $h = n/2$ if $n$ is even, where $n$ is the number of shots in a film. The rank function in (3) was calculated using unpackaged R functions by Bernhard Spangl, and verified as a valid positive definite sequence in each case. The partial autocorrelations for each measure were calculated recursively using the Durbin-Levinson algorithm, and the mAR indices determined by fitting the negative exponential function $1/(1 + h)^{\beta}$ to the partial autocorrelation functions for lags 0 to 20 using nonlinear least squares (df = 20). The value of an index is the point at which the fitted function intersects a critical value of $2/\sqrt{N} = 0.0611$, where $N = 1070$ is the median number of shots in a film in the sample. The methods used here differ from those originally used by Cutting, De Long, and Nothelfer, and so our estimates of the cmAR differ from theirs.
The full set of results is in the supplementary material attached to this article. We were unable to determine the cmAR index for three films (A Night at the Opera [1935], The Great Dictator [1940], Detour [1945]) because the lag-1 autocorrelation was negative, resulting in a singular gradient when fitting by nonlinear least squares. These films are therefore excluded from discussion of the classical mAR index, but the rank index of each film is included. It is unclear how Cutting, De Long, and Nothelfer obtained mAR values for these films.
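The fitting step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: a one-dimensional golden-section search stands in for a general nonlinear least-squares routine, the index is read as the lag at which the fitted curve $(1 + h)^{-\beta}$ crosses the critical value, and the function names and search bounds are ours.

```python
import math
from typing import Sequence


def fit_beta(pacf: Sequence[float], lo: float = 0.01, hi: float = 20.0,
             tol: float = 1e-6) -> float:
    """Least-squares fit of (1 + h)^(-beta) to pacf[h], h = 0, 1, 2, ..."""
    def sse(beta: float) -> float:
        return sum((p - (1.0 + h) ** (-beta)) ** 2 for h, p in enumerate(pacf))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:  # golden-section search; assumes one minimum in [lo, hi]
        if sse(c) < sse(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def mar_index(pacf: Sequence[float], median_shots: float) -> float:
    """Lag at which the fitted curve falls to the critical value 2/sqrt(N)."""
    beta = fit_beta(pacf)
    critical = 2.0 / math.sqrt(median_shots)
    # Solve (1 + h)^(-beta) = critical for h.
    return critical ** (-1.0 / beta) - 1.0


critical_value = 2.0 / math.sqrt(1070)  # N = 1070, the median number of shots
print(round(critical_value, 4))
```

In use, `mar_index(pacf[:21], 1070)` would take the partial autocorrelations for lags 0 to 20 of one film. A search over a bounded interval does not reproduce the singular-gradient failure that a gradient-based fit can encounter, so any result for a film with a negative lag-1 value should be treated with caution.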
To describe trends in film style over time we fit a locally weighted (LOESS) regression smoother to the descriptive statistics. LOESS is a nonparametric method for graphically depicting the relationship between independent and dependent variables in a scatterplot by fitting a low-order polynomial to only those observations in the neighbourhood of a point on the x-axis ($x_i$) rather than fitting the trendline globally. Observations within this window are inversely weighted according to their distance from the evaluation point so that points closest to $x_i$ have more influence on the placement of the LOESS curve than more distant observations. The degree of smoothing is controlled by the span, which specifies the proportion of the data included in the window. In this study the span was determined separately for each time series using a generalized cross-validation procedure, and a bootstrapped 95% confidence interval gives the precision of the LOESS curve. See Jacoby (2000) for an overview of LOESS regression.
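The idea of a locally weighted fit can be sketched in a few lines. The version below assumes a local linear (degree 1) polynomial and the tricube weight function, which are common LOESS choices but are not stated in the text; it takes the span as given and omits both the generalized cross-validation used to choose it and the bootstrap confidence interval.

```python
import math
from typing import List, Sequence


def loess(x: Sequence[float], y: Sequence[float], span: float = 0.5) -> List[float]:
    """Local linear smoother evaluated at each observed x value."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))  # points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        bandwidth = dists[k - 1] or 1e-12  # distance to the k-th nearest point
        # Tricube weights: near points count most, points beyond the window count 0.
        w = [max(0.0, 1.0 - (abs(xi - x0) / bandwidth) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # weighted line evaluated at x0
    return fitted
```

A call such as `loess(years, indices, span=0.5)` would return one smoothed value per film, where `years` and `indices` are hypothetical lists of release years and mAR values. A smaller span follows the data more closely, and a larger span gives a smoother curve.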

Results
Comparing the cmAR and rmAR values calculated for each film we see that they are very different. Specifically, the classical, moment-based method underestimates both the degree to which shots of similar duration are clustered together within a film and differences in film style between films in the sample. The cmAR indices are less than the rank mAR index for 96% of films in the reduced sample, with a median difference between the two indices of -1.42 (95% CI: -1.62, -1.21). The largest difference is for Charlie's Angels (2000), which has a classical mAR index of 1.82 but a rank index of 6.82. We also note the dispersion of the rank index is greater for films in the sample, indicating more variation in editing style than that suggested by the cmAR index: the range and standard deviation for the rmAR index are 7.37 and 1.40, respectively, and the corresponding statistics for the cmAR index are 4.01 and 0.79. Figure 1 presents the time series plots of the classical and rank mAR indices. The classical mAR index shows a gradual trend to increased shot clustering, and is consistent with the monotonic trend reported by Cutting, De Long, and Nothelfer in Figure 2.a of their article. The trendline for the rank mAR index shows a very different pattern of changes in film style, with a decline in the clustering of shots from 1935 to the 1950s followed by an increase from the 1960s to the 1980s and a levelling off after 1985. Basing our analyses on the classical mAR index would thus lead us to incorrectly describe changes in film style over time and to underestimate the size of those changes.
Cutting, De Long, and Nothelfer state that the trend in the cmAR index is not an artefact of decreases in the mean shot length, but because neither statistic is resistant to outliers this claim is dubious. From Figure 2.a we see that measures of location and scale for shot length distributions are strongly related, and so we combined the median and interquartile range (IQR) of the shot lengths of each film using principal components analysis to produce a new composite variable that retains most of the information of the original variables (see Abdi and Williams (2010) for an overview of PCA). As an alternative descriptive statistic, this variable can be thought of as a size measure: films with a low score have a low median and low IQR and a stronger tendency to more rapid editing, while high-scoring films with a high median and high IQR are edited more slowly. Plotting this score against year of release (Figure 2.b) we see the same trend in film style evident in the rank mAR index, with above-average scores tending to come in the early decades of the sample while later films tend to have lower-than-average scores. There is a slowing down of film editing from 1935 to the 1950s as the median and dispersion of shot lengths increase, followed by a decrease on both measures from the mid-1960s to 2005. The differences between a group of films and the one immediately preceding it have become smaller over time as editing has stabilised into a single Hollywood style: the greater variation in the scores in Figure 2.b for the 1940s and 1950s indicates much greater stylistic variation between films from those decades, while the trendline after the 1950s shows that high grossing films have converged to a single, rapidly-edited style. These trends correspond to the trends in shot clustering in Figure 1.b, indicating that changes in the rapidity of cutting and in the degree of shot clustering are a part of the same overall transformation of film style. Again, this is a relationship overlooked when using non-robust methods.
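The construction of the size score can be sketched as follows. The sketch assumes that the median and IQR are standardised before the principal components analysis and that they are positively correlated. Under those assumptions the first principal component of two variables is simply their scaled sum, and it accounts for a proportion $(1 + r)/2$ of the total variance, where $r$ is their correlation. Whether the original analysis standardised the variables is not stated.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def size_score(medians: Sequence[float],
               iqrs: Sequence[float]) -> Tuple[List[float], float]:
    """First principal component of standardised median and IQR (assumes r > 0)."""
    def standardise(v: Sequence[float]) -> List[float]:
        m, s = statistics.mean(v), statistics.stdev(v)
        return [(vi - m) / s for vi in v]

    zm, zi = standardise(medians), standardise(iqrs)
    n = len(zm)
    r = sum(a * b for a, b in zip(zm, zi)) / (n - 1)  # Pearson correlation
    # For a 2 x 2 correlation matrix with r > 0 the leading eigenvector is
    # (1, 1) / sqrt(2), with eigenvalue 1 + r.
    scores = [(a + b) / math.sqrt(2.0) for a, b in zip(zm, zi)]
    explained = (1.0 + r) / 2.0
    return scores, explained
```

A film with a short median shot length and a narrow IQR receives a negative score, and a film with long, widely varying shots receives a positive one, matching the reading of the score as a size measure.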
Cutting, De Long, and Nothelfer assigned the films in their sample to one of five genres (action, adventure, animation, comedy, drama) and compared the distribution of the mAR index of each genre. They found that action films tend to have a higher mAR index than films in other genres along with smaller differences between the other genres, though they did not correct for multiple comparisons. The beanplots in Figure 3 present the classical and rank mAR indices sorted by the above genre categories, and the differences in the level and dispersion of the indices are stark. To compare the distribution of indices for each genre we performed a Kruskal-Wallis ANOVA test, and pairwise post-hoc Dunn tests assuming an experiment-wise error rate of 0.10 and 10 tests, giving a two-tailed Sidak-corrected p-value of 0.0105 and a critical Z-value of 2.56. All reported test statistics are corrected for ties. The omnibus test for the cmAR index shows a statistically significant difference between genres ($\chi^2(4) = 28.50$, $p < 0.01$), with significant pairwise differences between the action genre and comedy films (Z = 4.09), and drama films (Z = 4.49). For the rmAR index we also see a statistically significant difference ($\chi^2(4) = 38.61$, $p < 0.01$), with pairwise differences between action films and the animation (Z = 3.22), comedy (Z = 4.59), and drama (Z = 5.22) genres and between adventure films and comedy (Z = 3.15) and drama (Z = 3.60) films. (The difference between adventure films and animated films was not quite significant [Z = 2.46].) Using the classical mAR index would therefore lead us to miss key differences between the style of films in particular genres.
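The corrected significance threshold quoted above follows directly from the Sidak formula and can be checked with the Python standard library. The variable names are ours.

```python
from statistics import NormalDist

family_alpha = 0.10  # experiment-wise error rate
n_tests = 10         # five genres give 5 * 4 / 2 pairwise comparisons

# Sidak correction: per-test alpha that keeps the family-wise rate at 0.10.
per_test_alpha = 1.0 - (1.0 - family_alpha) ** (1.0 / n_tests)

# Two-tailed critical value of the standard normal distribution.
critical_z = NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)

print(round(per_test_alpha, 4), round(critical_z, 2))  # 0.0105 2.56
```

A pairwise Dunn statistic is therefore treated as significant only when its absolute value exceeds 2.56, which is why the adventure versus animation comparison (Z = 2.46) falls just short.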

Conclusion
This paper compared estimates of the degree of shot clustering using the classical, moment-based autocorrelation function and a rank autocorrelation function resistant to the influence of outliers. The results show that the classical mAR index underestimates both the level of shot clustering and the variation in style among the films in the sample. We also found that this index gives a misleading impression of changes in film style over time and that the trends identified by the rank mAR index are consistent with trends in other statistics describing the editing style of these films. Finally, the classical mAR index failed to identify key differences between films in the sample when sorted by genre. These results show that the mAR index can be a useful statistic of film style but that it is necessary to use robust methods due to the presence of outliers in shot length data.
Because the power spectral density of a time series is the Fourier transform of its autocorrelation function, the problem of outliers in shot length data will be transmitted to the spectral analysis of time series. This raises questions about Cutting, De Long, and Nothelfer's claim that the editing of films in the sample shows an increasing tendency to be well fitted by a 1/f noise pattern over time. This should not lead us to reject the idea that a 1/f noise pattern is a characteristic of film editing. On the contrary, given that the rank mAR index indicates the correlation between shots has been underestimated, it is likely the role of 1/f noise in entraining viewers' attention in the cinema has also been underestimated due to the incorrect identification of white noise in the composite power spectra of each film. Future research on the relationship between film style and attention will therefore need to employ robust methods of spectral analysis (see, for example, Spangl, 2008) to determine if this is the case.

References
Wald, A., and Wolfowitz, J. (1943). An exact test of randomness in the nonparametric case based on serial correlation. Annals of Mathematical Statistics 14, 378-388.