Summary: Longitudinal binary data often arise in clinical trials when repeated measurements, positive or negative to certain tests, are made on the same subject over time. To account for the serial correlation within subjects, we propose a marginal logistic model which is implemented using the Generalized Estimating Equation (GEE) approach with working correlation matrices adopting some widely used forms. The aim of this paper is to seek some robust working correlation matrices that give consistently good fit to the data. Model fit is assessed using the modified expected utility of Walker & Gutiérrez-Peña (1999). To evaluate the effect of the length of time series and the strength of serial correlation on the robustness of various working correlation matrices, the models are demonstrated using three data sets containing respectively all short time series, all long time series and time series of varying length. We identify factors that affect the choice of robust working correlation matrices and give suggestions under different situations.
Abstract: The modified autoregressive (mAR) index has been proposed as a description of the clustering of shots of similar duration in a motion picture. In this paper we derive robust estimates of the mAR index for high-grossing films at the US box office using a rank-based autocorrelation function resistant to the influence of outliers, and compare these to estimates obtained using the classical, moment-based autocorrelation function. The results show that (1) the classical mAR index underestimates both the level of shot clustering in a film and the variation in style among the films in the sample; (2) there is a decline in shot clustering from 1935 to the 1950s followed by an increase from the 1960s to the 1980s and a levelling off thereafter, rather than the monotonic trend indicated by the classical index, and this is mirrored in the trend of the median shot lengths and interquartile range; and (3) the rank mAR index identifies differences between genres overlooked when using the classical index.
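The contrast between a moment-based and a rank-based autocorrelation function can be illustrated with a minimal sketch. It assumes one simple rank-based variant, in which the observations are replaced by their average ranks before the usual lag-k sample autocorrelation is computed; the abstract does not state which rank-based estimator was used, and the mAR index itself is not computed here. The function names `ranks`, `acf` and `rank_acf` and the example series are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def acf(values, lag):
    """Classical (moment-based) sample autocorrelation at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


def rank_acf(values, lag):
    """Assumed rank-based variant: the classical ACF applied to the ranks."""
    return acf(ranks(values), lag)


# Hypothetical shot lengths in seconds: a steady trend with one extreme shot.
shot_lengths = [1, 2, 3, 1000, 5, 6, 7, 8]
classical_lag1 = acf(shot_lengths, 1)
rank_lag1 = rank_acf(shot_lengths, 1)
```

In this made-up series the single extreme value dominates the classical estimate and pulls it below zero, while the rank-based estimate stays positive, which is the kind of outlier sensitivity the abstract describes.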
Abstract: A field study was carried out to determine the spatial distribution of air dose rate on grazed grassland after the earthquake on 11 March 2011 in the Northwest Pacific of Northeastern Japan. Data on air dose rates (µSv h⁻¹) were collected from Ichinoseki, in the south of Iwate Prefecture, Japan. Air dose rates were measured at 1 m intervals across a 12 m × 12 m site (L-site). At the center of the L-site, a 1.2 m × 1.2 m site (S-site) was located. One hundred and forty-four (144) equally spaced quadrats were defined in the S-site, and air dose rates were again measured at the central point of each quadrat. Moran's I, a measure of autocorrelation, was used to test the spatial heterogeneity of air dose rate on grazed grassland. Autocorrelation in the S-site was significantly higher than in the L-site. Air dose rate did not show significant autocorrelation at any spatial lag in the L-site. In the S-site, air dose rate showed significant autocorrelation at twelve of sixteen spatial lags. Autocorrelograms and Moran's scatterplots showed that air dose rate was frequently and positively spatially correlated at distances of less than 0.1 m.
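For readers unfamiliar with Moran's I, the statistic can be sketched for a gridded site as follows. This is a minimal illustration assuming binary rook-adjacency weights (each quadrat is a neighbor of the quadrats directly above, below, left and right of it) and a single first-order lag; the abstract does not specify the weighting scheme or how the sixteen spatial lags were defined, and no significance test is included. The function name `morans_i` and the example grids are hypothetical.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid, using binary rook-adjacency weights.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where w_ij = 1 for horizontally or vertically adjacent cells, else 0,
    and W is the sum of all weights.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)

    numerator = 0.0
    weight_total = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    numerator += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_total += 1
    return (n / weight_total) * (numerator / denominator)


# Hypothetical 4 x 4 grids: one spatially clustered, one alternating.
clustered = [[0, 0, 1, 1] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

clustered_i = morans_i(clustered)        # positive: similar values adjoin
checkerboard_i = morans_i(checkerboard)  # negative: neighbors always differ
```

Values near zero indicate no spatial autocorrelation at that lag, which is what the abstract reports for the L-site.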
The complexity of energy infrastructure at large institutions increasingly calls for data-driven monitoring of energy usage. This article presents a hybrid monitoring algorithm for detecting consumption surges using statistical hypothesis testing, leveraging the posterior distribution and its information about uncertainty to introduce randomness in the parameter estimates, while retaining the frequentist testing framework. This hybrid approach is designed to be asymptotically equivalent to the Neyman-Pearson test. We show via extensive simulation studies that the hybrid approach controls the type I error rate even with finite sample sizes, whereas the naive plug-in method tends to exceed the specified level, resulting in overpowered tests. The proposed method is applied to the natural gas usage data at the University of Connecticut.
Large-scale genomics studies provide researchers with access to extensive datasets of unprecedented detail and scope that encompass not only genes, but also more experimental functional units, including non-coding microRNAs (miRNAs). In order to analyze these high-fidelity data while remaining faithful to the underlying biology, statistical methods are needed that reflect the full range of understanding in contemporary molecular biology, while remaining flexible enough to analyze a wide range of data and complex phenomena. Leveraging multiple omics datasets, miRNA-gene targets, and signaling pathway topology, we present an integrative linear model to analyze signaling pathways. Specifically, we use a mixed linear model to characterize tumor and healthy tissue, and perform statistical significance testing to identify pathway disturbances. In this paper, pan-cancer analysis is performed for a wide range of signaling pathways. We discuss specific findings from this analysis, as well as a publicly available interactive data visualization that contains the full range of our analytic findings.