Summary: Longitudinal binary data often arise in clinical trials when repeated measurements, positive or negative for certain tests, are made on the same subject over time. To account for the serial correlation within subjects, we propose a marginal logistic model implemented using the Generalized Estimating Equation (GEE) approach, with working correlation matrices of several widely used forms. The aim of this paper is to identify robust working correlation matrices that consistently give a good fit to the data. Model fit is assessed using the modified expected utility of Walker & Gutiérrez Peña (1999). To evaluate how the length of the time series and the strength of serial correlation affect the robustness of the various working correlation matrices, the models are demonstrated on three data sets containing, respectively, only short time series, only long time series, and time series of varying length. We identify factors that affect the choice of robust working correlation matrices and give suggestions for different situations.
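In GEE, each subject's working covariance is built from a working correlation matrix R(α) whose form the analyst chooses. The abstract does not list which forms are compared, so the sketch below assumes three of the most common ones (independence, exchangeable, and first-order autoregressive) and builds each for a subject with m visits. Series of different lengths then simply get matrices of different sizes. This is a minimal illustration, not the paper's implementation.

```python
from typing import List

Matrix = List[List[float]]


def independence(m: int) -> Matrix:
    """Working independence: identity matrix, no within-subject correlation."""
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


def exchangeable(m: int, rho: float) -> Matrix:
    """Exchangeable (compound symmetry): every pair of visits shares correlation rho."""
    return [[1.0 if i == j else rho for j in range(m)] for i in range(m)]


def ar1(m: int, rho: float) -> Matrix:
    """First-order autoregressive: correlation rho**|i - j| decays with the time gap."""
    return [[rho ** abs(i - j) for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    # Subjects may have different numbers of visits (short, long, or mixed
    # series); each structure is built per subject for its own length m.
    for m in (2, 4):
        print(m, ar1(m, 0.5))
```

Exchangeable treats all pairs of visits alike, which suits short series. AR(1) lets correlation fade with the gap between visits, which tends to matter more as series get longer.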
Abstract: The modified autoregressive (mAR) index has been proposed as a description of the clustering of shots of similar duration in a motion picture. In this paper we derive robust estimates of the mAR index for high-grossing films at the US box office using a rank-based autocorrelation function that is resistant to the influence of outliers, and we compare these with estimates obtained using the classical, moment-based autocorrelation function. The results show that (1) the classical mAR index underestimates both the level of shot clustering in a film and the variation in style among the films in the sample; (2) rather than the monotonic trend indicated by the classical index, shot clustering declines from 1935 to the 1950s, increases from the 1960s to the 1980s, and levels off thereafter, a pattern mirrored in the trends of the median shot length and interquartile range; and (3) the rank mAR index identifies differences between genres that the classical index overlooks.
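The contrast between the two autocorrelation functions can be illustrated without the mAR index itself, which is not reproduced here. The sketch assumes the simplest rank-based variant: the ordinary sample ACF computed on the ranks of the shot lengths, with tied values given average ranks. The paper's exact estimator may differ, and the series below is hypothetical.

```python
from typing import List, Sequence


def ranks(x: Sequence[float]) -> List[float]:
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j as a 1-based rank
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Classical moment-based sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def rank_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Rank-based autocorrelations: the classical ACF applied to the ranks of x."""
    return acf(ranks(x), max_lag)


if __name__ == "__main__":
    clean = [float(t) for t in range(1, 21)]
    dirty = clean[:-1] + [1000.0]  # one hypothetical, extremely long shot
    print("classical:", acf(clean, 1), acf(dirty, 1))
    print("rank:     ", rank_acf(clean, 1), rank_acf(dirty, 1))
```

A single extreme value drags the classical lag-1 autocorrelation of this trending series close to zero. The rank version is unchanged because the outlier keeps the same rank.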
Abstract: A field study was carried out to determine the spatial distribution of the air dose rate on grazed grassland after the earthquake of 11 March 2011 in the northwestern Pacific off northeastern Japan. Air dose rates (µSv h⁻¹) were collected in Ichinoseki, in the south of Iwate Prefecture, Japan. Air dose rates were measured at 1 m intervals across a 12 m × 12 m site (L-site). A 1.2 m × 1.2 m site (S-site) was located at the center of the L-site and divided into 144 equally spaced quadrats, and the air dose rate was measured at the central point of each quadrat. Moran's I, a measure of spatial autocorrelation, was used to test the spatial heterogeneity of the air dose rate on grazed grassland. Autocorrelation was significantly higher in the S-site than in the L-site. In the L-site, the air dose rate showed no significant autocorrelation at any spatial lag, whereas in the S-site it showed significant autocorrelation at twelve of sixteen spatial lags. Autocorrelograms and Moran's scatterplot showed that the air dose rate was frequently and positively spatially correlated at distances of less than 0.1 m.
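Moran's I for n sites with weights w_ij is I = (n / W) Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z_i is the deviation of site i from the mean and W = Σ w_ij. The abstract does not state the weighting scheme or how its sixteen spatial lags were defined. The sketch below therefore assumes binary distance-band weights (w_ij = 1 when the distance lies in a chosen band) and a one-sided permutation test. Coordinates are in grid units, and the grids and values used are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def grid_coords(rows: int, cols: int) -> List[Tuple[float, float]]:
    """Cell-centre coordinates in grid units (multiply by the spacing, e.g. 0.1 m, for metres)."""
    return [(float(r), float(c)) for r in range(rows) for c in range(cols)]


def band_pairs(coords, d_min: float, d_max: float) -> List[Pair]:
    """Ordered pairs (i, j), i != j, with d_min < distance <= d_max (binary weights w_ij = 1)."""
    pairs = []
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            if i != j and d_min < math.hypot(xi - xj, yi - yj) <= d_max:
                pairs.append((i, j))
    return pairs


def morans_i(values: Sequence[float], pairs: Sequence[Pair]) -> float:
    """Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2 with binary weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = sum(z[i] * z[j] for i, j in pairs)
    den = sum(v * v for v in z)
    return (n / len(pairs)) * (num / den)


def permutation_p_value(values, pairs, n_perm: int, rng: random.Random) -> float:
    """One-sided p-value for positive autocorrelation, permuting values across sites."""
    observed = morans_i(values, pairs)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, pairs) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    coords = grid_coords(6, 6)
    nearest = band_pairs(coords, 0.0, 1.0)  # rook neighbours: first distance band
    gradient = [float(c) for r in range(6) for c in range(6)]
    print(morans_i(gradient, nearest), permutation_p_value(gradient, nearest, 199, random.Random(0)))
```

Computing I separately for successive distance bands, such as (0, 1], (1, 2], and so on, gives the points of an autocorrelogram.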
When releasing data to the public, a vital concern is the risk of exposing the personal information of individuals who have contributed to the data set. Many mechanisms have been proposed to protect individual privacy, but less attention has been paid to conducting valid inference on the resulting privacy-protected data sets. For frequency tables, privacy-oriented perturbations often produce negative cell counts, and releasing such tables can undermine users' confidence in the usefulness of the data. This paper focuses on releasing one-way frequency tables. We recommend an optimal mechanism that satisfies ϵ-differential privacy (DP) and never produces negative cell counts; it is optimal in the sense that it maximizes the expected utility under a given privacy constraint. We also develop valid goodness-of-fit tests for DP-protected data. In particular, we propose a de-biased test statistic for the optimal procedure and derive its asymptotic distribution. In addition, we introduce testing procedures for the commonly used Laplace and Gaussian mechanisms that give good finite-sample approximations to the null distributions. We also specify the rate requirements on the decaying privacy parameter under which these inference procedures remain valid. We further consider common user practices, such as merging related or neighboring cells and integrating statistical information from different data sources, and derive valid testing procedures when these operations occur. Simulation studies show that our inference results hold well even when the sample size is relatively small. We compare our approach with current field standards, including the Laplace and Gaussian mechanisms (both with and without post-processing that replaces negative cell counts with zeros) and the Binomial-Beta McClure-Reiter mechanism.
Finally, we apply our method to the National Center for Early Development and Learning's (NCEDL) multi-state studies data to demonstrate its practical applicability.
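To see why negative counts arise and why naive tests break, consider the Laplace baseline the DP paper compares against. The sketch does not reproduce the paper's optimal mechanism or its de-biased statistic, neither of which is specified in the abstract.

Under add/remove-one neighboring data sets, a one-way histogram has L1 sensitivity 1. Adding Laplace(1/ϵ) noise to every cell therefore gives ϵ-DP, but it can push small counts below zero. Clamping those counts at zero is post-processing, so it keeps the DP guarantee, but it biases the counts upward. The noise also inflates the Pearson statistic: if O_i is multinomial and η_i ~ Laplace(b) is independent noise, then E[(O_i + η_i − n p_i)²] = n p_i(1 − p_i) + 2b². A simple moment correction subtracts 2b² Σ 1/(n p_i).

The helper names below are hypothetical, and the total n is assumed to be public.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) draw: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(counts: Sequence[float], epsilon: float, rng: random.Random,
                      sensitivity: float = 1.0) -> List[float]:
    """Add Laplace(sensitivity / epsilon) noise to every cell of a one-way table."""
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def clamp_nonnegative(noisy: Sequence[float]) -> List[float]:
    """Post-processing that replaces negative cell counts with zeros."""
    return [max(0.0, v) for v in noisy]


def pearson_statistic(counts: Sequence[float], probs: Sequence[float], n: float) -> float:
    """Pearson goodness-of-fit statistic: sum of (O_i - n p_i)^2 / (n p_i)."""
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def mean_corrected_pearson(noisy: Sequence[float], probs: Sequence[float],
                           n: float, scale: float) -> float:
    """Subtract the expected inflation 2 * scale^2 * sum 1/(n p_i) caused by Laplace noise.

    Illustrative moment correction only; not the paper's de-biased statistic.
    """
    return pearson_statistic(noisy, probs, n) - 2 * scale ** 2 * sum(1 / (n * p) for p in probs)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [5, 0, 3, 12]  # hypothetical one-way table
    noisy = laplace_mechanism(counts, epsilon=0.5, rng=rng)
    print(noisy, clamp_nonnegative(noisy))
```

Without the correction, the noisy statistic's null mean exceeds k − 1, which is one reason naive chi-square tests on DP tables reject too often.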
The complexity of energy infrastructure at large institutions increasingly calls for data-driven monitoring of energy usage. This article presents a hybrid monitoring algorithm that detects consumption surges by statistical hypothesis testing. It uses the posterior distribution, and the information about uncertainty that it carries, to introduce randomness into the parameter estimates while retaining the frequentist testing framework. The hybrid approach is designed to be asymptotically equivalent to the Neyman-Pearson test. Extensive simulation studies show that the hybrid approach controls the type I error rate even with finite sample sizes, whereas the naive plug-in method tends to exceed the specified level and so rejects the null hypothesis too often. We apply the proposed method to natural gas usage data from the University of Connecticut.
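The plug-in problem can be seen in a deliberately simple setting. This does not implement the hybrid algorithm, whose details the abstract does not give.

Assume baseline readings are N(μ, σ²) with σ known, and that a surge test compares a new reading y against the estimated mean. The plug-in test rejects when y > x̄ + z₁₋α σ, treating x̄ as if it were μ. But y − x̄ has variance σ²(1 + 1/n), so the actual size is 1 − Φ(z₁₋α / √(1 + 1/n)), which exceeds α for every finite n. The model and parameter values are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def plugin_type1_rate(n: int, alpha: float, reps: int, rng: random.Random,
                      sigma: float = 1.0) -> float:
    """Monte Carlo rejection rate of the naive plug-in surge test when there is no surge."""
    z = NormalDist().inv_cdf(1 - alpha)
    mu = 0.0  # the rejection rate does not depend on mu
    rejections = 0
    for _ in range(reps):
        baseline_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        y = rng.gauss(mu, sigma)  # new reading drawn under H0
        if y > baseline_mean + z * sigma:  # plug-in: ignores uncertainty in baseline_mean
            rejections += 1
    return rejections / reps


def plugin_type1_exact(n: int, alpha: float) -> float:
    """Exact size: y - mean ~ N(0, sigma^2 (1 + 1/n)), so P(reject) = 1 - Phi(z / sqrt(1 + 1/n))."""
    z = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z / (1 + 1 / n) ** 0.5)


if __name__ == "__main__":
    for n in (5, 20, 1000):
        print(n, plugin_type1_exact(n, 0.05), plugin_type1_rate(n, 0.05, 5000, random.Random(n)))
```

With n = 5 and α = 0.05, the plug-in test's true size is about 0.067. The excess shrinks as n grows, which matches the finite-sample failure that the hybrid approach is designed to avoid.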