Abstract: Data systems collecting information from different sources or over long periods of time can receive multiple reports from the same individual. An important example is public health surveillance systems that monitor conditions with long natural histories. Several state-level systems for surveillance of one such condition, the human immunodeficiency virus (HIV), use codes composed of combinations of non-unique personal characteristics such as birth date, soundex (a code based on last name), and sex as patient identifiers. As a result, these systems cannot distinguish between several different individuals having identical codes and a unique individual erroneously represented several times. We applied results for occupancy models to estimate the potential magnitude of duplicate case counting for AIDS cases reported to the Centers for Disease Control and Prevention with only non-unique partial personal identifiers. Occupancy models with equal and unequal occupancy probabilities are considered. Unbiased estimators for the numbers of true duplicates within and between case reporting areas are provided. Formulas to calculate estimators’ variances are also provided. These results can be applied to evaluating duplicate reporting in other data systems that have no unique identifier for each individual.
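The soundex component of such identifier codes is a standard phonetic encoding of surnames. A minimal sketch of the common American Soundex variant (illustrative only; the surveillance systems described above may use a slightly different variant):

```python
def soundex(name):
    """American Soundex: first letter plus three digits from consonant classes."""
    codes = {}
    for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]:
        for ch in letters:
            codes[ch] = digit
    name = name.upper()
    first = name[0]
    digits = []
    prev = codes.get(first, "")
    for ch in name[1:]:
        if ch in "HW":          # H and W are transparent: they do not break a run
            continue
        d = codes.get(ch, "")
        if d and d != prev:     # adjacent letters with the same code collapse
            digits.append(d)
        prev = d                # vowels (d == "") reset the run
    return (first + "".join(digits) + "000")[:4]
```

Different surnames can share one code, e.g. `soundex("Smith")` and `soundex("Smyth")` both give `"S530"`, which is exactly the non-uniqueness that makes duplicate counting ambiguous.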
Abstract: The Weibull distribution is the most important distribution for problems in reliability. We study some mathematical properties of the new wider Weibull-G family of distributions. Some special models in the new family are discussed. The properties derived hold for any distribution in this family. We obtain general explicit expressions for the quantile function, ordinary and incomplete moments, generating function and order statistics. We discuss the estimation of the model parameters by maximum likelihood and illustrate the potentiality of the extended family with two applications to real data.
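As an illustration of a quantile function in a Weibull-G-type family, consider one common construction, F(x) = 1 − exp{−α [G(x)/(1 − G(x))]^β}; inverting gives Q(u) = Q_G(t/(1 + t)) with t = (−log(1 − u)/α)^{1/β}. A minimal sketch with an exponential baseline (both the specific construction and the baseline are illustrative assumptions, not necessarily the family studied in the paper):

```python
import math

def weibull_g_quantile(u, alpha, beta, G_inv):
    """Quantile of F(x) = 1 - exp(-alpha * (G(x) / (1 - G(x)))**beta)."""
    t = (-math.log(1.0 - u) / alpha) ** (1.0 / beta)
    return G_inv(t / (1.0 + t))

# Illustrative baseline: exponential with rate lam, G(x) = 1 - exp(-lam * x).
lam = 2.0
G = lambda x: 1.0 - math.exp(-lam * x)
G_inv = lambda p: -math.log(1.0 - p) / lam

def F(x, alpha, beta):
    g = G(x)
    return 1.0 - math.exp(-alpha * (g / (1.0 - g)) ** beta)

x70 = weibull_g_quantile(0.7, alpha=1.5, beta=0.8, G_inv=G_inv)
```

The round trip F(Q(u)) = u verifies the inversion; such a closed-form quantile also enables inverse-transform simulation from any member of the family.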
Abstract: Some specific random fields whose finite-dimensional marginal distributions are multivariate closed skew-normal or multivariate extended skew-t have been studied by many researchers, in both time and spatial domains. In this paper, a necessary and sufficient condition is provided for the applicability of such random fields in spatial interpolation, based on their marginal distributions. Two deficiencies of the random fields generated by some well-known multivariate distributions are pointed out, and in contrast a suitable skewed and heavy-tailed random field is proposed. The efficiency of the proposed random field is illustrated through the interpolation of a real data set.
Abstract: Interval estimation for the proportion parameter in one-sample misclassified binary data has attracted much interest in the literature. Recently, an approximate Bayesian approach has been proposed. This approach is simpler to implement and performs better than existing frequentist approaches. However, because a normal approximation to the marginal posterior density was used in this Bayesian approach, some efficiency may be lost. We develop a closed-form fully Bayesian algorithm which draws a posterior sample of the proportion parameter from the exact marginal posterior distribution. We conducted simulations to show that our fully Bayesian algorithm is easier to implement and has better coverage than the approximate Bayesian approach.
Abstract: Conservation of artifacts is a major concern of museum curators. Light, humidity, and air pollution are responsible for the deterioration of many artifacts and materials. We present here an exploratory analysis of humidity and temperature data that were collected to document the environment of the Bowdoin College Museum of Art, located in the Walker Art Building at Bowdoin College. As a result of this study, funds are being sought to install a climate control system.
Abstract: In Bayesian analysis of mortality rates it is standard practice to present the posterior mean rates in a choropleth map, a stepped statistical surface identified by colored or shaded areas. A natural objection against the posterior mean map is that it may not be the “best” representation of the mortality rates. One should really present the map that has the highest posterior density over the ensemble of areas in the map (i.e., the coordinates that maximize the joint posterior density of the mortality rates). Thus, the posterior modal map maximizes the joint posterior density of the mortality rates. We apply a Poisson regression model, a Bayesian hierarchical model, that has been used to study mortality data and other rare events when there are occurrences from many areas. The model provides convenient Rao-Blackwellized estimators of the mortality rates. Our method enables us to construct the posterior modal map of mortality data from chronic obstructive pulmonary diseases (COPD) in the continental United States. We show how to fit the Poisson regression model using Markov chain Monte Carlo methods (i.e., the Metropolis-Hastings sampler); both the posterior modal map and the posterior mean map are obtained by an output analysis of the Metropolis-Hastings sampler. The COPD data are used to provide an empirical comparison of these two maps. As expected, we have found important differences between the two maps, and we recommend that the posterior modal map be used.
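The two summaries can be sketched for a single area (a deliberate simplification of the multi-area hierarchical model, with illustrative data and a conjugate gamma prior rather than the paper's model): run a random-walk Metropolis sampler for a Poisson rate, then take the average of the draws (posterior mean) and the retained draw with the highest posterior density (the single-area analogue of the posterior modal map, which in the multi-area case maximizes the joint posterior over all areas).

```python
import math
import random

def log_post(lam, y, n, a, b):
    """Log posterior for one area: y ~ Poisson(n * lam), lam ~ Gamma(a, b)."""
    if lam <= 0.0:
        return -math.inf
    return y * math.log(n * lam) - n * lam + (a - 1.0) * math.log(lam) - b * lam

random.seed(1)
y, n, a, b = 12, 1000.0, 1.0, 1.0      # illustrative: 12 deaths in 1000 person-years
lam, draws = 0.01, []
for _ in range(20000):
    prop = lam * math.exp(random.gauss(0.0, 0.3))      # random walk on log(lam)
    # Log-scale proposal: the acceptance ratio includes the Jacobian prop / lam.
    log_acc = (log_post(prop, y, n, a, b) + math.log(prop)
               - log_post(lam, y, n, a, b) - math.log(lam))
    if math.log(random.random()) < log_acc:
        lam = prop
    draws.append(lam)

keep = draws[5000:]                                    # discard burn-in
post_mean = sum(keep) / len(keep)                      # posterior mean summary
post_mode = max(keep, key=lambda l: log_post(l, y, n, a, b))  # highest-density draw
```

With this conjugate toy model the posterior is Gamma(a + y, b + n), so the output analysis can be checked against the exact mean (a + y)/(b + n) and mode (a + y − 1)/(b + n).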
Abstract: The frequency of earthquakes has increased tremendously in recent years. This paper outlines an evaluation of the Cumulative Sum (CUSUM) and Exponentially Weighted Moving Average (EWMA) charting techniques to determine whether the frequency of earthquakes in the world is unusual. Worldwide earthquake frequency is considered for the period 1973 to 2016. Because the data are autocorrelated, regular control charts such as the Shewhart chart cannot be used to detect unusual earthquake frequencies. An approach that has proved useful in dealing with autocorrelated data is to fit a time series model, such as an Autoregressive Integrated Moving Average (ARIMA) model, and apply control charts to the residuals. The EWMA and CUSUM control charts detect unusual earthquake frequencies in the years 2012 and 2013, which are statistically out of control.
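The residual-charting idea can be sketched as follows, using a synthetic AR(1) series with a late level shift as a stand-in for the yearly earthquake counts (the real 1973–2016 data, and the full ARIMA fit, are not reproduced here): fit the time series model, compute residuals, and run an EWMA chart on them.

```python
import math
import random

random.seed(7)

# Synthetic AR(1) series with a level shift at t = 50 (illustrative data only).
phi = 0.6
x = [0.0]
for t in range(1, 60):
    shift = 5.0 if t >= 50 else 0.0
    x.append(phi * x[-1] + random.gauss(0.0, 1.0) + shift)

# Step 1: fit an AR(1) model by least squares on the in-control portion.
num = sum(x[t - 1] * x[t] for t in range(1, 50))
den = sum(x[t - 1] ** 2 for t in range(1, 50))
phi_hat = num / den
resid = [x[t] - phi_hat * x[t - 1] for t in range(1, len(x))]

# Step 2: EWMA chart on the residuals (smoothing lambda = 0.2, width L = 3).
lam, L = 0.2, 3.0
mu = sum(resid[:40]) / 40                                   # in-control mean
sigma = math.sqrt(sum((r - mu) ** 2 for r in resid[:40]) / 39)
z, signals = mu, []
for t, r in enumerate(resid, start=1):
    z = lam * r + (1 - lam) * z
    halfwidth = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    if abs(z - mu) > halfwidth:
        signals.append(t)                                   # out-of-control point
```

A CUSUM chart would be applied to the same residuals in an analogous way; either chart flags the shifted period that the raw autocorrelated series would obscure on a Shewhart chart.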
Abstract: Motivated by a situation encountered in the Well Elderly 2 study, the paper considers the problem of robust multiple comparisons based on K independent tests associated with 2K independent groups. A simple strategy is to use an extension of Dunnett’s T3 procedure, which is designed to control the probability of one or more Type I errors. However, this method and related techniques fail to take into account the overall pattern of p-values when making decisions about which hypotheses should be rejected. The paper suggests a multiple comparison procedure that does take the overall pattern into account and then describes general situations where this alternative approach makes a practical difference in terms of both power and the probability of one or more Type I errors. For reasons summarized in the paper, the focus is on 20% trimmed means, but in principle the method considered here is relevant to any situation where the Type I error probability of the individual tests can be controlled reasonably well.
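The 20% trimmed mean on which the procedure focuses discards the smallest and largest 20% of the observations before averaging, which is what gives it robustness to outliers. A minimal sketch (a generic implementation, not the paper's full multiple-comparison procedure):

```python
def trimmed_mean(xs, prop=0.2):
    """Trimmed mean: drop int(prop * n) values from each tail, average the rest."""
    xs = sorted(xs)
    g = int(prop * len(xs))          # number trimmed from each end
    core = xs[g:len(xs) - g]
    return sum(core) / len(core)

print(trimmed_mean([1, 2, 3, 4, 100]))   # the outlier 100 is trimmed away
```

With `prop=0` this reduces to the ordinary mean; with `prop` approaching 0.5 it approaches the median, so the 20% choice is a compromise between efficiency under normality and robustness.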
Abstract: In this paper, we propose a nonparametric approach using the Dirichlet processes (DP) as a class of prior distributions for the distribution G of the random effects in the hierarchical generalized linear mixed model (GLMM). The support of the prior distribution (and the posterior distribution) is large, allowing for a wide range of shapes for G. This provides great flexibility in estimating G and therefore produces a more flexible estimator than does the parametric analysis. We present some computation strategies for posterior computations involved in DP modeling. The proposed method is illustrated with real examples as well as simulations.
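A draw from a Dirichlet process prior can be sketched via the standard stick-breaking construction: weights π_k = V_k ∏_{j<k}(1 − V_j) with V_k ~ Beta(1, α) and atoms θ_k ~ G0. The truncation level, concentration α = 2, and normal base measure below are illustrative assumptions, not choices taken from the paper:

```python
import random

def dp_stick_breaking(alpha, base_draw, K=200):
    """Truncated stick-breaking draw from DP(alpha, G0): (weights, atoms)."""
    weights, atoms, remaining = [], [], 1.0
    for _ in range(K):
        v = random.betavariate(1.0, alpha)   # V_k ~ Beta(1, alpha)
        weights.append(remaining * v)        # pi_k = V_k * prod_{j<k}(1 - V_j)
        atoms.append(base_draw())            # theta_k ~ G0
        remaining *= 1.0 - v
    return weights, atoms

random.seed(0)
# Illustrative base measure G0 = N(0, 1) for the random-effect distribution G.
w, theta = dp_stick_breaking(alpha=2.0, base_draw=lambda: random.gauss(0.0, 1.0))
effects = random.choices(theta, weights=w, k=5)   # random effects drawn from G
```

Each realization of G is a discrete distribution over the atoms, and its shape varies freely across realizations, which is the flexibility for G that the nonparametric prior provides relative to a fixed parametric family.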