Overdispersion is a common phenomenon in Poisson modelling. The generalized Poisson (GP) distribution accommodates both overdispersion and underdispersion in count data. In this paper, we briefly review different overdispersed and zero-inflated regression models. To study the impact of fitting an inaccurate model to data simulated from some other model, we simulate data from the zero-inflated generalized Poisson (ZIGP) distribution and fit the Poisson, GP, zero-inflated Poisson (ZIP), ZIGP, and zero-inflated negative binomial (ZINB) models. We compare the performance of the Poisson, GP, ZIP, ZIGP, and ZINB estimates through mean square error, bias, and standard error when the samples are generated from the ZIGP distribution. We propose estimators of the parameters of the ZIGP distribution based on the first two sample moments and the proportion of zeros, referred to as the MOZE estimator, and compare their performance with the maximum likelihood estimate (MLE) through a simulation study. It is observed that the MOZE estimators are almost as efficient as, or even more efficient than, the MLEs of the parameters of the ZIGP distribution.
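A minimal sketch of the simulation setup described above, under one common ZIGP parameterization (zero with probability φ, otherwise a GP(θ, λ) draw); the parameter values, function names, and sampling-by-inversion approach are illustrative assumptions, not the paper's exact implementation, and the closed-form moments shown are the quantities a moment-type (MOZE-style) estimator would match:

```python
import math
import numpy as np

def gp_logpmf(x, theta, lam):
    """Log pmf of the generalized Poisson:
    theta * (theta + lam*x)**(x-1) * exp(-theta - lam*x) / x!"""
    return (math.log(theta) + (x - 1) * math.log(theta + lam * x)
            - theta - lam * x - math.lgamma(x + 1))

def rzigp(n, phi, theta, lam, rng, max_x=100):
    """Draw n ZIGP variates: zero with probability phi,
    otherwise a GP(theta, lam) draw obtained by cdf inversion."""
    cdf = np.cumsum([math.exp(gp_logpmf(x, theta, lam)) for x in range(max_x)])
    draws = np.searchsorted(cdf, rng.random(n))
    return np.where(rng.random(n) < phi, 0, draws)

rng = np.random.default_rng(42)
phi, theta, lam = 0.3, 2.0, 0.2          # illustrative parameter values
y = rzigp(100_000, phi, theta, lam, rng)

# Population quantities a moment/zero-proportion estimator would match:
mean_th = (1 - phi) * theta / (1 - lam)      # E[Y]
p0_th = phi + (1 - phi) * math.exp(-theta)   # P(Y = 0)
```

Fitting the five competing models to such simulated samples and recording bias, standard error, and mean square error over replications reproduces the comparison design described in the abstract.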
Abstract: As a useful alternative to the Cox proportional hazards model, the linear regression survival model assumes a linear relationship between the covariates and a known monotone transformation, for example the logarithm, of an event time of interest. In this article, we study the linear regression survival model with right censored survival data when high-dimensional microarray measurements are present. Such data may arise in studies investigating the statistical influence of molecular features on survival risk. We propose using the principal component regression (PCR) technique for model reduction based on the weighted least squares Stute estimate. Compared with other model reduction techniques, the PCR approach is relatively insensitive to the number of covariates and hence suitable for high-dimensional microarray data. Component selection based on the nonparametric bootstrap, and model evaluation using the time-dependent ROC (receiver operating characteristic) technique, are investigated. We demonstrate the proposed approach with datasets from two microarray gene expression profiling studies of lymphoma cancers.
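The core PCR idea can be sketched in a few lines of numpy: project the centered design matrix onto its top-k principal components, regress the response on the component scores, and map the coefficients back to the covariate space. This is a plain least-squares sketch; the authors' method additionally uses censoring weights (the weighted Stute estimate), which are omitted here, and all names and parameter values are illustrative:

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: regress y on the top-k
    principal component scores of centered X, then map the fitted
    coefficients back to the original p-dimensional space."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * S[:k]                        # n x k component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = Vt[:k].T @ gamma                          # back to covariate space
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                        # toy "expression" matrix
y = X @ rng.normal(size=10) + rng.normal(size=50)    # toy transformed event times
b0, beta = pcr_fit(X, y, k=3)
```

Because only k score columns enter the regression, the fit is insensitive to the raw number of covariates, which is the property the abstract highlights for high-dimensional microarray data; with k equal to the full rank, the sketch reduces to ordinary least squares.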
Abstract: Accurately understanding the distribution of sediment measurements within large water bodies such as Lake Michigan is critical for modeling and understanding of carbon, nitrogen, silica, and phosphorus dynamics. Several water quality models have been formulated and applied to the Great Lakes to investigate the fate and transport of nutrients and other constituents, as well as plankton dynamics. This paper summarizes the development of spatial statistical tools to study and assess the spatial trends of the sediment data sets, which were collected from Lake Michigan as part of the Lake Michigan Mass Balance Study. Several new spatial measurements were developed to quantify the spatial variation and continuity of the sediment data sets under consideration. The applications of the newly designed spatial measurements to the sediment data, in conjunction with descriptive statistics, clearly reveal the existence of the intrinsic structure of strata, which is hypothesized based on linear wave theory. Furthermore, a new concept of strata consisting of two components defined based on depth is proposed and justified. The findings presented in this paper may impact future studies of sediment within Lake Michigan and all of the Great Lakes as well.
Abstract: Trials in which clusters of subjects, rather than individuals, are randomized for comparing interventions are commonly called cluster randomized trials (CRTs). For comparison of binary outcomes in a CRT, although there are a few published formulations for sample size computation, the most commonly used is the one developed by Donner, Birkett, and Buck (Am J Epidemiol, 1981), probably due to its incorporation in the textbook by Fleiss, Levin, and Paik (Wiley, 2003). In this paper, we derive a new χ² approximation formula with a general continuity correction factor (c) and show that, especially for scenarios with small event rates (< 0.01), the new formulation recommends a smaller number of clusters than the Donner et al. formulation, thereby providing better efficiency. All known formulations can be shown to be special cases at specific values of the general correction factor (e.g., the Donner formulation is equivalent to the new formulation for c = 1). Statistical simulations are presented comparing the efficacy of the available methods and identifying correction factors that are optimal for rare event rates. A table of sample size recommendations for a variety of rare event rates, along with code in the "R" language for easy computation of sample size in other settings, is also provided. Sample size calculations for a published CRT (the "Pathways to Health" study, which evaluates an intervention for smoking cessation) are computed for various correction factors to illustrate that, with an optimal choice of the correction factor, the study could have maintained the same power with a 20% smaller sample size.
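To fix ideas, the kind of calculation involved can be sketched as the standard two-proportion sample size, a Fleiss-type continuity correction generalized by a factor c (c = 0: none; c = 1: the classical correction), and inflation by the design effect 1 + (m - 1)ρ for clustering. This sketch is an assumption-laden stand-in, not the paper's new χ² formulation; all names and default values are illustrative:

```python
import math
from statistics import NormalDist

def crt_clusters(p1, p2, m, rho, c=1.0, alpha=0.05, power=0.8):
    """Clusters per arm for a two-arm CRT with a binary outcome:
    two-proportion formula, Fleiss-type continuity correction with
    general factor c, inflated by the design effect 1 + (m-1)*rho."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    # Uncorrected per-arm sample size for comparing two proportions
    n = (z_a * math.sqrt(2 * pbar * (1 - pbar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    # Continuity correction generalized by c (c=0 gives n back exactly)
    n_c = n / 4 * (1 + math.sqrt(1 + 4 * c / (n * delta))) ** 2
    deff = 1 + (m - 1) * rho           # design effect for cluster randomization
    return math.ceil(n_c * deff / m)   # clusters per arm

# Rare-event scenario of the kind the abstract emphasizes
k = crt_clusters(p1=0.005, p2=0.010, m=100, rho=0.01, c=1.0)
```

Varying c in such a routine shows directly how the choice of correction factor moves the recommended number of clusters, which is the trade-off the paper's table and R code are designed to explore.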
Abstract: This paper is motivated by an investigation into the growth of pigs, which studied among other things the effect of short-term feed withdrawal on live weight. This treatment was thought to reduce the variability in the weights of the pigs. We represent this reduction as an attenuation in an animal-specific random effect. Given data on each pig before and after treatment, we consider the problems of testing for a treatment effect and measuring the strength of the effect, if significant. These problems are related to those of testing the homogeneity of correlated variances, and regression with errors in variables. We compare three different estimates of the attenuation factor using data on the live weights of pigs, and by simulation.
Abstract: This article presents and illustrates several important subset design approaches for Gaussian nonlinear regression models and for linear models where interest lies in a nonlinear function of the model parameters. These design strategies are particularly useful in situations where currently used subset design procedures fail to provide designs which can be used to fit the model function. Our original design technique is illustrated in conjunction with D-optimality, Bayesian D-optimality and Kiefer's Φk-optimality, and is extended to yield subset designs which take account of curvature.
Abstract: We propose two simple, easy-to-implement methods for obtaining simultaneous credible bands in hierarchical models from standard Markov chain Monte Carlo output. The methods generalize Scheffé's (1953) approach to this problem, but in a Bayesian context. A small simulation study is followed by an application of the methods to a seasonal model for Ache honey gathering.
Abstract: The assessment of modality or "bumps" in distributions is of interest to scientists in many areas. We compare the performance of four statistical methods to test for departures from unimodality in simulations, and further apply the four methods to well-known ecological datasets on body mass published by Holling in 1992 to illustrate their advantages and disadvantages. Silverman's kernel density method was found to be very conservative. The excess mass test and a Bayesian mixture model approach showed agreement among the data sets, whereas Hall and York's test provided strong evidence for the existence of two or more modes in all data sets. The Bayesian mixture model also provided a way to quantify the uncertainty associated with the number of modes. This work demonstrates the inherent richness of animal body mass distributions but also the difficulties in characterizing them, and ultimately in understanding the processes underlying them.
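The quantity at the heart of Silverman's kernel density method is the critical bandwidth: the smallest Gaussian-kernel bandwidth at which the density estimate has at most k modes (the full test then bootstraps this quantity, a step omitted here). A minimal numpy sketch of that ingredient, with illustrative names and toy bimodal data:

```python
import numpy as np

def count_modes(data, h, grid_size=512):
    """Count local maxima of a Gaussian kernel density estimate with
    bandwidth h, evaluated on a grid (normalization does not affect modes)."""
    xs = np.linspace(data.min() - 3 * h, data.max() + 3 * h, grid_size)
    dens = np.exp(-0.5 * ((xs[:, None] - data[None, :]) / h) ** 2).sum(axis=1)
    interior = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(interior.sum())

def critical_bandwidth(data, k=1, lo=1e-3, hi=None, iters=40):
    """Smallest bandwidth at which the KDE has at most k modes, found by
    bisection (mode count is non-increasing in h for the Gaussian kernel)."""
    hi = hi or float(data.max() - data.min())
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi

rng = np.random.default_rng(1)
bimodal = np.concatenate([rng.normal(0, 1, 200), rng.normal(8, 1, 200)])
h_crit = critical_bandwidth(bimodal, k=1)
```

A large critical bandwidth relative to what bootstrap resamples produce is evidence against unimodality; the test's conservativeness noted in the abstract refers to how this comparison is calibrated.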
Abstract: In modeling and analyzing multivariate data, the conventionally used measure of dependence structure is the Pearson's correlation coefficient. However, use of the correlation as a dependence measure has several pitfalls. Copulas have recently emerged as an alternative measure of dependence, overcoming most of the drawbacks of the correlation. We discuss Archimedean copulas and their relationships with tail dependence. An algorithm to construct empirical and Archimedean copulas is described. Monte Carlo simulations are carried out to replicate and analyze data sets by identifying the appropriate copula. We apply the Archimedean copula based methodology to assess the accuracy of Doppler echocardiography in determining aortic valve area from the Aortic Stenosis: Simultaneous Doppler - Catheter Correlative study carried out at the King Faisal Specialist Hospital and Research Centre, Riyadh, KSA.
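As a concrete illustration of the Archimedean family discussed above, the Clayton copula (a standard lower-tail-dependent Archimedean example; the specific copula, names, and parameter values here are illustrative, not necessarily the one identified in the study) can be sampled by the conditional-distribution method, and its rank dependence checked against the known relation τ = θ/(θ + 2):

```python
import numpy as np
from scipy.stats import kendalltau

def rclayton(n, theta, rng):
    """Sample n pairs from the Clayton copula
    C(u, v) = (u**-theta + v**-theta - 1)**(-1/theta), theta > 0,
    via the conditional-distribution (inverse Rosenblatt) method."""
    u = rng.random(n)
    w = rng.random(n)  # uniform draw inverted through the conditional cdf
    v = (u ** -theta * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    return u, v

theta = 2.0
u, v = rclayton(5000, theta, np.random.default_rng(7))
tau_theory = theta / (theta + 2)     # Kendall's tau for the Clayton copula
tau_emp, _ = kendalltau(u, v)
```

Matching an empirical Kendall's tau to its closed-form expression in θ is the basic calibration step behind identifying an appropriate Archimedean copula for a data set.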