An extension of the truncated Poisson distribution, with two parameters, for a group made up of two types of population is derived and named the Bounded Poisson (BP) distribution. The parameters are estimated by the method of moments. To check the suitability and applicability of the model, it is applied to a real data set on human fertility drawn from the third round of the National Family Health Survey, conducted in 2005-06 in Uttar Pradesh, India. The proposed model provides a good fit to the data under consideration.
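This abstract does not give the two-parameter BP probability function or its moment equations. The sketch below therefore only illustrates the method of moments on the simpler one-parameter zero-truncated Poisson, assuming truncation at zero. It equates the sample mean to the theoretical mean λ/(1 - e^(-λ)) and solves that equation by bisection. It is not the BP estimator, and the function names and data are hypothetical.

```python
import math


def ztp_mean(lam: float) -> float:
    """Mean of a zero-truncated Poisson with rate lam: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)  # -expm1(-lam) == 1 - exp(-lam), accurate for small lam


def ztp_moment_estimate(data, iterations: int = 200) -> float:
    """Method-of-moments estimate of lam: solve ztp_mean(lam) = sample mean by bisection."""
    xbar = sum(data) / len(data)
    if xbar <= 1:
        raise ValueError("sample mean must exceed 1 for a zero-truncated Poisson")
    # ztp_mean increases from 1 (as lam -> 0) and ztp_mean(lam) > lam,
    # so the root lies in (0, xbar].
    lo, hi = 1e-12, xbar
    for _ in range(iterations):  # fixed iteration count avoids tolerance issues at large xbar
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the hypothetical counts [1, 1, 2, 3, 2, 4, 1, 2], whose sample mean is 2, `ztp_moment_estimate` returns λ of about 1.59. Bisection works here because the theoretical mean is strictly increasing in λ, so the bracketed root is unique. A two-parameter model such as BP would instead need two moment equations solved jointly.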
Abstract: Sample size and power calculations are often based on a two-group comparison. However, in some instances group membership cannot be ascertained until after the sample has been collected. In this situation, the realized size of each group may differ from the prespecified size because of binomial variability, so the achieved power differs from the power planned for. We suggest that investigators calculate an “expected power” that takes into account the binomial variability of group membership, and adjust the sample size accordingly when planning such studies. We explore scenarios in which such an adjustment may or may not be necessary, for both continuous and binary responses. In general, the number of additional subjects required depends only slightly on the (standardized) difference between the two group means or proportions, and much more on the relative sizes of the groups. We present tables of adjusted sample sizes for a variety of scenarios that investigators can readily use at the study design stage. The proposed approach is motivated by a genetic study of cerebral malaria and a sleep apnea study.
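The expected power idea can be sketched for the continuous case under several assumptions:

* the comparison uses a two-sided z-test (normal approximation) on a standardized difference δ;
* each subject belongs to group 1 independently with probability p;
* power is taken to be zero when one group ends up empty.

The expected power averages the usual power over n1 ~ Binomial(N, p). The total sample size N is then increased until this average reaches the target. All function names are hypothetical, and the binary-response case is not covered.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(K = k), computed in log space to avoid overflow (0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def power_two_sample(delta: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided test for a standardized mean difference."""
    if n1 == 0 or n2 == 0:
        return 0.0  # assumption: no comparison is possible when a group is empty
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(delta) / math.sqrt(1 / n1 + 1 / n2) - z_crit)


def expected_power(delta: float, n_total: int, p: float, alpha: float = 0.05) -> float:
    """Power averaged over n1 ~ Binomial(n_total, p), with n2 = n_total - n1."""
    return sum(binom_pmf(n1, n_total, p) * power_two_sample(delta, n1, n_total - n1, alpha)
               for n1 in range(n_total + 1))


def smallest_total_n(delta, p, target=0.8, alpha=0.05, use_expected=True, n_max=10_000):
    """Smallest total sample size whose (expected or nominal) power reaches the target."""
    for n in range(2, n_max + 1):
        if use_expected:
            pw = expected_power(delta, n, p, alpha)
        else:
            n1 = round(n * p)  # group sizes fixed at their prespecified proportions
            pw = power_two_sample(delta, n1, n - n1, alpha)
        if pw >= target:
            return n
    raise ValueError("target power not reached within n_max")
```

Compare `smallest_total_n(0.5, 0.5)` with `smallest_total_n(0.5, 0.5, use_expected=False)`. The difference is the number of extra subjects the adjustment asks for under these assumptions. The binomial weights are computed in log space so that a large N does not overflow a float.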
Abstract: In voting rights cases, judges often infer unobservable individual vote choices from election data aggregated at the precinct level. That is, one must solve an ill-posed inverse problem to obtain the critical information used in these cases. The ill-posed nature of the problem means that traditional frequentist and Bayesian approaches cannot be employed without first imposing a range of assumptions. In order to mitigate the problems resulting from incorporating potentially inaccurate information in these cases, we propose the use of information-theoretic methods as a basis for recovering an estimate of the unobservable individual vote choices. We illustrate the empirical non-parametric likelihood methods with some election data.
Abstract: In this paper, we consider the analysis of follow-up data in which each event time is observed, right censored, left censored, or left truncated. In the case of left censoring, the covariates measured at baseline are considered missing. The work is motivated by data from the MORGAM Project, which explores the association between cardiovascular diseases and their classic and genetic risk factors. We propose a nonparametric multiple imputation (NPMI) approach in which the left-censored event times and the missing covariates are imputed in a hot-deck manner. The left truncation due to deaths prior to baseline is compensated for by the Lexis diagram imputation introduced in this paper. After imputation, standard estimation methods for right-censored survival data can be applied directly. The performance of the proposed imputation approach is studied with simulated and real-world data. The results suggest that NPMI is a flexible and reliable approach to the analysis of left- and right-censored data.
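The sketch below shows much-simplified hot-deck imputation for left-censored event times. It assumes that a left-censored subject is known only to have had the event at or before a recorded time. Donors are the subjects whose event time was observed no later than that time. The sketch omits the imputation of missing covariates and the Lexis diagram step for left truncation, so it is not the NPMI procedure itself. The record format and function name are hypothetical.

```python
import random


def hot_deck_impute(records, n_imputations=5, seed=0):
    """Return n_imputations completed copies of records.

    records: list of (time, status) pairs, status in {"observed", "right", "left"}.
    A "left" record means the event happened at or before `time`; its time is
    replaced by a randomly drawn donor event time observed at or before `time`.
    Observed and right-censored records are copied unchanged.
    """
    rng = random.Random(seed)
    observed_times = [t for t, status in records if status == "observed"]
    completed_sets = []
    for _ in range(n_imputations):
        completed = []
        for t, status in records:
            if status == "left":
                donors = [u for u in observed_times if u <= t]
                if not donors:
                    raise ValueError(f"no observed event time at or before {t}")
                completed.append((rng.choice(donors), "observed"))
            else:
                completed.append((t, status))
        completed_sets.append(completed)
    return completed_sets
```

Each completed data set contains only observed and right-censored times. A standard right-censored method, for example a Kaplan-Meier or Cox model fit, could therefore be applied to each one, with the results pooled across imputations (e.g. with Rubin's rules).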
Abstract: This paper describes how to explore gene expression data using a combination of graphical and numerical methods. We start from the general methodology for multivariate data visualization, describing heatmaps, parallel coordinate plots and scatterplots. We propose new methods for gene expression data analysis using direct manipulation graphics. With linked scatterplots and parallel coordinate plots, we explore gene expression data differently from many common practices. To check replicates in relation to treatments, we introduce a new type of plot called a “replicate line” plot. A worked example focuses on an experimental study with two two-level factors, genotype and cofactor presence, and two replicates.
Abstract: In this paper we fit a predictive model for the average annual rainfall of Bangladesh using a geostatistical approach. We studied the spatial dependence pattern of average annual rainfall data (measured in mm) collected from 246 stations in Bangladesh, and employed kriging (spatial interpolation) for the rainfall data. The data reveal a linear trend, so we removed it by fitting a linear model and used the trend-free data for further calculations. Four theoretical semivariogram models, Exponential, Spherical, Gaussian and Matérn, were used to explain the spatial variation in average annual rainfall; these models were chosen according to the pattern of the empirical semivariogram. The prediction performance of ordinary kriging with these four fitted models is then compared through k-fold cross-validation. Ordinary kriging performs best when the spatial dependence in the average annual rainfall of Bangladesh is modeled with the Gaussian semivariogram.
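The semivariogram models named above, and the empirical semivariogram they are fitted to, can be sketched as follows. The sketch assumes a nugget, partial-sill and range parameterization and uses the classical Matheron estimator with equal-width lag bins. Conventions differ between software packages; for example, some scale the exponential model as exp(-3h/a). The Matérn model is left out because it needs a modified Bessel function that is not in the Python standard library. The detrending, fitting and kriging steps are not shown, and the function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def exponential_model(h, nugget, psill, a):
    """Exponential semivariogram; a is the range parameter (practical range about 3a)."""
    return 0.0 if h == 0 else nugget + psill * (1 - math.exp(-h / a))


def spherical_model(h, nugget, psill, a):
    """Spherical semivariogram; reaches the sill nugget + psill exactly at h = a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def gaussian_model(h, nugget, psill, a):
    """Gaussian semivariogram; parabolic near the origin, i.e. a very smooth field."""
    return 0.0 if h == 0 else nugget + psill * (1 - math.exp(-(h / a) ** 2))


def empirical_semivariogram(coords, values, lag_width, max_lag):
    """Matheron estimator: per lag bin, sum of squared differences / (2 * number of pairs)."""
    sums, counts = defaultdict(float), defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            if h > max_lag:
                continue
            b = int(h // lag_width)  # index of the lag bin containing distance h
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    # one (bin centre, semivariance, pair count) triple per non-empty bin
    return [((b + 0.5) * lag_width, sums[b] / (2 * counts[b]), counts[b]) for b in sorted(counts)]
```

Each theoretical model would be fitted to the (lag, semivariance) pairs before being used in ordinary kriging, for example by weighted least squares with the pair counts as weights. The cross-validated kriging error would then decide between the models.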
This article presents a classification of disease severity for patients with cystic fibrosis (CF). CF is a genetic disease that dramatically decreases life expectancy and quality of life. The disease is characterized by polymicrobial infections that lead to lung remodeling and airway mucus plugging. To quantify the disease severity of CF patients and compute a continuous severity index, quantile regression models are fitted, and the corresponding rank scores and normalized ranks are calculated for each patient. From the rank scores of the set of quantile regression models, a continuous severity index is computed for each CF patient. This index can be considered a robust estimate of CF disease severity.