Abstract: In the United States, diabetes is common and costly. Programs to prevent new cases of diabetes are often carried out at the level of the county, a unit of local government. Thus, efficient targeting of such programs requires county-level estimates of diabetes incidence (the fraction of the nondiabetic population who received their diagnosis of diabetes during the past 12 months). Previously, only estimates of prevalence (the overall fraction of the population who have the disease) have been available at the county level. Counties with high prevalence might or might not be the same as counties with high incidence, due to spatial variation in mortality and relocation of persons with incident diabetes to another county. Existing methods cannot be used to estimate county-level diabetes incidence, because the fraction of the population who receive a diabetes diagnosis in any year is too small. Here, we extend previously developed methods of Bayesian small-area estimation of prevalence, using diffuse priors, to estimate diabetes incidence for all U.S. counties based on data from a survey designed to yield state-level estimates. We found high incidence in the southeastern United States, the Appalachian region, and in scattered counties throughout the western U.S. Our methods might be applicable in other circumstances in which all cases of a rare condition also must be cases of a more common condition (in this analysis, “newly diagnosed cases of diabetes” and “cases of diabetes”). If appropriate data are available, our methods can be used to estimate the proportion of the population with the rare condition at greater geographic specificity than the data source was designed to provide.
Abstract: Complexities involved with identifying the projection for a specific set of k factors (k = 2,..., 11) from an n-run (n = 12, 20 or 24) Plackett-Burman design are described. Once the correct projection is determined, difficulties with selecting the additional runs needed to complete either the full or half-fraction factorial for the respective projection are noted, especially for n = 12, 20 or 24 and k = 4 or 5. Because of these difficulties, a user-friendly computational approach is given that identifies the projection and the corresponding follow-up runs needed to complete the full or half-fraction factorial. The method is illustrated with a real data example.
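The idea of finding follow-up runs for a projection can be illustrated with a minimal sketch. It is not the computational approach the abstract refers to; it only enumerates, for the 12-run case, which points of the full 2^k factorial are absent when the design is projected onto a chosen set of columns. The function names `plackett_burman_12` and `missing_runs` are hypothetical, and the design is assumed to be built from the commonly tabulated cyclic generator row.

```python
from itertools import combinations, product

# Commonly tabulated generator row for the 12-run Plackett-Burman design
# (+1 = high level, -1 = low level). Assumed here for illustration.
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Build the 12 x 11 design: 11 cyclic shifts of the generator plus a row of -1."""
    n = len(GENERATOR_12)
    rows = [[GENERATOR_12[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows


def missing_runs(design, factors):
    """Return the points of the full 2^k factorial that do not appear
    when the design is projected onto the given column indices."""
    observed = {tuple(row[f] for f in factors) for row in design}
    full = product((-1, 1), repeat=len(factors))
    return [point for point in full if point not in observed]


design = plackett_burman_12()

# Follow-up runs needed to complete the full 2^4 factorial
# for the projection onto the first four columns.
follow_up = missing_runs(design, (0, 1, 2, 3))
print(len(follow_up), "follow-up runs needed:", follow_up)
```

For the 12-run design, every projection onto three columns already contains all eight factorial points, while a projection onto four columns leaves five of the sixteen points to be added. The same enumeration would apply to 20- and 24-run designs given their design matrices, which are not constructed here.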
Abstract: We have studied the effect of several factors that influence recombinant protein production, using the expression of recombinant streptolysin-O as our model. This protein, produced by Streptococcus pyogenes, is important in the biotechnological industry, where it is used to produce immunodiagnostic reagents. In order to improve the yield of this protein, we tried an alternative production method using strains of Escherichia coli and recombinant DNA technology. We have evaluated this method at the laboratory scale, taking into account factors such as inducer concentration, temperature of induction, proportion of culture medium volume to total flask volume, and strain of Escherichia coli used. To this end we applied techniques of experimental design, particularly a “fixed-effects bifactorial design”, with the expression level of recombinant streptolysin-O in E. coli being the response to the factors. All the effects studied were found to be significant and relevant to the economics of the protein production.
Abstract: We introduce a new class of continuous distributions called the Kumaraswamy transmuted-G family, which extends the transmuted class defined by Shaw and Buckley (2007). Some special models of the new family are provided. Some of its mathematical properties, including explicit expressions for the ordinary and incomplete moments, generating function, Rényi and Shannon entropies, order statistics and probability weighted moments, are derived. Maximum likelihood is used for estimating the model parameters. The flexibility of the generated family is illustrated by means of two applications to real data sets.
Abstract: HIV (Human Immunodeficiency Virus) researchers are often concerned with the correlation between HIV viral load measurements and CD4+ lymphocyte counts. Due to the lower limits of detection (LOD) of the available assays, HIV viral load measurements are subject to left-censoring. Motivated by these considerations, the maximum likelihood (ML) method under normality assumptions was recently proposed for estimating the correlation between two continuous variables that are subject to left-censoring. In this paper, we propose a generalized estimating equations (GEE) approach as an alternative to estimate such a correlation coefficient. We investigate the robustness to the normality assumption of the ML and the GEE approaches via simulations. An actual HIV data example is used for illustration.
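To see why left-censoring matters for a correlation estimate, the following small simulation may help. It implements neither the ML nor the GEE estimator from the abstract; it only compares the Pearson correlation on fully observed bivariate normal data with the value obtained after naively replacing measurements below a detection limit by the limit itself. The function names, the true correlation of 0.7, the limit of -0.5 on a standardized scale, and the seed are all illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, rho=0.7, lod=-0.5, seed=1):
    """Draw standard bivariate normal pairs with correlation rho, then
    left-censor the first variable at the (hypothetical) detection limit."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    # Naive substitution: every value below the limit is recorded as the limit.
    substituted = [max(x, lod) for x in xs]
    return pearson(xs, ys), pearson(substituted, ys)


full_r, naive_r = simulate()
print(f"fully observed: {full_r:.3f}, after naive substitution: {naive_r:.3f}")
```

Under these assumptions the substituted data give a smaller correlation than the fully observed data, which is the kind of bias that estimators designed for left-censored data aim to avoid.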
Abstract: In maximum likelihood exploratory factor analysis, the estimates of unique variances can often turn out to be zero or negative, which makes no sense from a statistical point of view. In order to overcome this difficulty, we employ a Bayesian approach by specifying a prior distribution for the variances of unique factors. The factor analysis model is estimated by the EM algorithm, for which we provide the expectation and maximization steps within a general framework of EM algorithms. Crucial issues in the Bayesian factor analysis model are the choice of adjusted parameters, including the number of factors and the hyper-parameters for the prior distribution. The choice of these parameters can be viewed as a model selection and evaluation problem. We derive a model selection criterion for evaluating a Bayesian factor analysis model. Monte Carlo simulations are conducted to investigate the effectiveness of the proposed procedure. A real data example is also given to illustrate our procedure. We observe that our modeling procedure prevents the occurrence of improper solutions and also chooses the appropriate number of factors objectively.
We introduce the four-parameter Kumaraswamy Gompertz distribution. We obtain the moments, generating and quantile functions, Shannon and Rényi entropies, mean deviations and Bonferroni and Lorenz curves. We provide a mixture representation for the density function of the order statistics. We discuss the estimation of the model parameters by maximum likelihood. We provide an application to a real data set that illustrates the usefulness of the new model.
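As a rough illustration of how such a distribution is assembled, the sketch below composes the usual Kumaraswamy-G construction, F(x) = 1 - (1 - G(x)^a)^b, with a Gompertz baseline G(x) = 1 - exp(-(theta/gamma)(exp(gamma x) - 1)), and inverts it to get a quantile function. The parameter names a, b, theta, gamma and this particular Gompertz parameterization are assumptions; the paper's own notation may differ.

```python
import math


def gompertz_cdf(x, theta, gamma):
    """Assumed Gompertz baseline CDF for x >= 0, theta > 0, gamma > 0."""
    return 1.0 - math.exp(-(theta / gamma) * math.expm1(gamma * x))


def kw_gompertz_cdf(x, a, b, theta, gamma):
    """Kumaraswamy-G construction applied to the Gompertz baseline:
    F(x) = 1 - (1 - G(x)**a)**b, with shape parameters a, b > 0."""
    g = gompertz_cdf(x, theta, gamma)
    return 1.0 - (1.0 - g ** a) ** b


def kw_gompertz_quantile(u, a, b, theta, gamma):
    """Invert the CDF in closed form for 0 < u < 1."""
    # Undo the Kumaraswamy layer to recover the baseline probability G(x).
    g = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    # Undo the Gompertz baseline: x = log(1 - (gamma/theta) * log(1 - g)) / gamma.
    return math.log1p(-(gamma / theta) * math.log1p(-g)) / gamma


# Illustrative parameter values only.
params = dict(a=2.0, b=1.5, theta=0.5, gamma=0.8)
median = kw_gompertz_quantile(0.5, **params)
print("median:", median, "check:", kw_gompertz_cdf(median, **params))
```

A closed-form quantile function of this kind also allows random variates to be drawn by inverse transform sampling, by feeding uniform draws to `kw_gompertz_quantile`.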
Abstract: In epidemiological studies where subjects are seen periodically on follow-up visits, interval-censored data occur naturally. The exact time at which the change of state (such as HIV seroconversion) occurs is not known, only that it occurred sometime within a specific time interval. This paper considers estimation of parameters when HIV infection times are interval-censored and correlated. It is assumed that each sexual partnership has a specific unobservable random effect that induces association between infection times. Parameters are estimated using the expectation-maximization algorithm and the Gibbs sampler. The results from the two methods are compared. Both methods yield fixed effects and baseline hazard estimates that are comparable. However, standard errors and frailty variance estimates from the expectation-maximization algorithm are smaller than those from the Gibbs sampler, indicating underestimation. The Gibbs sampler is considered a plausible alternative to the expectation-maximization algorithm.