Abstract: We present an analysis of health survey data by multiple correspondence analysis (MCA) and multiple taxicab correspondence analysis (MTCA), MTCA being a robust L1 variant of MCA. The survey has one passive item, gender, and 22 active substantive items representing health services offered by municipal authorities; each active item has four answer categories: this service is used, never tried, tried with no access, and nonresponse. We show that the first principal MTCA factor is perfectly characterized by the sum score of the category "this service is used" over all service items. Further, we prove that such a sum score characterization always exists for any survey data.
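The sum score named in this abstract can be illustrated with a minimal sketch. Only the four answer categories come from the abstract; the string labels, the function name `sum_score`, and the toy respondent are hypothetical, and the sketch does not compute MCA or MTCA factors.

```python
# Hypothetical string labels for the four answer categories in the abstract.
USED = "used"
CATEGORIES = {"used", "never tried", "tried with no access", "nonresponse"}


def sum_score(responses, category=USED):
    """Count how many service items a respondent answered with `category`."""
    unknown = set(responses) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return sum(1 for r in responses if r == category)


# Toy respondent with 5 items instead of the survey's 22 (made-up data).
respondent = ["used", "never tried", "used", "nonresponse", "tried with no access"]
print(sum_score(respondent))
```

Under the abstract's claim, ordering respondents by this count would reproduce their ordering on the first MTCA factor; the sketch only shows how the count itself is formed.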
Abstract: We introduce a new class of slash distributions based on the folded normal distribution. The proposed model, defined on non-negative measurements, extends the slashed half normal distribution and has higher kurtosis than the ordinary half normal distribution. We study its characterization and properties, including moments and some moment-based measures. Finally, we illustrate the proposed model with a simulation study and a real application.
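The abstract does not state the construction, so the following is only a sketch under an assumption: that the model follows the usual slash recipe, a folded normal variate |N(mu, sigma^2)| divided by U^(1/q) with U uniform on (0, 1) and q > 0. The function name and parameter names are hypothetical.

```python
import random


def sample_slash_folded_normal(mu, sigma, q, rng):
    """Draw one variate as |N(mu, sigma^2)| / U**(1/q) (assumed construction)."""
    if sigma <= 0 or q <= 0:
        raise ValueError("sigma and q must be positive")
    folded = abs(rng.gauss(mu, sigma))   # folded normal part, always >= 0
    u = 1.0 - rng.random()               # uniform on (0, 1], avoids division by zero
    return folded / u ** (1.0 / q)       # dividing by u <= 1 thickens the right tail


rng = random.Random(0)
draws = [sample_slash_folded_normal(mu=1.0, sigma=2.0, q=3.0, rng=rng) for _ in range(1000)]
print(min(draws) >= 0.0)
```

With mu = 0 this assumed construction reduces to a slashed half normal variate, which is consistent with the abstract's statement that the new model extends that distribution.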
Abstract: In the United States, diabetes is common and costly. Programs to prevent new cases of diabetes are often carried out at the level of the county, a unit of local government. Thus, efficient targeting of such programs requires county-level estimates of diabetes incidence, that is, the fraction of the nondiabetic population who received a diagnosis of diabetes during the past 12 months. Previously, only estimates of prevalence, the overall fraction of the population who have the disease, have been available at the county level. Counties with high prevalence might or might not be the same as counties with high incidence, because of spatial variation in mortality and relocation of persons with incident diabetes to another county. Existing methods cannot be used to estimate county-level diabetes incidence, because the fraction of the population who receive a diabetes diagnosis in any year is too small. Here, we extend previously developed methods of Bayesian small-area estimation of prevalence, using diffuse priors, to estimate diabetes incidence for all U.S. counties based on data from a survey designed to yield state-level estimates. We found high incidence in the southeastern United States, the Appalachian region, and scattered counties throughout the western U.S. Our methods might be applicable in other circumstances in which all cases of a rare condition must also be cases of a more common condition (in this analysis, "newly diagnosed cases of diabetes" and "cases of diabetes"). If appropriate data are available, our methods can be used to estimate the proportion of the population with the rare condition at greater geographic specificity than the data source was designed to provide.
Abstract: The aim of this study is to model the progression of HIV/AIDS in an individual patient under ART follow-up using semi-Markov processes. Recorded hospital data were obtained for a cohort of 710 patients at Felege-Hiwot referral hospital, Ethiopia, who were under ART follow-up from June 2005 to August 2009. States of the Markov process are defined by the seriousness of the sickness based on CD4 counts in cells/microliter. The five states considered are: state one (CD4 count > 500); state two (350 < CD4 count ≤ 500); state three (200 < CD4 count ≤ 350); state four (CD4 count ≤ 200); and state five (death). The first four states are called good or alive states. The findings of the study are as follows. Within the good states, the transition probability from a given state to the next worse state increases with time, reaches a maximum, and then decreases. This means that there is a period during which a patient is most likely to move to a worse state of the disease. Moreover, the probability of dying decreases with increasing CD4 counts over time. For an HIV/AIDS patient in a specific state of the disease, the probability of being in the same state decreases over time. Within the good states, the probability of being in a better state is non-zero, but less than the probability of being in a worse state. At any time of the process, a patient is more likely to be in a worse state than in a better one. The conditional probability of staying in the same state until a given number of months decreases with increasing time. The reliability analysis also revealed that the survival probabilities all decline over time. This implies that patient conditions should be improved with ART to improve the survival probability.
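The state definitions in this abstract translate directly into a small classifier. The thresholds below are the ones stated in the abstract; the function name `cd4_state` and its `dead` flag are hypothetical, and the sketch covers only the state coding, not the semi-Markov model itself.

```python
def cd4_state(cd4_count=None, dead=False):
    """Map a CD4 count (cells/microliter) to the abstract's states 1 to 5."""
    if dead:
        return 5                      # state five: death
    if cd4_count is None or cd4_count < 0:
        raise ValueError("a non-negative CD4 count is required for a living patient")
    if cd4_count > 500:
        return 1                      # state one: CD4 > 500
    if cd4_count > 350:
        return 2                      # state two: 350 < CD4 <= 500
    if cd4_count > 200:
        return 3                      # state three: 200 < CD4 <= 350
    return 4                          # state four: CD4 <= 200


# Made-up follow-up measurements for one patient, coded into states.
visits = [620, 480, 350, 190]
print([cd4_state(c) for c in visits])
```

A sequence of coded states like this, together with the visit times, is the kind of input from which transition probabilities between states would be estimated.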
Abstract: It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed, and inferential methods based on these estimators have been derived. For two independent groups, let θj(X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 − α confidence interval for θ1(X) − θ2(X) in a manner that allows both within-group and between-group heteroscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations focus primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator and the method suggested here can make a practical difference.
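For readers unfamiliar with the Theil-Sen estimator mentioned here, a minimal single-predictor sketch follows. It uses the textbook form (median of pairwise slopes, then median residual as intercept) and is not the paper's confidence interval method; the function name and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) of the Theil-Sen line for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    # Slopes of the lines through every pair of points with distinct x values.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("all x values are identical")
    slope = median(slopes)
    # One common choice of intercept: the median of y - slope * x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Made-up data: four points on y = 2x + 1 plus one gross outlier.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 100]
print(theil_sen(x, y))
```

Because the estimate depends on medians, the single outlier in the toy data does not pull the fitted line away from the other four points, which is the robustness property that motivates its use over OLS.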
Abstract: Copulas have recently emerged as practical methods for multivariate modeling. To our knowledge, only a limited amount of work has been done to apply copula-based modeling in context analysis. In this study, we generalize the Clayton copula using an appropriate weight function. In some examples, bivariate distributions are generalized by using the weighted Clayton copula. Properties of the generalized Clayton copula are also provided. The Clayton copula and the weighted Clayton model cannot be used for negative dependence; they have been used to study left tail dependence. This property is stronger in the weighted Clayton model than in the ordinary Clayton copula. It is also shown that the generalized Clayton copula is suitable for probabilistic modeling of hydrology data.
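As a point of reference for the abstract above, the ordinary bivariate Clayton copula with parameter theta > 0 and its lower (left) tail dependence coefficient can be sketched as follows. These are the standard textbook formulas; the weighted generalization is not specified in the abstract and is not attempted here, and the function names are hypothetical.

```python
def clayton_cdf(u, v, theta):
    """Ordinary Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    if theta <= 0:
        raise ValueError("this sketch covers theta > 0 only")
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    if u == 0.0 or v == 0.0:
        return 0.0                    # copula boundary condition
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def lower_tail_dependence(theta):
    """Lower tail dependence coefficient of the Clayton copula, 2^(-1/theta)."""
    if theta <= 0:
        raise ValueError("this sketch covers theta > 0 only")
    return 2.0 ** (-1.0 / theta)


print(clayton_cdf(0.5, 0.5, 1.0), lower_tail_dependence(1.0))
```

For theta > 0 the copula value never falls below the independence value u * v, which reflects the abstract's remark that the model describes positive dependence only, while the coefficient 2^(-1/theta) quantifies the left tail dependence it is used to study.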
Abstract: Doubly type-II censored sampling is an important method of obtaining data in lifetime studies. Statistical analysis of lifetime distributions under this censoring scheme is usually based on precise lifetime data. However, some collected lifetime data might be imprecise and are represented in the form of fuzzy numbers. This paper deals with the problem of estimating the scale parameter of the Rayleigh distribution under a doubly type-II censoring scheme when the lifetime observations are fuzzy and are assumed to be related to an underlying crisp realization of a random sample. We propose a new method to determine the maximum likelihood (ML) estimate of the parameter of interest. The asymptotic variance of the ML estimate is then derived by using the missing information principle. The performance of the proposed estimate is assessed through Monte Carlo simulations. Finally, an illustrative example with real data concerning 25 ball bearings in a life test is presented.
Abstract: A basic assumption of the general linear regression model is that there is no correlation (that is, no multicollinearity) among the explanatory variables. When this assumption is not satisfied, the least squares estimators have large variances, become unstable, and may have the wrong sign. Therefore, we resort to biased regression methods, which stabilize the parameter estimates. Ridge regression (RR) and principal component regression (PCR) are two of the most popular biased regression methods that can be used in the presence of multicollinearity. The r-k class estimator, which combines the RR estimator and the PCR estimator into a single estimator, gives better estimates of the regression coefficients than either the RR estimator or the PCR estimator. This paper applies multiple regression with the r-k class estimator to the relationship between the total fertility rate (TFR) and other socio-economic and demographic variables, using data from the National Family Health Survey-III (NFHS-III) for 29 states of India. The analysis shows that use of contraceptive devices has the greatest impact on the fertility rate, followed by maternal care, use of improved water, female age at marriage, and spacing between births.
Abstract: We have developed a tool for model space exploration and variable selection in linear regression models based on a simple spike and slab model (Dey, 2012). The model chosen is the one with the minimum final prediction error (FPE) among all models considered. This is implemented via the R package modelSampler. However, model selection based on the FPE criterion is questionable, as the criterion can be sensitive to perturbations in the data. The package can therefore also be used for empirical assessment of the stability of the FPE criterion. Stable model selection is accomplished by using a bootstrap wrapper that calls the primary function of the package several times on bootstrapped data. The heart of the method is the notion of model averaging, both for stable variable selection and for studying the behavior of variables over the entire model space, a concept invaluable in high-dimensional situations.
Abstract: In this study, we analyze data obtained by a nucleic acid amplification technique (polymerase chain reaction) consisting of 23 transcript variables. The variables are used to investigate the genetic mechanisms regulating chlamydial infection disease by measuring two outcomes of murine C. pneumoniae lung infection: disease, expressed as lung weight increase, and C. pneumoniae load in the lung. A reduced model with fewer transcript variables of interest at the early infection stage is obtained by using traditional methods (stepwise regression and partial least squares regression (PLS)) and modern variable selection methods (least absolute shrinkage and selection operator (LASSO), forward stagewise regression, and least angle regression (LARS)). Through these variable selection methods, the variables of interest are selected to investigate the genetic mechanisms that determine the outcomes of chlamydial lung infection. The transcript variables Tim3, GATA3, Lacf, and Arg2 (X4, X5, X8, and X13) are detected as the main variables of interest for studying the C. pneumoniae disease (lung weight increase) or C. pneumoniae lung load outcomes. Models including these key variables may provide possible answers to the problem of molecular mechanisms of chlamydial pathogenesis.