Abstract: We present an analysis of health survey data by multiple correspondence analysis (MCA) and multiple taxicab correspondence analysis (MTCA), MTCA being a robust L1 variant of MCA. The survey has one passive item, gender, and 22 active substantive items representing health services offered by municipal authorities; each active item has four answer categories: this service is used, never tried, tried with no access, nonresponse. We show that the first principal MTCA factor is perfectly characterized by the sum score of the category this service is used over all service items. Further, we prove that such a sum score characterization always exists for any survey data.
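As a small illustration of the sum score described above, the following sketch counts, for each respondent, how many service items were answered with the category this service is used. The category labels and the toy responses are assumptions made for illustration; the survey's actual coding is not given in the abstract.

```python
# Hypothetical category labels; the survey's actual coding is not given.
USED = "used"
NEVER_TRIED = "never_tried"
NO_ACCESS = "tried_no_access"
NONRESPONSE = "nonresponse"


def sum_score(responses, category=USED):
    """Count how many items a respondent answered with `category`."""
    return sum(1 for answer in responses if answer == category)


# Toy data: three respondents, four items each (the survey has 22 items).
respondents = [
    [USED, USED, NEVER_TRIED, NONRESPONSE],
    [NEVER_TRIED, NO_ACCESS, NEVER_TRIED, NEVER_TRIED],
    [USED, USED, USED, USED],
]
scores = [sum_score(r) for r in respondents]
print(scores)
```

The score is a plain count per respondent, so it ranges from 0 to the number of active items.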
Abstract: Testosterone levels decline as men age. There is little consensus on what testosterone levels are normal for aging men. In this paper, we estimate the age-specific prevalence of testosterone deficiency in men using normal mixture models when no generally agreed-upon cut-off value for defining testosterone deficiency is available. The Box-Cox power transformation is used to determine which transformation is most appropriate for correcting skewness in the data and best suits normal mixture distributions. Parametric bootstrap tests are used to determine the number of components in a normal mixture.
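To make the Box-Cox step concrete, here is a minimal sketch that picks the power parameter by grid search over a profile log-likelihood. It assumes a single normal component, which is a simplification of the normal mixture setting in the abstract, and the data values are made up rather than taken from the study. The function names are hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case lam -> 0
    return (x ** lam - 1.0) / lam


def profile_loglik(data, lam):
    """Profile log-likelihood of lam (up to a constant), assuming ONE
    normal component; the abstract itself fits normal mixtures."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # ML variance estimate
    log_jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(var) + log_jacobian


def best_lambda(data, grid):
    """Return the grid value of lam with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(data, lam))


# Toy right-skewed positive data (made up, not testosterone measurements).
data = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 12.9, 21.0]
grid = [i / 10 for i in range(-20, 21)]  # lam from -2.0 to 2.0
lam_hat = best_lambda(data, grid)
print(lam_hat)
```

A grid search keeps the sketch dependency-free; in a mixture setting the same idea would apply with the mixture log-likelihood in place of the single-normal one.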
Abstract: In many clinical trials, information is collected on both the frequency of event occurrence and the severity of each event. For example, in evaluating a new anti-epileptic medication, both the total number of seizures a patient has during the study period and the severity (e.g., mild, severe) of each seizure could be measured. In order to arrive at a full picture of drug or treatment performance, one needs to jointly model the number of events and their correlated ordinal severity measures. A separate analysis is not recommended as it is inefficient and can lead to what we define as “zero length bias” in estimates of treatment effect on severity. This paper proposes a general, likelihood-based, marginal regression model for jointly modeling the number of events and their correlated ordinal severity measures. We describe parameter estimation issues and derive the Fisher information matrix for the joint model in order to obtain the asymptotic covariance matrix of the parameter estimates. A limited simulation study is conducted to examine the asymptotic properties of the maximum likelihood estimators. Using this joint model, we propose tests that incorporate information from both the number of events and their correlated ordinal severity measures. The methodology is illustrated with two examples from clinical trials: the first concerning a new drug treatment for epilepsy; the second evaluating the effect of a cholesterol-lowering medication on coronary artery disease.
Anemia, especially among children, is a serious public health problem in Bangladesh. Apart from understanding the factors associated with anemia, it may be of interest to know the likelihood of anemia given those factors. Prediction of disease status is key to community and health service policy making as well as to forecasting for resource planning. We considered machine learning (ML) algorithms to predict anemia status among children (under five years) using common risk factors as features. Data were extracted from a nationally representative cross-sectional survey, the Bangladesh Demographic and Health Survey (BDHS), conducted in 2011. In this study, a sample of 2013 children for whom data on all selected variables were available was selected. We used several ML algorithms, namely linear discriminant analysis (LDA), classification and regression trees (CART), k-nearest neighbors (k-NN), support vector machines (SVM), random forest (RF) and logistic regression (LR), to predict childhood anemia status. A systematic evaluation of the algorithms was performed in terms of accuracy, sensitivity, specificity, and area under the curve (AUC). We found that the RF algorithm achieved the best classification accuracy of 68.53% with a sensitivity of 70.73%, specificity of 66.41% and AUC of 0.6857. On the other hand, the classical LR algorithm reached a classification accuracy of 62.75% with a sensitivity of 63.41%, specificity of 62.11% and AUC of 0.6276. Among all considered algorithms, k-NN gave the lowest accuracy. We conclude that ML methods can be considered in addition to the classical regression techniques when the prediction of anemia is the primary focus.
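For readers less familiar with the evaluation measures named above, the following sketch shows how accuracy, sensitivity and specificity follow from the four cells of a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's results, and the function name is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for illustration only; NOT the study's results.
metrics = classification_metrics(tp=30, fn=10, tn=45, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Here anemic children are treated as the positive class, so sensitivity is the share of anemic children correctly identified and specificity is the share of non-anemic children correctly identified.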
Abstract: For the first time, we propose and study the Kumaraswamy generalized half-normal distribution for modeling skewed positive data. The half-normal and generalized half-normal (Cooray and Ananda, 2008) distributions are special cases of the new model. Several of its structural properties are derived, including explicit expressions for the density function, moments, generating and quantile functions, mean deviations and moments of the order statistics. We investigate maximum likelihood estimation of the parameters and derive the expected information matrix. The proposed model is modified to allow for the possibility that long-term survivors may be present in the data. Its applicability is illustrated by means of four real data sets.
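The abstract does not state the density itself. As a sketch, assuming the standard Kumaraswamy-G construction F(x) = 1 - (1 - G(x)^a)^b applied to the generalized half-normal baseline G(x) = 2Φ((x/θ)^α) - 1, the density and distribution function can be written as below. The parameter names a, b, alpha and theta and the function names are assumptions for illustration and may differ from the paper's notation.

```python
import math


def ghn_cdf(x, alpha, theta):
    """Baseline GHN cdf: 2*Phi(z) - 1 with z = (x/theta)**alpha,
    which equals erf(z / sqrt(2))."""
    z = (x / theta) ** alpha
    return math.erf(z / math.sqrt(2.0))


def ghn_pdf(x, alpha, theta):
    """Baseline GHN density for x > 0."""
    z = (x / theta) ** alpha
    return math.sqrt(2.0 / math.pi) * (alpha / x) * z * math.exp(-0.5 * z * z)


def kw_ghn_cdf(x, a, b, alpha, theta):
    """Kumaraswamy-G cdf with GHN baseline (assumed construction)."""
    G = ghn_cdf(x, alpha, theta)
    return 1.0 - (1.0 - G ** a) ** b


def kw_ghn_pdf(x, a, b, alpha, theta):
    """Kumaraswamy-G density: a*b*g*G**(a-1)*(1-G**a)**(b-1)."""
    G = ghn_cdf(x, alpha, theta)
    g = ghn_pdf(x, alpha, theta)
    return a * b * g * G ** (a - 1.0) * (1.0 - G ** a) ** (b - 1.0)


# Arbitrary illustrative parameter values, not fitted estimates.
print(kw_ghn_pdf(1.3, a=2.0, b=1.5, alpha=1.2, theta=2.0))
```

Under this construction, setting a = b = 1 recovers the generalized half-normal density, and additionally setting alpha = 1 recovers the half-normal, consistent with the special cases named in the abstract.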
In this paper, a new version of the Poisson Lomax distribution is proposed and studied. The new density is expressed as a linear mixture of Lomax densities. The failure rate function of the new model can be increasing-constant, increasing, U-shaped, decreasing and upside down-increasing. The statistical properties are derived, and four applications are provided to illustrate the importance of the new density. The method of maximum likelihood is used to estimate the unknown parameters of the new density. The new model provides an adequate fit.