Abstract: This paper describes and compares three clustering techniques: traditional clustering methods, Kohonen maps and latent class models. The paper also proposes some novel measures of the quality of a clustering. To the best of our knowledge, this is the first contribution in the literature to compare these three techniques in a context where the classes are not known in advance.
Abstract: This article concerns the Bayesian estimation of interest rate models based on the Euler-Maruyama approximation. Assuming that the short-term interest rate follows the CIR model, an iterative method of Bayesian estimation is proposed. Markov chain Monte Carlo simulation based on the Gibbs sampler is used for the posterior estimation of the parameters. Maximum a posteriori estimation using a genetic algorithm is employed to find the Bayesian estimates of the parameters. The method and the algorithm are calibrated with historical data on US Treasury bills.
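As a rough illustration of the Euler-Maruyama approximation mentioned in this abstract, the sketch below simulates one path of the CIR model dr = kappa (theta - r) dt + sigma sqrt(r) dW. It is not the authors' estimation procedure: the function name, the parameter values, and the truncation of negative rates at zero inside the drift and diffusion terms are all assumptions made for this example.

```python
import math
import random


def simulate_cir_euler(r0, kappa, theta, sigma, dt, n_steps, seed=0):
    """Simulate one CIR path with an Euler-Maruyama step.

    Assumption: negative rates are truncated at zero inside the drift and
    diffusion terms so that the square root is always defined.
    """
    rng = random.Random(seed)
    r = r0
    path = [r0]
    for _ in range(n_steps):
        r_pos = max(r, 0.0)            # truncated rate used in the update
        eps = rng.gauss(0.0, 1.0)      # standard normal shock
        r = r + kappa * (theta - r_pos) * dt + sigma * math.sqrt(r_pos * dt) * eps
        path.append(r)
    return path


# Hypothetical parameter values: one year of daily steps.
path = simulate_cir_euler(r0=0.03, kappa=0.5, theta=0.05, sigma=0.1,
                          dt=1.0 / 252, n_steps=252)
```

With sigma set to zero the recursion reduces to the deterministic mean-reverting drift, which is a convenient sanity check before any estimation is attempted on simulated data.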
Abstract: Information fusion has become a powerful tool for challenging applications such as biological prediction problems. In this paper, we apply a new information-theoretical fusion technique to HIV-1 protease cleavage site prediction, a problem that has recently been the focus of much interest and investigation in the machine learning community. It poses a difficult classification task because of its high-dimensional feature space and relatively small set of available training patterns. We also apply a new set of biophysical features to this problem and present experiments with neural networks, support vector machines, and decision trees. Application of our feature set results in high recognition rates and concise decision trees, producing manageable rule sets that can guide future experiments. In particular, we found a combination of neural networks and support vector machines to be beneficial for this problem.
Abstract: Latent class analysis (LCA) is a popular method for analyzing multiple categorical outcomes. Given the potential for LCA model assumptions to influence inference, model diagnostics are a particularly important part of LCA. We suggest using the rate of missing information as an additional diagnostic tool. The rate of missing information indicates how much information is lost by observing multiple surrogates in place of the underlying latent variable of interest, and it provides a measure of how confident one can be in the model results. Simulation studies and real data examples are presented to explore the usefulness of the proposed measure.
Abstract: The aim of this study is to model the progression of HIV/AIDS in an individual patient under ART follow-up using semi-Markov processes. Recorded hospital data were obtained for a cohort of 710 patients at Felege-Hiwot referral hospital, Ethiopia, who were under ART follow-up from June 2005 to August 2009. States of the Markov process are defined by the severity of the illness based on CD4 counts in cells/microliter. The five states considered are: state one (CD4 count > 500); state two (350 < CD4 count ≤ 500); state three (200 < CD4 count ≤ 350); state four (CD4 count ≤ 200); and state five (death). The first four states are referred to as good or alive states. The findings of the study are as follows. Within the good states, the transition probability from a given state to the next worse state increases with time, reaches a maximum, and then decreases. This means that there is a period during which a patient is most likely to transition to a worse state of the disease. Moreover, the probability of dying decreases with increasing CD4 counts over time. For an HIV/AIDS patient in a specific state of the disease, the probability of being in the same state decreases over time. Within the good states, the probability of being in a better state is non-zero but less than the probability of being in a worse state; at any time in the process, a patient is more likely to be in a worse state than in a better one. The conditional probability of staying in the same state until a given number of months decreases with increasing time. The reliability analysis also revealed that the survival probabilities all decline over time. This implies that patient conditions should be improved with ART to improve the survival probability.
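The state definitions in this abstract can be written down directly. The sketch below maps a CD4 count (cells/microliter) to the five states using the thresholds quoted above; the function name and the `dead` flag are hypothetical conveniences for the example and are not taken from the study.

```python
def cd4_state(cd4_count=None, dead=False):
    """Return the state (1-5) for a patient, using the abstract's thresholds.

    `dead` is a hypothetical flag for the absorbing state five (death);
    the CD4 count is ignored when it is set.
    """
    if dead:
        return 5
    if cd4_count is None:
        raise ValueError("a CD4 count is required for a living patient")
    if cd4_count > 500:
        return 1            # CD4 count > 500
    if cd4_count > 350:
        return 2            # 350 < CD4 count <= 500
    if cd4_count > 200:
        return 3            # 200 < CD4 count <= 350
    return 4                # CD4 count <= 200


# Example: a patient's sequence of visits becomes a sequence of states.
visits = [520, 480, 340, 190]
states = [cd4_state(c) for c in visits]
```

A sequence of states like this, together with the visit dates, is the kind of input from which transition counts and sojourn times in a semi-Markov model would be tabulated.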
Anemia is a common public health issue and a multi-factorial condition that cuts across all sections of the population and is associated with a variety of adverse outcomes, including mortality. The World Health Organization (WHO) defines anemia in terms of the hemoglobin concentration in the blood: a female is anemic if her hemoglobin concentration is less than 12 g/dl. Anemia is an indicator of poor nutrition and is therefore a public health issue that affects the social and economic development of a region. The body mass index (BMI) of married women is a good indicator of a country's health status and economic condition. It is generally divided into four categories: underweight, normal weight, overweight and obese. BMI also provides an indicator that supports efforts to eliminate many preventable diseases. Changes in nutritional status play an important role in the course of a person's health. Hence, BMI can be used as an indicator of nutritional status, and an association with some diseases can be expected. This study aimed to investigate the relationship between BMI and socioeconomic, demographic and health variables among 6723 currently married, non-pregnant women aged 15-49 in Uttar Pradesh, India. In this Indian population, overweight/obese women are significantly (86 percent) more likely to be non-anemic; thus BMI may be used as a marker of anemia.
Abstract: Recently, Yoo and Cook (2007) developed an optimal version of the approach of Cook and Setodji (2003). When predictors are not highly skewed, the Yoo-Cook approach can be improved, especially with small samples, by iteratively estimating the inner product matrix used in their method without changing their asymptotic results. Since highly skewed predictors are often transformed toward normality in the sufficient dimension reduction literature, the proposed method may be more useful in practice than that of Yoo and Cook (2007).
Abstract: To identify the stand attributes that best explain the variability in wood density, Pinus radiata plantations located in the Chilean coastal sector were studied and modeled. The study area corresponded to stands located on sedimentary soil between the zones of Constitución and Cobquecura. Within each sampling sector, individual tree variables were recorded and the most relevant stand parameters were estimated. Fifty trees were sampled in each sector, and six wood discs were obtained from each tree at different stem heights. Each disc was weighed green and then dried to anhydrous weight, and its basic density was calculated. The profiles used to classify basic density according to stand characteristics were identified through regression trees, a technique that uses predictor variables to partition the data recursively into regions with similar responses. The objective of the regression tree method is to obtain highly homogeneous groups (branches), which are identified using pruning techniques that successively eliminate the branches that contribute least to the classification of the variable of interest. The results showed that the stand attributes that contributed significantly to basic density classification were the basal area, the number of trees per hectare, and the mean height.
Censoring arises when exact lifetimes are only partially known. It is useful in life testing experiments subject to time and cost restrictions, especially when some sample values at either or both extremes might have been adulterated. In the present article, Bayes estimation of the unknown parameter of the Gompertz distribution is addressed under three different censoring criteria. The performance of the procedures is illustrated by simulation.
Abstract: The rule of three gives 3/n as the upper 95% bound for the success rate in zero-numerator problems. Although this bound is useful in practice, it is usually conservative. Some Bayesian methods with beta distributions as priors have been studied. However, choosing the parameters for the priors is subjective and can severely affect the corresponding posterior distributions. In this paper, some hierarchical models are proposed, which provide practitioners with other options for zero-numerator problems.
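To make the conservativeness of 3/n concrete, the sketch below compares it with the exact one-sided bound obtained by solving (1 - p)^n = 0.05, and with the upper 95% posterior quantile under a Beta(1, b) prior. The hierarchical models proposed in the paper are not reproduced here; the function names and the choice of a Beta(1, b) prior (picked because its posterior quantile has a closed form) are assumptions for illustration only.

```python
def rule_of_three(n):
    """Approximate upper 95% bound when 0 successes are seen in n trials."""
    return 3.0 / n


def exact_upper_bound(n, alpha=0.05):
    """Exact bound: the p that solves (1 - p)**n = alpha."""
    return 1.0 - alpha ** (1.0 / n)


def beta1_posterior_upper(n, b=1.0, alpha=0.05):
    """Upper (1 - alpha) posterior quantile under a Beta(1, b) prior.

    With 0 successes in n trials the posterior is Beta(1, b + n), whose
    CDF is 1 - (1 - p)**(b + n), so the quantile has a closed form.
    """
    return 1.0 - alpha ** (1.0 / (n + b))


n = 100
bounds = (rule_of_three(n), exact_upper_bound(n), beta1_posterior_upper(n))
```

Changing `b` shifts the Bayesian bound, which illustrates the sensitivity to prior parameters that the abstract raises.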