Journal of Data Science

Search results 892

Adapting the Extended Neyman’s Smooth Test to Be Used in Accelerated Failure Time Models

Abdalla Abdel-Ghaly Hanan Aly Elham Abdel-Rahman

https://doi.org/10.6339/JDS.201604_14(2).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 2 (2016), pp. 271–294

Abstract: Accelerated life testing (ALT) has gained importance because it deals with high-reliability units. As a result, there is a strong need for a goodness-of-fit (GOF) technique to test the underlying lifetime distribution. The difficulty is that there are several stress levels, with a different sample of units at each level. The choice of a GOF technique therefore depends on its ability to combine the failure times from all stress levels to reach a conclusion about the adequacy of a given lifetime distribution at each stress level. In this paper, the extended Neyman’s smooth test (ENST) is chosen. It is then modified so that it can be used to validate the distributional assumption of the accelerated failure time (AFT) model. The modified method is called the adapted extended Neyman’s smooth test (AENST). It is applied to test for both Weibull and exponential distributions in the case of constant stress under complete sampling. To check the performance of the AENST, a comparison is made with the conditional probability integral transformation test (CPITT) via a simulation study. Moreover, a real data set is provided to illustrate the application of the AENST. The results revealed that the AENST is a powerful test compared with the CPITT. Thus, the AENST is recommended for testing AFT models.

Identifying Groups: A Comparison of Methodologies

Abdolreza Eshghi Dominique Haughton Pascal Legrand

https://doi.org/10.6339/JDS.201104_09(2).0009

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 2 (2011), pp. 271–291

Abstract: This paper describes and compares three clustering techniques: traditional clustering methods, Kohonen maps and latent class models. The paper also proposes some novel measures of the quality of a clustering. To the best of our knowledge, this is the first contribution in the literature to compare these three techniques in a context where the classes are not known in advance.

Bayesian Estimation of CIR Model

Xiaoxia Feng Dejun Xie

https://doi.org/10.6339/JDS.2012.10(2).746

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 2 (2012), pp. 271–280

Abstract: This article concerns the Bayesian estimation of interest rate models based on the Euler-Maruyama approximation. Assuming the short-term interest rate follows the CIR model, an iterative method of Bayesian estimation is proposed. Markov chain Monte Carlo simulation based on the Gibbs sampler is used for the posterior estimation of the parameters. Maximum a posteriori estimation using a genetic algorithm is employed to find the Bayesian estimates of the parameters. The method and the algorithm are calibrated with historical data on US Treasury bills.
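
As an illustration of the Euler-Maruyama approximation mentioned in this abstract, the following minimal sketch discretizes the standard CIR dynamics dr = kappa (theta - r) dt + sigma sqrt(r) dW. It is not the authors' code: the function name `simulate_cir`, the parameter names, the example values, and the truncation of negative rates inside the square root are all assumptions made for the example.

```python
import math
import random


def simulate_cir(r0, kappa, theta, sigma, dt, n_steps, seed=None):
    """Simulate one CIR short-rate path with an Euler-Maruyama scheme.

    Hypothetical helper for illustration only. The rate is truncated at
    zero inside the square root (an assumed choice) so the diffusion
    term stays real if a step pushes the rate below zero.
    """
    rng = random.Random(seed)
    path = [r0]
    r = r0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)       # standard normal increment
        r_pos = max(r, 0.0)           # truncation used only in the diffusion term
        r = r + kappa * (theta - r) * dt + sigma * math.sqrt(r_pos * dt) * z
        path.append(r)
    return path


# Example values are arbitrary assumptions, not estimates from the paper.
path = simulate_cir(r0=0.03, kappa=0.5, theta=0.05, sigma=0.1,
                    dt=1.0 / 252, n_steps=252, seed=42)
print(len(path))  # 253 points: the initial rate plus one per step
```

Under this discretization each transition is conditionally Gaussian given the previous rate, which is what makes a Gibbs-style sampler for the parameters tractable.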

Information Fusion for Biological Prediction

Stefan Jaeger Su-Shing Chen

https://doi.org/10.6339/JDS.2010.08(2).607

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 2 (2010), pp. 269–288

Abstract: Information fusion has become a powerful tool for challenging applications such as biological prediction problems. In this paper, we apply a new information-theoretical fusion technique to HIV-1 protease cleavage site prediction, which is a problem that has been in the focus of much interest and investigation of the machine learning community recently. It poses a difficult classification task due to its high dimensional feature space and a relatively small set of available training patterns. We also apply a new set of biophysical features to this problem and present experiments with neural networks, support vector machines, and decision trees. Application of our feature set results in high recognition rates and concise decision trees, producing manageable rule sets that can guide future experiments. In particular, we found a combination of neural networks and support vector machines to be beneficial for this problem.

Missing Information as a Diagnostic Tool for Latent Class Analysis

Ofer Harel Diana Miglioretti

https://doi.org/10.6339/JDS.2007.05(2).333

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 5, Issue 2 (2007), pp. 269–288

Abstract: Latent class analysis (LCA) is a popular method for analyzing multiple categorical outcomes. Given the potential for LCA model assumptions to influence inference, model diagnostics are a particularly important part of LCA. We suggest using the rate of missing information as an additional diagnostic tool. The rate of missing information gives an indication of the amount of information missing as a result of observing multiple surrogates in place of the underlying latent variable of interest and provides a measure of how confident one can be in the model results. Simulation studies and real data examples are presented to explore the usefulness of the proposed measure.

Modelling Progression of HIV/AIDS Disease Stages Using Semi-Markov Processes

Ayele Taye Goshu Zelalem Getahun Dessie

https://doi.org/10.6339/JDS.2013.11(2).1136

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 2 (2013), pp. 269–280

Abstract: The aim of this study is to model the progression of HIV/AIDS disease in an individual patient under ART follow-up using semi-Markov processes. Recorded hospital data were obtained for a cohort of 710 patients at Felege-Hiwot referral hospital, Ethiopia, who were under ART follow-up from June 2005 to August 2009. States of the Markov process are defined by the seriousness of the sickness based on CD4 counts in cells/microliter. The five states considered are: state one (CD4 count > 500); state two (350 < CD4 count ≤ 500); state three (200 < CD4 count ≤ 350); state four (CD4 count ≤ 200); and state five (death). The first four states are called good or alive states. The findings of the study are as follows. Within the good states, the transition probability from a given state to the next worse state increases with time, reaches its maximum at some time, and then decreases with increasing time. This means that there is a period when the probability that a patient transits to a worse state of the disease is highest. Moreover, the probability of dying decreases with increasing CD4 counts over time. For an HIV/AIDS patient in a specific state of the disease, the probability of being in the same state decreases over time. Within the good states, the results show that the probability of being in a better state is non-zero but less than the probability of being in a worse state. At any time in the process, a patient is more likely to be in a worse state than in a better one. The conditional probability of staying in the same state until a given number of months decreases with increasing time. The reliability analysis also revealed that the survival probabilities all decline over time. This implies that patient conditions should be improved with ART to improve the survival probability.
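
The five-state definition in this abstract can be written as a small lookup. The sketch below only encodes the CD4 thresholds quoted above; the function name `cd4_state` and the `alive` flag are hypothetical and are not taken from the paper.

```python
def cd4_state(cd4_count, alive=True):
    """Map a CD4 count (cells/microliter) to the state number in the abstract.

    Hypothetical helper: state 5 (death) is signalled by the assumed
    `alive` flag rather than by the CD4 count itself.
    """
    if not alive:
        return 5          # state five: death
    if cd4_count > 500:
        return 1          # state one: CD4 count > 500
    if cd4_count > 350:
        return 2          # state two: 350 < CD4 count <= 500
    if cd4_count > 200:
        return 3          # state three: 200 < CD4 count <= 350
    return 4              # state four: CD4 count <= 200


print([cd4_state(c) for c in (620, 500, 350, 200)])  # [1, 2, 3, 4]
```

Coding each follow-up visit this way yields the state sequence per patient, from which transitions and sojourn times for a semi-Markov analysis could then be tabulated.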

Anemia in Married Females of Uttar Pradesh and Its Relation to Body Mass Index: Application of Poisson Regression

Brijesh P. Singh Sonam Maheshwari Puneet Kumar Gupta

https://doi.org/10.6339/JDS.201704_15(2).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 2 (2017), pp. 267–274

Abstract: Anemia is a common public health issue and a multi-factorial condition that cuts across all sections of the population and is associated with a variety of adverse outcomes, including mortality. The World Health Organization (WHO) defines anemia in terms of hemoglobin concentration in the blood: a female is anemic if her hemoglobin concentration is less than 12 g/dl. Anemia is an indicator of poor nutrition and is thus a public health issue that affects the social and economic development of the region. The body mass index of married women is a good indicator of a country’s health status as well as its economic condition, and it generally has four categories: underweight, normal weight, overweight and obese. Body Mass Index (BMI) provides an indicator that supports efforts to eliminate many preventable diseases. Alteration in nutritional status plays an important role in the course of a person’s health. Hence, BMI can be used as an indicator of nutritional status, and an association with some diseases can be expected. This study aimed to investigate the relationship between BMI and socioeconomic, demographic and health variables among 6723 currently married, non-pregnant women aged 15-49 in Uttar Pradesh, India. In the Indian population, overweight/obese women are significantly (86 percent) more likely to be non-anemic; thus BMI may be used as a marker of anemia.

Iterative Optimal Sufficient Dimension Reduction for Conditional Mean in Multivariate Regression

Jae Keun Yoo

https://doi.org/10.6339/JDS.2009.07(2).465

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 2 (2009), pp. 267–276

Abstract: Recently, Yoo and Cook (2007) developed an optimal version of Cook and Setodji (2003). When predictors are not highly skewed, the Yoo-Cook approach can be improved, especially with small samples, by iteratively estimating the inner product matrix used in their method without changing their asymptotic results. Since highly skewed predictors are often transformed toward normality in the sufficient dimension reduction literature, the proposed method can be more useful in practice than that of Yoo and Cook (2007).

Using the Non-Parametric Classifier CART to Model Wood Density

Eduardo Navarrete Miguel Espinosa

https://doi.org/10.6339/JDS.201104_09(2).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 2 (2011), pp. 261–270

Abstract: To identify the stand attributes that best explain the variability in wood density, Pinus radiata plantations located in the Chilean coastal sector were studied and modeled. The study area corresponded to stands located on sedimentary soil between the zones of Constitución and Cobquecura. Within each sampling sector, individual tree variables were recorded and the most relevant stand parameters were estimated. Fifty trees were sampled in each sector, and six wood discs were obtained from each tree at different stem heights. Each disc was weighed in its green state and then dried to anhydrous weight, and its basic density was calculated. The profile identification to classify basic density according to stand characteristics was performed through regression trees, a technique that uses predictor variables to partition the data recursively into regions with similar responses. The objective of the regression tree method is to obtain highly homogeneous groups (branches), which are identified using pruning techniques that successively eliminate the branches that contribute least to the classification of the variable of interest. The results showed that the stand attributes that contributed significantly to basic density classification were the basal area, the number of trees per hectare, and the mean height.

A Comparative Study Based on Bayes Estimation Under Different Censoring Criterion

Gyan Prakash

https://doi.org/10.6339/JDS.201504_13(2).0003

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 2 (2015), pp. 261–280

Abstract: Censoring arises when exact lifetimes are only partially known, and it is useful in life-testing experiments under time and cost restrictions, especially when some sample values at either or both extremes might have been adulterated. In the present article, Bayes estimation of the unknown parameter of the Gompertz distribution is addressed under three different censoring criteria. The performance of the procedures is illustrated by simulation.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China