Journal of Data Science

Search results 877

Improved Tolerance Limits by Combining Analytical and Experimental Data: An Information Integration Methodology

A. Alexandre Trindade Stan Uryasev

https://doi.org/10.6339/JDS.2006.04(3).271

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 4, Issue 3 (2006), pp. 371–386

Abstract

Abstract: We propose a coherent methodology for integrating different sources of information on a response variable of interest, in order to accurately predict percentiles of its distribution. Under the assumption that one of the sources is more reliable than the other(s), the approach combines factors formed from the data into an additive linear regression model. Quantile regression, designed for quantifying the goodness of fit precisely at a desired quantile, is used as the optimality criterion in model-fitting. Asymptotic confidence interval construction methods for the percentiles are adopted to compute statistical tolerance limits for the response. The approach is demonstrated on a materials science case study that pools together information on failure load from physical tests and computer model predictions. A small simulation study assesses the precision of the inferences. The methodology gives plausible percentile estimates. Resulting tolerance limits are close to nominal coverage probability levels.
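The quantile-regression criterion mentioned in this abstract is usually expressed through the check (pinball) loss. The following is a minimal sketch of that loss, not the authors' model: the names `pinball_loss` and `best_constant` are hypothetical, and the example fits only a constant predictor rather than the additive linear regression described above.

```python
def pinball_loss(y, y_pred, tau):
    """Average check (pinball) loss at quantile level tau, with 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for yi, pi in zip(y, y_pred):
        r = yi - pi
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y)


def best_constant(y, tau):
    """Illustrative helper: the observed value that minimizes the loss
    when used as a constant prediction (an empirical tau-quantile)."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))
```

Minimizing this loss over a constant recovers an empirical quantile, which is why the same criterion, applied to a regression function, targets the fit at one chosen percentile instead of the mean.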

Deinstitutionalization in California: Mortality of Persons with Developmental Disabilities after Transfer into Community Care, 1997-1999

Robert Shavelle David Strauss Steven Day

https://doi.org/10.6339/JDS.2005.03(4).224

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 4 (2005), pp. 371–380

Abstract

Abstract: More than 2,000 persons with developmental disability transferred from California institutions into community care during 1993 to early 1996. Using data on 1,878 children and adults moved between April 1, 1993 and March 5, 1996, Strauss, Shavelle, Baumeister and Anderson (1998) found a corresponding increase in mortality rates by comparison with those who stayed behind. Shavelle and Strauss (1999) updated the study through 1996 and found similar results. The present study is a further update through 1999. There were 81 deaths, a 47% increase in risk-adjusted mortality over that expected in institutions (p < 0.01). As in the two previous studies, we found that persons transferred later were at higher risk than those moving earlier, even after adjustment for differences in risk profiles. The difference cannot be explained by the short-term effects of the transfer, and therefore appears to reflect an increased mortality rate associated with the less intensive medical care and supervision available in the community.

Korean Economic Condition Indicator Using a Neural Network Trained on the 1997 Crisis

Tae Yoon Kim Changha Hwang Jongkyu Lee

https://doi.org/10.6339/JDS.2004.02(4).158

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 2, Issue 4 (2004), pp. 371–381

Variable Selection in the Chlamydia Pneumoniae Lung Infection Study

Yuan Kang Nedret Billor

https://doi.org/10.6339/JDS.2013.11(2).1073

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 2 (2013), pp. 371–387

Abstract

Abstract: In this study, we analyze data obtained by nucleic acid amplification techniques (polymerase chain reaction) on 23 transcript variables, collected to investigate the genetic mechanisms regulating chlamydial infection disease through two outcomes of murine C. pneumoniae lung infection: disease, expressed as lung weight increase, and C. pneumoniae load in the lung. A model with a reduced set of transcript variables of interest at the early infection stage is obtained using some of the traditional variable selection methods (stepwise regression, partial least squares regression (PLS)) and modern ones (least absolute shrinkage and selection operator (LASSO), forward stagewise regression and least angle regression (LARS)). Through these variable selection methods, the variables of interest are selected to investigate the genetic mechanisms that determine the outcomes of chlamydial lung infection. The transcript variables Tim3, GATA3, Lacf, Arg2 (X4, X5, X8 and X13) are detected as the main variables of interest for studying the C. pneumoniae disease (lung weight increase) or C. pneumoniae lung load outcomes. Models including these key variables may provide possible answers to the problem of molecular mechanisms of chlamydial pathogenesis.

Comonotonic Approximations for the Sum of Log Unified Skew Normal Random Variables: Application in Finance and Actuarial Science

Arjun K. Gupta Mohammad A. Aziz

https://doi.org/10.6339/JDS.201504_13(2).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 13, Issue 2 (2015), pp. 369–384

Abstract

The classical model for asset returns in finance and insurance is the Gaussian model. However, modeling complex random phenomena requires more flexible distributions than the normal, because most financial and economic data are skewed and have “fat tails”. Hence symmetric distributions, such as the normal, may not be good choices for modeling these kinds of data. Flexible distributions like the skew normal distribution allow robust modeling of high-dimensional multimodal and asymmetric data. In this paper, we consider a very flexible financial model to construct comonotonic lower convex order bounds in approximating the distribution of the sums of dependent log skew normal random variables. The dependence structure of these random variables is based on a recently developed generalized multivariate skew normal distribution, known as the unified skew normal distribution. The approximations are used to calculate the risk measure related to the distribution of terminal wealth. The accuracy of the approximation is investigated numerically. Results obtained from our methods are competitive with those of the more time-consuming Monte Carlo method.

Francis Erebholo Victor Apprey Paul Bezandry All authors (4)

https://doi.org/10.6339/JDS.201604_14(2).0008

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 2 (2016), pp. 365–382

Abstract

Abstract: Incomplete data are a common phenomenon in research that adopts the longitudinal design approach. If incomplete observations are present in the longitudinal data structure, ignoring them could lead to bias in statistical inference and interpretation. We adopt the disposition model and extend it to the analysis of longitudinal binary outcomes in the presence of monotone incomplete data. The response variable is modeled using a conditional logistic regression model. The nonresponse mechanism is assumed ignorable and is developed as a combination of a Markov transition model and a logistic regression model. The MLE method is used for parameter estimation. An application of our approach to rheumatoid arthritis clinical trials is presented.

Estimating Bivariate Survival Function by Volterra Estimator Using Dynamic Programming Techniques

Jiantian Wang Pablo Zafra

https://doi.org/10.6339/JDS.2009.07(3).489

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 3 (2009), pp. 365–380

Abstract

Abstract: For estimating a bivariate survival function under random censorship, it is commonly believed that the Dabrowska estimator is among the best ones, while the Volterra estimator is far from computationally efficient. As we will see, the Volterra estimator is a natural extension of the Kaplan-Meier estimator to the bivariate data setting. We believe that the computational ‘inefficiency’ of the Volterra estimator is largely due to the formidable computational complexity of the traditional recursion method. In this paper, we show by numerical study as well as theoretical analysis that the Volterra estimator, once computed by a dynamic programming technique, is more computationally efficient than the Dabrowska estimator. Therefore, the Volterra estimator with dynamic programming is recommended in applications owing to its significant computational advantages.
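For orientation, the univariate Kaplan-Meier estimator that this abstract refers to can be sketched as follows. This is only the classical one-dimensional product-limit estimator under right censoring. It is neither the Volterra estimator nor the dynamic programming scheme of the paper, and the function name `kaplan_meier` is an assumption made for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed times (failure or censoring)
    events : 1 if the failure was observed, 0 if the time is censored
    Returns a list of (t, S(t)) pairs, one per distinct failure time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Failures and censored subjects both leave the risk set.
        n_at_risk -= removed
    return curve
```

With `times = [1, 2, 3, 4]` and `events = [1, 0, 1, 1]`, the censored subject at time 2 leaves the risk set without changing the estimate, so the curve steps down only at times 1, 3 and 4.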

On the Generalized Exponentiated Exponential Lindley Distribution

Hok Shing Kwong Saralees Nadarajah

https://doi.org/10.6339/JDS.201904_17(2).0007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 2 (2019), pp. 363–382

Abstract

The generalized exponentiated exponential Lindley distribution is a novel three parameter distribution due to Hussain et al. (2017). They studied its properties including estimation issues and illustrated applications to four datasets. Here, we show that several known distributions including those having two parameters can provide better fits. We also correct errors in the derivatives of the likelihood function.

A Bimodal Spike and Slab Model for Variable Selection and Model Exploration

Tanujit Dey

https://doi.org/10.6339/JDS.201207_10(3).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 3 (2012), pp. 363–383

Abstract

Abstract: We have developed an enhanced spike and slab model for variable selection in linear regression models via restricted final prediction error (FPE) criteria; classic examples of which are AIC and BIC. Based on our proposed Bayesian hierarchical model, a Gibbs sampler is developed to sample models. The special structure of the prior enforces a unique mapping between sampling a model and calculating constrained ordinary least squares estimates for that model, which helps to formulate the restricted FPE criteria. Empirical comparisons are done to the lasso, adaptive lasso and relaxed lasso; followed by a real life data example.

Imputation Methods for Missing Categorical Questionnaire Data: A Comparison of Approaches

W. Holmes Finch

https://doi.org/10.6339/JDS.2010.08(3).612

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 3 (2010), pp. 361–378

Abstract

Abstract: Missing data are a common problem for researchers working with surveys and other types of questionnaires. Often, respondents do not respond to one or more items, making the conduct of statistical analyses, as well as the calculation of scores, difficult. A number of methods have been developed for dealing with missing data, though most of these have focused on continuous variables. It is not clear that these techniques for imputation are appropriate for the categorical items that make up surveys. However, methods of imputation specifically designed for categorical data are either limited in terms of the number of variables they can accommodate, or have not been fully compared with the continuous data approaches used with categorical variables. The goal of the current study was to compare the performance of these explicitly categorical imputation approaches with the better established continuous method used with categorical item responses. Results of the simulation study based on real data demonstrate that the continuous-based imputation approach and a categorical method based on stochastic regression appear to perform well in terms of creating data that match the complete datasets in terms of logistic regression results.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China