Abstract: Principal components analysis (PCA) is a widely used technique in nutritional epidemiology for extracting dietary patterns. To improve the interpretation of the derived patterns, it has been suggested that the axes defined by PCA be rotated. This study aimed to evaluate whether rotation influences the repeatability of these patterns. To this end, PCA was applied to nutrient data from 500 participants (37 ± 15 years, 38% male) who were voluntarily enrolled in the study and asked to complete a semi-quantitative food frequency questionnaire (FFQ) twice within 15 days. The varimax and quartimax orthogonal rotation methods, as well as the non-orthogonal promax and oblimin methods, were applied. The degree of agreement between the similar patterns extracted by each rotation method was assessed using the Bland and Altman method and Kendall’s tau-b coefficient. Good agreement was observed between the two administrations of the FFQ for the un-rotated components, while low-to-moderate agreement was observed for all rotation types (the quartimax and oblimin methods led to more repeatable results). To conclude, when rotation is needed to improve the interpretation of food patterns, the quartimax and oblimin methods seem to produce more robust results.
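The core computation described above (PCA followed by an orthogonal varimax rotation, with agreement between two administrations assessed via Kendall’s tau-b) can be sketched as follows. This is a generic illustration on simulated data: the 12 variables, three retained components, and the mocked second administration are assumptions, not the study’s actual FFQ nutrient data.

```python
import numpy as np
from scipy.stats import kendalltau

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (Kaiser's criterion)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
        rotation = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return loadings @ rotation

# Simulated stand-in for standardized nutrient variables (12 here, chosen arbitrarily)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
X -= X.mean(axis=0)

# PCA via SVD; keep the first three components' loadings
_, sing, vt = np.linalg.svd(X, full_matrices=False)
loadings = vt[:3].T * sing[:3] / np.sqrt(X.shape[0] - 1)
rotated = varimax(loadings)

# Repeatability would be judged per participant, e.g. Kendall's tau-b between
# component scores from the two FFQ administrations (second one mocked here)
scores_1 = X @ rotated[:, 0]
scores_2 = scores_1 + rng.normal(scale=0.1, size=500)
tau, _ = kendalltau(scores_1, scores_2)
```

Because the rotation matrix is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the distribution of loading mass across components, and hence interpretability, changes.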
Abstract: The University of Michigan’s Consumer Sentiment Index has preoccupied politicians, journalists, and Wall Street for decades (Uchitelle, 2002). This American economic indicator is now co-published with Thomson Reuters in London. The international reach of this index cries out for another look at George Katona’s consumer sentiment construct as a predictor of consumer demand. Regressions from the British Household Panel Survey (BHPS) show that consumer sentiment is ineffectual in predicting micro variation in discretionary spending between consumers, within consumers over time, or between and within consumers overall. Moreover, consumer sentiment bears no relationship whatsoever to national consumer demand over annual BHPS surveys from 1997 to 2008. In contrast, an indicator of economic anxiety accounts for all three types of variation in micro demand, as well as variation in macro demand over time.
Abstract: We explore the possibility of modeling clustered count data using the Poisson inverse Gaussian distribution. We develop a regression model, which relates the number of mastitis cases in a sample of dairy farms in Ontario, Canada, to various farm-level covariates, to illustrate the methodology. Residual plots are constructed to explore the quality of the fit. We compare the results with a negative binomial regression model using maximum likelihood estimation, and with the generalized linear mixed regression model fitted in SAS.
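The Poisson inverse Gaussian pmf has no elementary closed form, but its mixture construction can be sketched by numerically integrating the Poisson pmf against an inverse Gaussian mixing density. This is a generic illustration with hypothetical parameter values, not a fit to the mastitis data; scipy's `invgauss` parameterization is used, where the mixing mean is `mu * scale`.

```python
import numpy as np
from scipy import integrate, stats

def pig_pmf(y, mu, scale):
    """P(Y = y) for Y | lam ~ Poisson(lam), lam ~ inverse Gaussian(mu, scale)."""
    integrand = lambda lam: stats.poisson.pmf(y, lam) * stats.invgauss.pdf(lam, mu, scale=scale)
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

# Mixing mean mu * scale = 2.0, so E[Y] = 2.0; overdispersion relative to the
# Poisson comes from the variance of the mixing distribution
probs = np.array([pig_pmf(y, mu=1.0, scale=2.0) for y in range(50)])
```

The appeal over the negative binomial is the heavier right tail of the inverse Gaussian mixing density, which can better accommodate a few farms with many cases.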
Abstract: For model selection in mixed effects models, Vaida and Blanchard (2005) demonstrated that the marginal Akaike information criterion is appropriate for questions regarding the population, while the conditional Akaike information criterion is appropriate for questions regarding the particular clusters in the data. This article shows that the marginal Akaike information criterion is asymptotically equivalent to leave-one-cluster-out cross-validation and the conditional Akaike information criterion is asymptotically equivalent to leave-one-observation-out cross-validation.
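The flavour of such AIC/cross-validation equivalences is easy to see in the simpler ordinary-regression setting, where leave-one-out residuals have a closed form through the hat matrix, so both criteria can be computed without refitting. A minimal sketch with simulated data (the design and coefficients are hypothetical, and this is illustration only, not the mixed-model derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)

def aic_and_loocv(X, y):
    n, k = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
    resid = y - H @ y
    sigma2 = resid @ resid / n                    # Gaussian ML variance estimate
    aic = n * np.log(2 * np.pi * sigma2) + n + 2 * (k + 1)
    loo_resid = resid / (1 - np.diag(H))          # closed-form leave-one-out residuals
    return aic, np.mean(loo_resid ** 2)           # AIC and PRESS / n

aic_full, cv_full = aic_and_loocv(X, y)           # true model
aic_red, cv_red = aic_and_loocv(X[:, :2], y)      # drops a relevant covariate
```

With a moderately large sample, both criteria rank the two candidate models the same way, which is the practical content of the asymptotic equivalence.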
Abstract: Risks for many chronic diseases (coronary heart disease, cancer, mental illness, diabetes, asthma, etc.) are strongly linked both to socioeconomic and ethnic group, and so prevalence varies considerably between areas. Variations in prevalence are important in assessing health care needs and in comparing health care provision (e.g. of surgical intervention rates) to health need. This paper focuses on estimating prevalence of coronary heart disease and uses a Bayesian approach to synthesise information of different types to make indirect prevalence estimates for geographic units where prevalence data are not otherwise available. One source is information on prevalence risk gradients from national health survey data; such data typically provide only regional identifiers (for confidentiality reasons) and so gradients by age, sex, ethnicity, broad region, and socio-economic status may be obtained by regression methods. Often a series of health surveys is available and one may consider pooling strength over surveys by using information on prevalence gradients from earlier surveys (e.g. via a power prior approach). The second source of information is population totals by age, sex, ethnicity, etc. from censuses or intercensal population estimates, to which survey-based prevalence rates are applied. The other potential data source is information on area mortality, since for heart disease and some other major chronic diseases there is a positive correlation over areas between prevalence of disease and mortality from that disease. A case study considers the development of estimates of coronary heart disease prevalence in 354 English areas using (a) data from the Health Surveys for England for 2003 and 1999, (b) population data from the 2001 UK Census, and (c) area mortality data for 2003.
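The pooling-over-surveys idea can be sketched in its simplest conjugate form: for a binomial prevalence count, a power prior downweights the historical survey's successes and failures by a factor a0 in [0, 1] before conjugate Beta updating. The counts below are hypothetical, and the paper's actual synthesis (regression gradients, census denominators, mortality data) is far richer than this sketch.

```python
def power_prior_posterior(y_curr, n_curr, y_hist, n_hist, a0, a=1.0, b=1.0):
    """Beta posterior for a prevalence under a power prior: the historical
    survey contributes a0 * (successes, failures) on top of a Beta(a, b) prior."""
    post_a = a + a0 * y_hist + y_curr
    post_b = b + a0 * (n_hist - y_hist) + (n_curr - y_curr)
    return post_a, post_b

# Hypothetical current (2003-style) and historical (1999-style) survey counts:
# 60/1000 cases now, 90/1000 in the earlier survey
a_half, b_half = power_prior_posterior(60, 1000, 90, 1000, a0=0.5)
a_none, b_none = power_prior_posterior(60, 1000, 90, 1000, a0=0.0)
a_full, b_full = power_prior_posterior(60, 1000, 90, 1000, a0=1.0)
mean_half = a_half / (a_half + b_half)
mean_none = a_none / (a_none + b_none)
mean_full = a_full / (a_full + b_full)
```

As a0 moves from 0 to 1, the posterior mean moves from using the current survey alone toward full pooling with the earlier survey, which is exactly the borrowing-of-strength trade-off.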
Abstract: Mixed effects models are often used for estimating fixed effects and variance components in continuous longitudinal outcomes. An EM-based estimation approach for mixed effects models when the outcomes are truncated was proposed by Hughes (1999). We consider the situation when the longitudinal outcomes are also subject to non-ignorable missingness in addition to truncation. A shared random-effect parameter model is presented in which the missing-data mechanism depends on the random effects used to model the longitudinal outcomes. Data from the Indianapolis-Ibadan dementia project are used to illustrate the proposed approach.
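A quick simulation shows why the shared random-effect construction matters: when the dropout probability depends on the same random intercept that drives the outcome, the observed data are systematically selected and a complete-case mean is biased. All parameter values below are hypothetical, and the EM machinery of the actual approach is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_visits = 300, 4
b = rng.normal(size=n_subj)                          # shared random intercepts
y = 2.0 + b[:, None] + rng.normal(scale=0.5, size=(n_subj, n_visits))

# Missingness depends on the SAME random effect -> non-ignorable mechanism:
# subjects with higher intercepts are more likely to have missing visits
p_miss = 1.0 / (1.0 + np.exp(-(-1.5 + 1.0 * b)))     # logistic in b
missing = rng.random((n_subj, n_visits)) < p_miss[:, None]
y_obs = np.where(missing, np.nan, y)

naive_mean = np.nanmean(y_obs)                       # biased below the true mean of 2.0
```

Ignoring the mechanism here pulls the complete-case mean below the true marginal mean of 2.0, which is the bias a joint (shared-parameter) likelihood is designed to correct.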
Abstract: We introduce a new family of distributions, namely the inverse truncated discrete Linnik G family of distributions. This family generalizes the inverse Marshall-Olkin family of distributions, the inverse family of distributions generated through the truncated negative binomial distribution, and the inverse family of distributions generated through the truncated discrete Mittag-Leffler distribution. A particular member of the family, the inverse truncated negative binomial Weibull distribution, is studied in detail. The shape properties of the probability density function and hazard rate, model identifiability, moments, median, mean deviation, entropy, distribution of order statistics, stochastic ordering property, mean residual life function, and stress-strength properties of the new generalized inverse Weibull distribution are studied. The unknown parameters of the distribution are estimated using the maximum likelihood, product spacing, and least squares methods. The existence and uniqueness of the maximum likelihood estimates are proved. A simulation is carried out to illustrate the performance of the maximum likelihood estimates of the model parameters. An AR(1) minification model with this distribution as its marginal is developed. The inverse truncated negative binomial Weibull distribution is fitted to a real data set, and it is shown to be more appropriate for modeling than some competing models.
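As a small illustration of the building blocks involved, the sketch below simulates from the plain inverse Weibull baseline (cdf exp(-(scale/x)^shape), via inverse-transform sampling) and recovers its parameters by maximum likelihood. The truncated-negative-binomial generalization studied in the paper is not implemented here, and the parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def inv_weibull_logpdf(x, shape, scale):
    """Log density of the inverse Weibull with cdf F(x) = exp(-(scale/x)**shape)."""
    return (np.log(shape) + shape * np.log(scale)
            - (shape + 1.0) * np.log(x) - (scale / x) ** shape)

rng = np.random.default_rng(3)
true_shape, true_scale = 2.0, 1.5
u = rng.random(5000)
# Inverse transform: F(x) = u  =>  x = scale * (-log u)**(-1/shape)
x = true_scale * (-np.log(u)) ** (-1.0 / true_shape)

nll = lambda p: (np.inf if p[0] <= 0 or p[1] <= 0
                 else -inv_weibull_logpdf(x, p[0], p[1]).sum())
fit = minimize(nll, x0=[1.0, 1.0], method="Nelder-Mead")
shape_hat, scale_hat = fit.x
```

Guarding the negative log-likelihood against non-positive parameters keeps the derivative-free Nelder-Mead search inside the valid region without needing a reparameterization.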
Abstract: For statistical classification problems where the total sample size is only slightly greater than the feature dimension, regularized statistical discriminant rules may reduce classification error rates. We review ten dispersion-matrix regularization approaches (four for the pooled sample covariance matrix, four for the inverse pooled sample covariance matrix, and two for a diagonal covariance matrix) for use in Anderson’s (1951) linear discriminant function (LDF). We compare these regularized classifiers against the traditional LDF for a variety of parameter configurations, and use the estimated expected error rate (EER) to assess performance. We also apply the regularized LDFs to a well-known real-data example on colon cancer. We found that no regularized classifier uniformly outperformed the others. However, the more contemporary classifiers (e.g., Thomaz and Gillies, 2005; Tong et al., 2012; and Xu et al., 2009) tended to outperform the older ones, and certain simple methods (e.g., Pang et al., 2009; Thomaz and Gillies, 2005; and Tong et al., 2012) performed very well, calling into question the need for involved cross-validation in estimating regularization parameters. Nonetheless, an older regularized classifier proposed by Smidt and McDonald (1976) yielded consistently low misclassification rates across all scenarios, regardless of the shape of the true covariance matrix. Finally, our simulations showed that regularized classifiers relying primarily on asymptotic approximations with respect to the training sample size rarely outperformed the traditional LDF, and are thus not recommended. We discuss our results as they pertain to the effect of high dimension, and offer general guidelines for choosing a regularization method for poorly-posed problems.
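A minimal illustration of the setting: with sample size only slightly above the dimension, the pooled covariance is nearly singular and the plain LDF suffers, while even a naive regularizer that shrinks the pooled covariance toward the identity (a stand-in for the ten reviewed approaches, with an arbitrary shrinkage weight) helps. The dimensions, means, and shrinkage weight below are all hypothetical choices.

```python
import numpy as np

def ldf_predict(X, mean0, mean1, cov_inv):
    """Anderson's LDF with equal priors: label 1 when x scores closer to mean1."""
    w = cov_inv @ (mean0 - mean1)
    c = 0.5 * w @ (mean0 + mean1)
    return (X @ w - c <= 0).astype(int)

rng = np.random.default_rng(4)
p, n_per = 40, 25                                   # total n = 50, barely above p = 40
mu0, mu1 = np.zeros(p), np.full(p, 0.5)
X0 = rng.normal(size=(n_per, p)) + mu0
X1 = rng.normal(size=(n_per, p)) + mu1
S = 0.5 * (np.cov(X0.T) + np.cov(X1.T))             # pooled sample covariance

lam = 0.5                                           # arbitrary shrinkage weight
S_reg = (1 - lam) * S + lam * np.eye(p)             # shrink toward the identity

Xtest = np.vstack([rng.normal(size=(500, p)) + mu0,
                   rng.normal(size=(500, p)) + mu1])
ytest = np.r_[np.zeros(500, dtype=int), np.ones(500, dtype=int)]

err_plain = np.mean(ldf_predict(Xtest, X0.mean(0), X1.mean(0), np.linalg.inv(S)) != ytest)
err_reg = np.mean(ldf_predict(Xtest, X0.mean(0), X1.mean(0), np.linalg.inv(S_reg)) != ytest)
```

Because the true covariance here is the identity, shrinking toward the identity is the best case for the regularizer; the reviewed methods differ mainly in the target and in how the weight is chosen.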
Abstract: Investigation of household electricity usage patterns, and matching the patterns to behaviours, is an important area of research given the centrality of such patterns in addressing the needs of the electricity industry. Additional knowledge of household behaviours will allow more effective targeting of demand-side management (DSM) techniques. This paper addresses whether a reasonable number of meaningful motifs, each representing a regular activity within a domestic household, can be identified solely from household-level electricity meter data. Using UK data collected from several hundred households in Spring 2011, monitored at a frequency of five minutes, a process for finding repeating short patterns (motifs) is defined. Different ways of representing the motifs exist, and a qualitative approach is presented that allows choosing between the options based on the number of regular behaviours detected (neither too few nor too many).
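One simple way to operationalize such a motif search is to discretize the meter series into a small symbol alphabet, slide a fixed window across it, and keep the symbol words that recur often enough. The sketch below does this on a synthetic day of five-minute readings with a repeated spike; the window length, alphabet size, and count threshold are assumptions for illustration, not the paper's tuned choices.

```python
import numpy as np
from collections import Counter

def find_motifs(series, window=6, n_bins=3, min_count=3):
    """Discretize into equal-width bins, slide a window over the symbol
    sequence, and return the words (candidate motifs) seen >= min_count times."""
    edges = np.linspace(series.min(), series.max(), n_bins + 1)[1:-1]
    symbols = np.digitize(series, edges)
    words = Counter(tuple(symbols[i:i + window])
                    for i in range(len(symbols) - window + 1))
    return {w: c for w, c in words.items() if c >= min_count}

# Synthetic day: 288 five-minute readings of flat base load, plus the same
# short "appliance" spike occurring three times
rng = np.random.default_rng(5)
day = 0.01 * rng.normal(size=288)
for start in (60, 150, 240):
    day[start:start + 4] += 2.0
motifs = find_motifs(day)
```

The "neither too few nor too many" criterion mentioned above corresponds here to tuning `window`, `n_bins`, and `min_count` until the number of surviving motifs matches a plausible count of regular household activities.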
Abstract: This paper provides novel research on the pricing ability of hybrid ANNs, based on Hang Seng Index Options spanning the period from November 2005 to October 2011, during which the 2007-2008 financial crisis developed. We study the performance of two hybrid networks, integrated with the Black-Scholes model and the Corrado and Su model respectively. We find that hybrid neural networks trained on financial data drawn from a booming period of a market cannot predict option prices well for a period undergoing a financial crisis (a tumbling period in the market); researchers and practitioners should therefore be cautious when predicting option prices with hybrid ANNs. Our findings likely answer recent puzzles about the counterintuitive performance of NN models for option pricing during financial crises, and suggest that the incompetence of NN models for option pricing likely stems from their having been trained on data from inappropriate periods of market cycles (regimes), and not necessarily from the learning ability or flexibility of NN models.
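For reference, the Black-Scholes component around which the first hybrid network is built is just the closed-form European call price; the hybrid then lets the network learn residual structure on top of it. The network itself and the Corrado-Su extension are not reproduced here, and the inputs below are arbitrary illustrative values.

```python
import numpy as np
from scipy.stats import norm

def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# At-the-money call, one year to expiry, 5% rate, 20% volatility
price = black_scholes_call(S=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2)
```

In a hybrid design, a quantity like `price` is fed to (or subtracted from) the network target, so the network only has to learn the model's mispricing; the paper's point is that this residual structure differs sharply between boom and crisis regimes.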