
Journal of Data Science


Search results 892


A Study of Permutation Tests in the Context of a Problem in Primatology

Thomas L. Moore Vicki Bentley-Condit

https://doi.org/10.6339/JDS.2010.08(1).554

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 1 (2010), pp. 21–41

Abstract: Female baboons, some with infants, were observed, and counts were made of interactions in which females handled the infants of other females (so-called infant-handling). Independent of these observations, each baboon was assigned a dominance rank of “low,” “medium,” or “high.” Researchers hypothesized that females tend to handle infants of females ranked below them. The data form an array whose rows are labeled by infant and whose columns are labeled by female. Entry (i, j) counts the total handlings of infant i by female j. Each count corresponds to one of 9 combinations of female rank by infant/mother rank, which induces a 3-by-3 table of total interactions. We use a permutation test, in which ranks are permuted at random, to support the research hypothesis. We also discuss statistical properties of our method such as choice of test statistic, power, and stability of results to individual observations. We discover that the data support a nuanced view of baboon interaction, where higher-ranked females prefer to handle down the hierarchy, while lower-ranked females must balance the desire to accede to the high-ranked females against protecting their infants from the potential risks involved in such interactions.
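The rank-permutation idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' procedure: the toy counts, the `mother_of` mapping, the numeric rank coding (0 = low, 1 = medium, 2 = high), and the choice of "total handlings by a female who outranks the infant's mother" as the test statistic.

```python
import random

# Hypothetical toy data: counts[i][j] = handlings of infant i by female j.
counts = [
    [0, 1, 5, 6],   # infant 0 (mother is female 0)
    [1, 0, 4, 7],   # infant 1 (mother is female 1)
]
mother_of = [0, 1]          # mother_of[i] = index of the mother of infant i
ranks = [0, 0, 1, 2]        # assumed coding: 0 = low, 1 = medium, 2 = high


def handling_down(counts, mother_of, ranks):
    """Total handlings in which the handler outranks the infant's mother."""
    total = 0
    for i, row in enumerate(counts):
        mother_rank = ranks[mother_of[i]]
        for j, c in enumerate(row):
            if ranks[j] > mother_rank:
                total += c
    return total


def permutation_p_value(counts, mother_of, ranks, n_perm=2000, seed=0):
    """One-sided p-value obtained by shuffling rank labels among the females."""
    rng = random.Random(seed)
    observed = handling_down(counts, mother_of, ranks)
    shuffled = list(ranks)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if handling_down(counts, mother_of, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


observed, p_value = permutation_p_value(counts, mother_of, ranks)
print(observed, p_value)
```

Shuffling the rank labels keeps the interaction counts fixed and breaks only the link between rank and behavior, which is the null hypothesis described above. A different test statistic could be substituted by replacing `handling_down`.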

A Dynamic Spatial Model for Chronic Wasting Disease in Colorado

Craig J. Johns Christopher H. Mehl

https://doi.org/10.6339/JDS.2006.04(1).221

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 4, Issue 1 (2006), pp. 21–37

Abstract: A spatio-temporal statistical model for Chronic Wasting Disease is presented. The model has underpinnings from traditional epidemic models with differential equations and uses a Bayesian hierarchy to directly incorporate existing prevalence data. Spatial dynamics are modeled explicitly through a system of difference equations rather than through covariance. The posterior distribution gives evidence of a long-term stable level of disease prevalence, and approximates the probability of the movement of the disease from one area to another. Predictions for the future of Chronic Wasting Disease in Colorado are given. The model is used to formulate efficient sampling schemes for future data collection.

Interconnectedness in Credit Market: An Empirical Investigation Using UK and US CDS Data

Ramaprasad Bhar

https://doi.org/10.6339/JDS.201601_14(1).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 14, Issue 1 (2016), pp. 19–32

Abstract: This paper uses a structural time series methodology to test the notion of interconnectedness between the UK and the US credit markets. The empirical tests utilise data on premiums for banking-sector credit default swaps (CDS) and cover the recent period of financial turmoil. The methodology, based on the Kalman filter, is robust in the presence of limited convergence. The results show clearly noticeable long-term steady-state convergence in CDS premiums between these two markets. This observation lends support to the coordinated regulatory policy initiatives to deal with the crisis and offers suggestions for sound operation of the international financial system.

Exact Robust Tests for Detecting Candidate-Gene Association in Case-Parents Trio Design

Zehua Chen Gang Zheng

https://doi.org/10.6339/JDS.2005.03(1).186

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 1 (2005), pp. 19–33

Abstract: In the case-parents trio design for testing candidate-gene association, the distribution of the data under the null hypothesis of no association is completely known. Therefore, the exact null distribution of any test statistic can be simulated using the Monte Carlo method. In the literature, several robust tests have been proposed for testing the association in the case-parents trio design when the genetic model is unknown, but all these tests are based on the asymptotic null distributions of the test statistics. In this article, we promote exact robust tests using Monte Carlo simulations, for two reasons: (i) the asymptotic tests are not accurate in terms of the probability of type I error when the sample size is small or moderate; (ii) asymptotic theory is not available for certain good candidate test statistics. We examined the validity of the asymptotic distributions of some of the test statistics studied in the literature and found that in certain cases the probability of type I error is greatly inflated in the asymptotic tests. We also propose new robust test statistics which are statistically more reasonable but have no asymptotic theory available. The powers of these robust statistics are compared with those of the existing statistics in the literature through a simulation study. It is found that these robust statistics are preferable to the others in terms of their efficiency and robustness.
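As a rough illustration of simulating an exact null distribution, the sketch below uses a simplified transmission-disequilibrium-type statistic rather than the robust statistics proposed in the article. It assumes `n` heterozygous parents, each transmitting the candidate allele with probability 1/2 under the null; the function names and the example numbers are hypothetical.

```python
import random


def tdt_statistic(b, n):
    """McNemar-type statistic: b transmissions of the allele out of n, c = n - b."""
    c = n - b
    return (b - c) ** 2 / n


def monte_carlo_p_value(b_obs, n, n_sim=5000, seed=0):
    """Monte Carlo p-value: simulate transmissions from the known null."""
    rng = random.Random(seed)
    t_obs = tdt_statistic(b_obs, n)
    hits = 0
    for _ in range(n_sim):
        # Under the null, each heterozygous parent transmits with probability 1/2.
        b = sum(rng.random() < 0.5 for _ in range(n))
        if tdt_statistic(b, n) >= t_obs:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Hypothetical example: 14 of 20 heterozygous parents transmit the allele.
print(monte_carlo_p_value(14, 20))
```

Because the null distribution is simulated directly, the same loop works for any statistic that can be computed from the simulated data, including ones without asymptotic theory.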

The Role of Rotation Type used to Extract Dietary Patterns through Principal Component Analysis, on their Short-Term Repeatability

Vassiliki Bountziouka Demosthenes B. Panagiotakos

https://doi.org/10.6339/JDS.2012.10(1).1013

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 1 (2012), pp. 19–36

Abstract: Principal components analysis (PCA) is a widely used technique in nutritional epidemiology for extracting dietary patterns. To improve the interpretation of the derived patterns, it has been suggested to rotate the axes defined by PCA. This study aimed to evaluate whether rotation influences the repeatability of these patterns. To this end, PCA was applied to nutrient data of 500 participants (37 ± 15 years, 38% male) who were voluntarily enrolled in the study and asked to complete a semi-quantitative food frequency questionnaire (FFQ) twice within 15 days. The varimax and quartimax orthogonal rotation methods, as well as the non-orthogonal promax and oblimin methods, were applied. The degree of agreement between the similar patterns extracted by each rotation method was assessed using the Bland and Altman method and Kendall’s tau-b coefficient. Good agreement was observed between the two administrations of the FFQ for the un-rotated components, while low-to-moderate agreement was observed for all rotation types (the quartimax and oblimin methods led to more repeatable results). To conclude, when rotation is needed to improve the interpretation of food patterns, the quartimax and oblimin methods seem to produce more robust results.

Does Sentiment or Anxiety Drive Consumer Demand?

Gordon G. Bechtel

https://doi.org/10.6339/JDS.2014.12(1).1151

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 1 (2014), pp. 19–34

Abstract: The University of Michigan’s Consumer Sentiment Index has preoccupied politicians, journalists, and Wall Street for decades (Uchitelle, 2002). This American economic indicator is now co-published with Thomson Reuters in London. The international reach of this index cries out for another look at George Katona’s consumer sentiment construct as a predictor of consumer demand. Regressions from the British Household Panel Survey (BHPS) show that consumer sentiment is ineffectual in predicting micro variation in discretionary spending between consumers, within consumers over time, or between and within consumers overall. Moreover, consumer sentiment bears no relationship whatsoever to national consumer demand over annual BHPS surveys from 1997 to 2008. In contrast, an indicator of economic anxiety accounts for all three types of variation in micro demand, as well as variation in macro demand over time.

The Poisson Inverse Gaussian Regression Model in the Analysis of Clustered Counts Data

M. M. Shoukri M. H. Asyali R. VanDorp All authors (4)

https://doi.org/10.6339/JDS.2004.02(1).135

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 2, Issue 1 (2004), pp. 17–32

Abstract: We explore the possibility of modeling clustered count data using the Poisson Inverse Gaussian distribution. To illustrate the methodology, we develop a regression model which relates the number of mastitis cases in a sample of dairy farms in Ontario, Canada, to various farm-level covariates. Residual plots are constructed to explore the quality of the fit. We compare the results with a negative binomial regression model fitted by maximum likelihood estimation, and with the generalized linear mixed regression model fitted in SAS.

Asymptotic Equivalence between Cross-Validations and Akaike Information Criteria in Mixed-Effects Models

Yixin Fang

https://doi.org/10.6339/JDS.201101_09(1).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 1 (2011), pp. 15–21

Abstract: For model selection in mixed effects models, Vaida and Blanchard (2005) demonstrated that the marginal Akaike information criterion is appropriate for questions regarding the population, and the conditional Akaike information criterion is appropriate for questions regarding the particular clusters in the data. This article shows that the marginal Akaike information criterion is asymptotically equivalent to leave-one-cluster-out cross-validation, and the conditional Akaike information criterion is asymptotically equivalent to leave-one-observation-out cross-validation.
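The two cross-validation schemes named in this abstract differ only in what is held out at each step. The sketch below generates the train/test index splits for each scheme; the function names and the example cluster labels are hypothetical, and fitting a mixed-effects model on each split is left out.

```python
def leave_one_observation_out(n):
    """Yield (train, test) index lists, holding out one observation at a time."""
    for i in range(n):
        yield [k for k in range(n) if k != i], [i]


def leave_one_cluster_out(cluster_ids):
    """Yield (train, test) index lists, holding out one whole cluster at a time."""
    for c in sorted(set(cluster_ids)):
        train = [k for k, g in enumerate(cluster_ids) if g != c]
        test = [k for k, g in enumerate(cluster_ids) if g == c]
        yield train, test


# Hypothetical data: six observations grouped into three clusters.
cluster_ids = ["a", "a", "b", "c", "c", "c"]

for train, test in leave_one_cluster_out(cluster_ids):
    print("cluster-out:", train, test)

for train, test in leave_one_observation_out(len(cluster_ids)):
    print("observation-out:", train, test)
```

Holding out a whole cluster evaluates prediction for a new cluster (a population-level question), while holding out a single observation evaluates prediction within clusters already seen, which mirrors the marginal versus conditional distinction in the abstract.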

Indirect Area Estimates of Disease Prevalence: Bayesian Evidence Synthesis with an Application to Coronary Heart Disease

Peter Congdon

https://doi.org/10.6339/JDS.2008.06(1).387

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 1 (2008), pp. 15–32

Abstract: Risks for many chronic diseases (coronary heart disease, cancer, mental illness, diabetes, asthma, etc.) are strongly linked both to socio-economic and ethnic group, and so prevalence varies considerably between areas. Variations in prevalence are important in assessing health care needs and in comparing health care provision (e.g. of surgical intervention rates) to health need. This paper focuses on estimating prevalence of coronary heart disease and uses a Bayesian approach to synthesise information of different types to make indirect prevalence estimates for geographic units where prevalence data are not otherwise available. One source is information on prevalence risk gradients from national health survey data; such data typically provide only regional identifiers (for confidentiality reasons), and so gradients by age, sex, ethnicity, broad region, and socio-economic status may be obtained by regression methods. Often a series of health surveys is available, and one may consider pooling strength over surveys by using information on prevalence gradients from earlier surveys (e.g. via a power prior approach). The second source of information is population totals by age, sex, ethnicity, etc. from censuses or intercensal population estimates, to which survey-based prevalence rates are applied. The other potential data source is information on area mortality, since for heart disease and some other major chronic diseases there is a positive correlation over areas between prevalence of disease and mortality from that disease. A case study considers the development of estimates of coronary heart disease prevalence in 354 English areas using (a) data from the Health Surveys for England for 2003 and 1999, (b) population data from the 2001 UK Census, and (c) area mortality data for 2003.

Mixed-effect Models for Truncated Longitudinal Outcomes with Nonignorable Missing Data

Sujuan Gao Rodolphe Thiébaut

https://doi.org/10.6339/JDS.2009.07(1).418

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 1 (2009), pp. 13–25

Abstract: Mixed effects models are often used for estimating fixed effects and variance components in continuous longitudinal outcomes. An EM-based estimation approach for mixed effects models when the outcomes are truncated was proposed by Hughes (1999). We consider the situation when the longitudinal outcomes are also subject to non-ignorable missingness in addition to truncation. A shared random effect parameter model is presented in which the missing data mechanism depends on the random effects used to model the longitudinal outcomes. Data from the Indianapolis-Ibadan dementia project are used to illustrate the proposed approach.


Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X


Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China