
Journal of Data Science

Search results 892


Comparing Reliabilities of the Strength of Two Container Designs: A Case Study

Esteban Walker Frank Guess

https://doi.org/10.6339/JDS.2003.01(2).113

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 1, Issue 2 (2003), pp. 185–197

Estimation in Zero-Inflated Generalized Poisson Distribution

Kirtee K. Kamalja Yogita S. Wagh

https://doi.org/10.6339/JDS.201801_16(1).0010

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 1 (2018), pp. 183–206

Abstract

Overdispersion is a common phenomenon in Poisson modelling. The generalized Poisson (GP) distribution accommodates both overdispersion and underdispersion in count data. In this paper, we briefly review different overdispersed and zero-inflated regression models. To study the impact of fitting an inaccurate model to data simulated from some other model, we simulate data from the zero-inflated generalized Poisson (ZIGP) distribution and fit Poisson, GP, zero-inflated Poisson (ZIP), ZIGP and zero-inflated negative binomial (ZINB) models. We compare the performance of the Poisson, GP, ZIP, ZIGP and ZINB estimates through mean square error, bias and standard error when the samples are generated from the ZIGP distribution. We propose estimators of the parameters of the ZIGP distribution based on the first two sample moments and the proportion of zeros, referred to as MOZE estimators, and compare their performance with the maximum likelihood estimate (MLE) through a simulation study. The MOZE are observed to be almost as efficient as, or even more efficient than, the MLE of the parameters of the ZIGP distribution.

Principal Component Analysis in Linear Regression Survival Model with Microarray Data

Steven Ma

https://doi.org/10.6339/JDS.2007.05(2).326

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 5, Issue 2 (2007), pp. 183–198

Abstract

As a useful alternative to the Cox proportional hazards model, the linear regression survival model assumes a linear relationship between the covariates and a known monotone transformation, for example the logarithm, of an event time of interest. In this article, we study the linear regression survival model with right-censored survival data when high-dimensional microarray measurements are present. Such data may arise in studies investigating the statistical influence of molecular features on survival risk. We propose using the principal component regression (PCR) technique for model reduction based on the weighted least squares Stute estimate. Compared with other model reduction techniques, the PCR approach is relatively insensitive to the number of covariates and hence suitable for high-dimensional microarray data. Component selection based on the nonparametric bootstrap and model evaluation using the time-dependent ROC (receiver operating characteristic) technique are investigated. We demonstrate the proposed approach with datasets from two microarray gene expression profiling studies of lymphoma cancers.

The Stratification Analysis of Sediment Data for Lake Michigan

Xiangsheng Xia David H. Miller

https://doi.org/10.6339/JDS.201104_09(2).0004

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 2 (2011), pp. 181–203

Abstract

Accurately understanding the distribution of sediment measurements within large water bodies such as Lake Michigan is critical for modeling and understanding carbon, nitrogen, silica, and phosphorus dynamics. Several water quality models have been formulated and applied to the Great Lakes to investigate the fate and transport of nutrients and other constituents, as well as plankton dynamics. This paper summarizes the development of spatial statistical tools to study and assess the spatial trends of the sediment data sets, which were collected from Lake Michigan as part of the Lake Michigan Mass Balance Study. Several new spatial measurements were developed to quantify the spatial variation and continuity of the sediment data sets of interest. Applying the newly designed spatial measurements to the sediment data, in conjunction with descriptive statistics, clearly reveals the existence of an intrinsic structure of strata, which is hypothesized based on linear wave theory. Furthermore, a new concept of strata consisting of two components defined by depth is proposed and justified. The findings presented in this paper may influence future studies of sediment within Lake Michigan and all of the Great Lakes.

Derivation of Sample Size Formula for Cluster Randomized Trials with Binary Responses Using a General Continuity Correction Factor and Identification of Optimal Settings for Small Event Rates

Majnu John

https://doi.org/10.6339/JDS.2013.11(1).1089

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 11, Issue 1 (2013), pp. 181–203

Abstract

Trials for comparing interventions where clusters of subjects, rather than individuals, are randomized are commonly called cluster randomized trials (CRTs). For comparison of binary outcomes in a CRT, although there are a few published formulations for sample size computation, the most commonly used is the one developed by Donner, Birkett, and Buck (Am J Epidemiol, 1981), probably due to its incorporation in the textbook by Fleiss, Levin, and Paik (Wiley, 2003). In this paper, we derive a new χ² approximation formula with a general continuity correction factor (c) and show that, especially for scenarios with small event rates (< 0.01), the new formulation recommends a lower number of clusters than the Donner et al. formulation, thereby providing better efficiency. All known formulations can be shown to be special cases at specific values of the general correction factor (e.g., the Donner formulation is equivalent to the new formulation for c = 1). A statistical simulation is presented with data on the comparative efficacy of the available methods, identifying correction factors that are optimal for rare event rates. A table of sample size recommendations for a variety of rare event rates, along with code in the R language for easy computation of sample size in other settings, is also provided. Sample size calculations for a published CRT (the “Pathways to Health study”, which evaluates the value of an intervention for smoking cessation) are computed for various correction factors to illustrate that, with an optimal choice of the correction factor, the study could have maintained the same power with a 20% smaller sample size.

Measuring the Attenuation in a Subject-specific Random Effect with Paired Data

G. Jones A.D.L Noble B. Schauer All authors (4)

https://doi.org/10.6339/JDS.2009.07(2).449

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 2 (2009), pp. 179–188

Abstract

This paper is motivated by an investigation into the growth of pigs, which studied, among other things, the effect of short-term feed withdrawal on live weight. This treatment was thought to reduce the variability in the weights of the pigs. We represent this reduction as an attenuation in an animal-specific random effect. Given data on each pig before and after treatment, we consider the problems of testing for a treatment effect and measuring the strength of the effect, if significant. These problems are related to those of testing the homogeneity of correlated variances and regression with errors in variables. We compare three different estimates of the attenuation factor using data on the live weights of pigs, and by simulation.

Designing for Parameter Subsets in Gaussian Nonlinear Regression Models

Timothy E. O’Brien

https://doi.org/10.6339/JDS.2005.03(2).190

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 2 (2005), pp. 179–197

Abstract

This article presents and illustrates several important subset design approaches for Gaussian nonlinear regression models and for linear models where interest lies in a nonlinear function of the model parameters. These design strategies are particularly useful in situations where currently used subset design procedures fail to provide designs which can be used to fit the model function. Our original design technique is illustrated in conjunction with D-optimality, Bayesian D-optimality and Kiefer’s Φk-optimality, and is extended to yield subset designs which take account of curvature.

Scheffé Style Simultaneous Credible Bands for Regression Surfaces with Application to Ache Honey Gathering

Timothy Hanson Garnett P. McMillan

https://doi.org/10.6339/JDS.2012.10(2).1022

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 10, Issue 2 (2012), pp. 175–193

Abstract

We propose two simple, easy-to-implement methods for obtaining simultaneous credible bands in hierarchical models from standard Markov chain Monte Carlo output. The methods generalize Scheffé’s (1953) approach to this problem, but in a Bayesian context. A small simulation study is followed by an application of the methods to a seasonal model for Ache honey gathering.

A Comparison of Statistical Tools for Identifying Modality in Body Mass Distributions

Ling Xu Edward J. Bedrick Timothy Hanson All authors (4)

https://doi.org/10.6339/JDS.2014.12(1).1201

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 1 (2014), pp. 175–196

Abstract

The assessment of modality or “bumps” in distributions is of interest to scientists in many areas. We compare the performance of four statistical methods to test for departures from unimodality in simulations, and further apply the four methods to well-known ecological datasets on body mass published by Holling in 1992 to illustrate their advantages and disadvantages. Silverman’s kernel density method was found to be very conservative. The excess mass test and a Bayesian mixture model approach showed agreement among the data sets, whereas Hall and York’s test provided strong evidence for the existence of two or more modes in all data sets. The Bayesian mixture model also provided a way to quantify the uncertainty associated with the number of modes. This work demonstrates the inherent richness of animal body mass distributions, but also the difficulties of characterizing that richness and, ultimately, of understanding the processes underlying the distributions.

Evaluating Aortic Stenosis Using the Archimedean Copula Methodology

Pranesh Kumar Mohamed M. Shoukri

https://doi.org/10.6339/JDS.2008.06(2).425

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 6, Issue 2 (2008), pp. 173–187

Abstract

In modeling and analyzing multivariate data, the conventionally used measure of dependence structure is Pearson’s correlation coefficient. However, the use of correlation as a dependence measure has several pitfalls. Copulas have recently emerged as an alternative measure of dependence, overcoming most of the drawbacks of correlation. We discuss Archimedean copulas and their relationships with tail dependence. An algorithm to construct empirical and Archimedean copulas is described. Monte Carlo simulations are carried out to replicate and analyze data sets by identifying the appropriate copula. We apply the Archimedean copula based methodology to assess the accuracy of Doppler echocardiography in determining aortic valve area from the Aortic Stenosis: Simultaneous Doppler – Catheter Correlative study carried out at the King Faisal Specialist Hospital and Research Centre, Riyadh, KSA.


Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X


Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China