Journal of Data Science

Search results 892

How to Find Multiple Systems Underlying a Two-Way Table of 0’s and 1’s, With Applications to Cognitive Impairments and Medical Laboratory Science

T. P. Hutchinson

https://doi.org/10.6339/JDS.2007.05(3).289

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 5, Issue 3 (2007), pp. 335–356

Abstract

Datasets are sometimes encountered that consist of a two-way table of 0’s and 1’s. For example, this might show which patients are impaired on which of a battery of tests, or which compounds are successful at inactivating which of several micro-organisms. The present paper describes a method of analysing such tables that reveals and specifies two (or more) systems or modes of action, if indeed they are needed to explain the data. The approach is an extension of what, in the context of cognitive impairments, is termed double dissociation. In order to be simple enough to be practicable, the approach is deterministic rather than probabilistic.

Parametric Fractional Imputation for Longitudinal Data with Intermittent Missing Values

Ahmed M. Gad Hanan E. G. Ahmed

https://doi.org/10.6339/JDS.201904_17(2).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 2 (2019), pp. 331–348

Abstract

Longitudinal data analysis has been widely developed in the past three decades. Longitudinal data are common in many fields such as public health, medicine, and the biological and social sciences. Longitudinal data have a special nature, as an individual may be observed over a long period of time. Hence, missing values are common in longitudinal data. The presence of missing values leads to biased results and complicates the analysis. The missing values have two patterns: intermittent and dropout. The missing data mechanisms are missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The appropriate analysis relies heavily on the assumed mechanism and pattern. Parametric fractional imputation is developed to handle longitudinal data with an intermittent missing pattern. The maximum likelihood estimates are obtained, and the jackknife method is used to obtain the standard errors of the parameter estimates. Finally, a simulation study is conducted to validate the proposed approach, and the approach is applied to a real data set.

Estimation of Lifetime Distribution with Missing Censoring

Jiantian Wang

https://doi.org/10.6339/JDS.201107_09(2).0002

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 9, Issue 3 (2011), pp. 331–343

Abstract

This paper considers the estimation of the lifetime distribution based on missing-censoring data. Using a simple empirical approach rather than the maximum likelihood argument, we obtain parametric estimates of the lifetime distribution under the assumption that the failure time follows an exponential or gamma distribution. We also derive nonparametric estimates for both continuous and discrete failure distributions under the assumption that the censoring distribution is known. The loss of efficiency due to missing censoring is shown to be generally small if the data model is specified correctly. The identifiability of the lifetime distribution with missing-censoring data is also addressed.

Measurement Errors and Imperfect Detection Rates on the Transect Line in Independent Observer Line Transect Surveys

Shenghua Kelly Fan

https://doi.org/10.6339/JDS.2009.07(3).476

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 7, Issue 3 (2009), pp. 331–347

Abstract

This paper proposes a parametric method for estimating animal abundance using data from independent observer line transect surveys. This method allows for measurement errors in distance and size, and for detection rates below 100% on the transect line. Based on data from southern bluefin tuna surveys and data from a minke whale survey, simulation studies were conducted, and the results show that 1) the proposed estimates agree well with the true values, 2) the effect of small measurement errors in distance could still be large if measurements on size are biased, and 3) incorrectly assuming 100% detection rates on the transect line will greatly underestimate the animal abundance.

A Unified Computational Framework to Compare Direct and Sequential False Discovery Rate Algorithms for Exploratory DNA Microarray Studies

Danh V. Nguyen

https://doi.org/10.6339/JDS.2005.03(4).239

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 3, Issue 4 (2005), pp. 331–352

Abstract

The problem of detecting differential gene expression with microarray data has led to further innovative approaches to controlling false positives in multiple testing. False discovery rate (FDR) has been widely used as a measure of error in this multiple testing context. Direct estimation of FDR was recently proposed by Storey (2002, Journal of the Royal Statistical Society, Series B 64, 479-498) as a substantially more powerful alternative to the traditional sequential FDR controlling procedure, pioneered by Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Series B 57, 289-300). Direct estimation of FDR requires fixing a rejection region of interest and then conservatively estimating the associated FDR. On the other hand, the sequential FDR procedure requires fixing an FDR control level and then estimating the rejection region. Thus, sequential and direct approaches to FDR control appear very different. In this paper, we introduce a unified computational framework for sequential FDR methods and propose a class of more powerful sequential FDR algorithms that link the direct and sequential approaches. Under the proposed unified computational framework, both approaches simply approximate the least conservative (optimal) sequential FDR procedure. We illustrate the FDR algorithms and concepts with some numerical studies (simulations) and with two real exploratory DNA microarray studies, one on the detection of molecular signatures in BRCA-mutation breast cancer patients and another on the detection of genetic signatures during colon cancer initiation and progression in the rat.
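
The contrast drawn in this abstract between the two approaches can be illustrated with a minimal Python sketch. It assumes the textbook forms of the Benjamini-Hochberg step-up procedure and of Storey's point estimate of FDR for a fixed rejection region; it is not the unified framework proposed in the paper. The function names `bh_reject` and `storey_fdr`, the tuning parameter `lam`, and the capping of estimates at 1 are illustrative assumptions.

```python
def bh_reject(pvalues, alpha):
    """Sequential approach: fix the FDR level alpha, then find the rejection region.

    Textbook Benjamini-Hochberg step-up rule. Returns the sorted indices of
    the rejected hypotheses.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose ordered p-value falls under its threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def storey_fdr(pvalues, t, lam=0.5):
    """Direct approach: fix the rejection region [0, t], then estimate its FDR.

    pi0 estimates the proportion of true nulls from the p-values above lam
    (lam is a hypothetical default here); both quantities are capped at 1.
    """
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    rejected = max(1, sum(p <= t for p in pvalues))
    return min(1.0, pi0 * m * t / rejected)


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9]
    print(bh_reject(pvals, alpha=0.05))
    print(storey_fdr(pvals, t=0.01))
```

In the sequential function the input is an FDR level and the output is a set of rejections; in the direct function the input is a rejection threshold and the output is an FDR estimate, which is the difference in viewpoint that the abstract describes.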

General Semiparametric Area Under the Curve Regression Model with Discrete Covariates

Som B. Bohora Yan D. Zhao Tatiana N. Balachova

https://doi.org/10.6339/JDS.201704_15(2).0009

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 15, Issue 2 (2017), pp. 329–344

Abstract

In this article, we considered the analysis of data with a non-normally distributed response variable. In particular, we extended an existing Area Under the Curve (AUC) regression model that handles only two discrete covariates to a general AUC regression model that can be used to analyze data with an unrestricted number of discrete covariates. Compared with other similar methods, which require iterative algorithms and bootstrap procedures, our method involved only closed-form formulae for parameter estimation. We also discussed the issue of model identifiability. Our model has broad applicability in clinical trials because its parameters are easy to interpret. We applied our model to analyze a clinical trial evaluating the effects of educational brochures for preventing Fetal Alcohol Spectrum Disorders (FASD). Finally, for a variety of simulation scenarios, our method produced parameter estimates with small biases and confidence intervals with nominal coverage probabilities.

Estimating Optimal Transformations for Multiple Regression Using the ACE Algorithm

Duolao Wang Michael Murphy

https://doi.org/10.6339/JDS.2004.02(4).156

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 2, Issue 4 (2004), pp. 329–346

The Odd Lindley Burr XII Model: Bayesian Analysis, Classical Inference and Characterizations

Mustafa Çağatay Korkmaz Haitham M. Yousof Mahdi Rasekhi All authors (4)

https://doi.org/10.6339/JDS.201804_16(2).0006

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 16, Issue 2 (2018), pp. 327–354

Abstract

In this work, we study the odd Lindley Burr XII model initially introduced by Silva et al. [29]. This model has the advantage of being capable of modeling various shapes of aging and failure criteria. Some of its statistical structural properties, including ordinary and incomplete moments, quantile and generating functions, and order statistics, are derived. The odd Lindley Burr XII density can be expressed as a simple linear mixture of Burr XII densities. Useful characterizations are presented. The maximum likelihood method is used to estimate the model parameters. Simulation results to assess the performance of the maximum likelihood estimators are discussed. We demonstrate empirically the importance and flexibility of the new model in modeling various types of data. Bayesian estimation is performed by obtaining the posterior marginal distributions as well as by Markov Chain Monte Carlo (MCMC) simulation, using the Metropolis-Hastings algorithm in each step of the Gibbs algorithm. The trace plots and estimated conditional posterior distributions are also presented.

Combining Unsupervised and Supervised Neural Networks in Cluster Analysis of Gamma-Ray Burst

Basilio de B. Pereira Calyampudi R. Rao Rubens L. Oliveira All authors (4)

https://doi.org/10.6339/JDS.2010.08(2).394

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 2 (2010), pp. 327–338

Abstract

The paper proposes the use of Kohonen’s Self Organizing Map (SOM) and supervised neural networks to find clusters in samples of gamma-ray bursts (GRB) using the measurements given in BATSE GRB. The extent of separation between clusters obtained by SOM was examined by a cross-validation procedure using supervised neural networks for classification. A method is proposed for variable selection to reduce the “curse of dimensionality”. Six variables were chosen for cluster analysis. Additionally, principal components were computed using all the original variables, and 6 components which accounted for a high percentage of variance were chosen for SOM analysis. All these methods indicate 4 or 5 clusters. Further analysis based on the average profiles of the GRB indicated a possible reduction in the number of clusters.

A New Polya Tree Construction Facilitating A Goodness-of-Fit Test

Yuhui Chen

https://doi.org/10.6339/JDS.201404_12(2).0007

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 12, Issue 2 (2014), pp. 325–338

Abstract

The Polya tree, by embedding parametric families as a special case, is naturally suited to testing the goodness of fit of a parametric null against nonparametric alternatives. For this purpose, we present a new construction of the Polya tree for a random probability measure, which aims to perform an easy multiple χ² test for goodness of fit. Examples of data analyses are provided in simulation studies to highlight the performance of the proposed methods.

Journal of data science

Online ISSN: 1683-8602
Print ISSN: 1680-743X

Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China