Journal of Data Science

Mutstats: An Ultra-fast Computational Method to Determine Clonal Status of Somatic Mutations

Dehua Bi, Subhajit Sengupta, Tianjian Zhou, et al. (4 authors)

https://doi.org/10.6339/21-JDS1016

Pub. online: 1 Jun 2021 Type: Data Science In Action

Journal: Journal of Data Science Volume 19, Issue 3 (2021), pp. 465–484

Abstract

A tumor cell population is a mixture of heterogeneous cell subpopulations, known as subclones. Identifying the clonal status of a mutation, i.e., whether it occurs in all tumor cells or only in a subset of them, is crucial for understanding tumor progression and developing personalized treatment strategies. We make three major contributions in this paper: (1) we summarize the terminology used in the literature based on a unified mathematical representation of subclones; (2) we develop a simulation algorithm to generate hypothetical sequencing data that resemble real data; and (3) we present an ultra-fast computational method, Mutstats, to infer the clonal status of somatic mutations from tumor sequencing data. The inference is based on a Gaussian mixture model for mutation multiplicities. To validate Mutstats, we evaluate its performance on simulated datasets as well as on two breast carcinoma samples from The Cancer Genome Atlas project.
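The abstract states that Mutstats bases its inference on a Gaussian mixture model for mutation multiplicities. As a rough illustration of that general idea only, the sketch below fits a one-dimensional Gaussian mixture by expectation-maximization (EM) and assigns each value to its most probable component. It is not the Mutstats algorithm: the function names, the starting values, the number of components and the toy "multiplicity" values are all hypothetical, and the reading that a cluster near 1 corresponds to clonal mutations and a cluster below 1 to subclonal ones is an illustrative assumption.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(
    xs: List[float],
    mus: List[float],
    sigmas: List[float],
    weights: List[float],
    n_iter: int = 200,
) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    """Fit a 1-D Gaussian mixture by EM from the given starting values (hypothetical helper).

    Returns fitted means, standard deviations, mixing weights and the
    responsibilities (posterior component probabilities) from the last E-step.
    """
    k = len(mus)
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: probability that each value came from each component.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against every density underflowing
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means and spreads.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
    return mus, sigmas, weights, resp


# Hypothetical multiplicity estimates: one cluster near 0.3, one near 1.0.
values = [0.30, 0.35, 0.28, 0.32, 1.00, 0.95, 1.05, 0.98, 1.02]
mus, sigmas, weights, resp = fit_gmm_1d(values, [0.5, 0.9], [0.2, 0.2], [0.5, 0.5])
# Hard assignment: index of the most probable component for each value.
labels = [max(range(len(mus)), key=lambda j: r[j]) for r in resp]
```

On these toy values the first four entries land in the lower component and the last five in the component centered near 1. A real analysis would also need to choose the number of components and account for copy number and tumor purity, which this sketch ignores.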

Random Machines: A Bagged-Weighted Support Vector Model with Free Kernel Choice

Anderson Ara, Mateus Maia, Francisco Louzada, et al. (4 authors)

https://doi.org/10.6339/21-JDS1014

Pub. online: 1 Jun 2021 Type: Statistical Data Science

Journal: Journal of Data Science Volume 19, Issue 3 (2021), pp. 409–428

Abstract

Improving statistical learning models so that they solve classification and regression problems more efficiently is a goal pursued by the scientific community. In particular, the support vector machine has become one of the most successful algorithms for this task. Despite the strong predictive capacity of the support vector approach, its performance depends on the choice of hyperparameters, such as the kernel function. Traditional procedures for choosing the kernel function are generally computationally expensive and can become infeasible for certain datasets. In this paper, we propose a novel framework for kernel function selection, called Random Machines. In simulation scenarios and real-data benchmarks, it improved accuracy and reduced computational time.
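The title describes Random Machines as a bagged, weighted support vector model with free kernel choice. The sketch below illustrates only the bookkeeping such a scheme could use, under stated assumptions: each candidate kernel gets a sampling probability derived from a validation accuracy, each bootstrap model draws its kernel from those probabilities, and the ensemble predicts by weighted vote. The log-odds weighting, the clipping at zero, and all function names are assumptions for illustration rather than the paper's exact formulas, and the SVM fits themselves are omitted; accuracies, predictions and per-model weights (for example, out-of-bag accuracies) are simply passed in.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List


def kernel_probabilities(val_accuracy: Dict[str, float]) -> Dict[str, float]:
    """Map each kernel's validation accuracy to a sampling probability.

    Assumption: probability proportional to log(acc / (1 - acc)), clipped at 0,
    so kernels no better than chance (acc <= 0.5) are never sampled.
    """
    scores = {}
    for name, acc in val_accuracy.items():
        acc = min(max(acc, 1e-6), 1 - 1e-6)  # keep the log-odds finite
        scores[name] = max(math.log(acc / (1 - acc)), 0.0)
    total = sum(scores.values())
    if total == 0:  # every kernel at or below chance: fall back to uniform
        return {name: 1 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


def sample_kernels(probs: Dict[str, float], n_bags: int, seed: int = 0) -> List[str]:
    """Draw one kernel per bootstrap model according to the probabilities."""
    rng = random.Random(seed)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=n_bags)


def weighted_vote(predictions: List[str], weights: List[float]) -> str:
    """Combine one predicted label per bagged model by weighted majority vote."""
    tally: Dict[str, float] = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)


# Hypothetical validation accuracies for three candidate kernels.
probs = kernel_probabilities({"linear": 0.50, "polynomial": 0.75, "rbf": 0.90})
kernels = sample_kernels(probs, n_bags=25)
vote = weighted_vote(["spam", "ham", "ham"], [0.9, 0.3, 0.3])
```

With these made-up accuracies, the linear kernel sits at chance and receives probability 0, while the polynomial and RBF kernels receive 1/3 and 2/3 (since log 9 = 2 log 3). The appeal of this kind of design is that no single kernel has to be chosen up front; weaker kernels simply contribute less to the ensemble.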

Time Series Regression Models for COVID-19 Deaths

Marinho G. Andrade, Jorge A. Achcar, Katiane S. Conceição, et al. (4 authors)

https://doi.org/10.6339/21-JDS991

Pub. online: 7 May 2021 Type: Data Science In Action

Journal: Journal of Data Science Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 269–292

Abstract

This article develops nonlinear functional forms for modeling count time series of daily deaths due to COVID-19. Our models explain the mean levels of the time series while accounting for time-varying variances. A Bayesian approach using Markov chain Monte Carlo (MCMC) is adopted for analysis, inference, and forecasting of the time series under the proposed models. Applications are shown for time series of death counts from several countries affected by the pandemic.
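The abstract describes Bayesian MCMC inference for count time series whose mean follows a functional form in time. As a generic illustration of that workflow, and not the authors' models, the sketch below runs a random-walk Metropolis sampler for a Poisson model with mean exp(a + b·t). Under a Poisson model the variance equals the mean, so it also changes over time. The growth curve, the N(0, 10²) priors, the step size, the iteration count and the daily counts are all hypothetical choices.

```python
import math
import random
from typing import List, Tuple


def log_posterior(theta: Tuple[float, float], counts: List[int]) -> float:
    """Poisson log-likelihood with mean exp(a + b*t) plus N(0, 10^2) priors on a, b.

    The exponential growth curve is a hypothetical stand-in for the paper's
    nonlinear forms.
    """
    a, b = theta
    lp = -(a * a + b * b) / (2 * 10.0 ** 2)
    for t, y in enumerate(counts):
        log_mu = a + b * t
        lp += y * log_mu - math.exp(log_mu) - math.lgamma(y + 1)
    return lp


def metropolis(counts: List[int], n_iter: int = 20000, step: float = 0.02, seed: int = 1):
    """Random-walk Metropolis: Gaussian proposals, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    current = log_posterior(theta, counts)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = (theta[0] + rng.gauss(0, step), theta[1] + rng.gauss(0, step))
        candidate = log_posterior(proposal, counts)
        # Compare on the probability scale; min(0, .) avoids overflow in exp.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_iter


# Hypothetical daily death counts over ten days.
counts = [2, 3, 5, 6, 9, 12, 15, 20, 26, 33]
samples, rate = metropolis(counts)
kept = samples[len(samples) // 2:]  # discard the first half as burn-in
b_mean = sum(s[1] for s in kept) / len(kept)
```

The posterior mean of the growth rate `b` from the retained draws can then feed a forecast by plugging sampled `(a, b)` pairs into exp(a + b·t) for future days. Proper use would require convergence diagnostics and a better-tuned proposal, both omitted here.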

Assessment of Effects of Age and Gender on the Incubation Period of COVID-19 with a Mixture Regression Model

Siming Zheng, Jing Qin, Yong Zhou

https://doi.org/10.6339/21-JDS992

Pub. online: 7 May 2021 Type: Statistical Data Science

Journal: Journal of Data Science Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 253–268

Abstract

Following the outbreak of COVID-19, various containment measures have been taken, including quarantine. At present, the quarantine period is the same for everyone, since it is implicitly assumed that the incubation period distribution of COVID-19 does not depend on age or gender. To test the effects of age and gender on the incubation period of COVID-19, a novel two-component mixture regression model is proposed. An expectation-maximization (EM) algorithm is used to estimate the parameters of interest, and simulation results show that the proposed method outperforms the simple regression method and is robust. Applied to a COVID-19 dataset from Zhejiang, the method finds no statistically significant effect of age or gender on the incubation period of COVID-19, which indicates that the quarantine measure currently in operation is reasonable.
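The abstract names two ingredients: a two-component mixture regression model and an EM algorithm for estimating it. The abstract does not specify the component distributions or how age and gender enter, so the sketch below shows only the generic technique: EM for a mixture of two Gaussian linear regressions with a single covariate. The E-step computes each observation's probability of belonging to component 0, and the M-step refits each line by weighted least squares. All names, starting values and the toy data are hypothetical, and this is not the authors' model.

```python
import math
from typing import List, Tuple

Params = Tuple[float, float, float]  # (intercept, slope, residual sd)


def normal_pdf(r: float, sigma: float) -> float:
    """Density of N(0, sigma^2) evaluated at residual r."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_line(xs: List[float], ys: List[float], ws: List[float]) -> Params:
    """Weighted least-squares fit of y = b0 + b1 * x, plus weighted residual sd."""
    sw = max(sum(ws), 1e-12)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx if sxx > 0 else 0.0
    b0 = my - b1 * mx
    var = sum(w * (y - b0 - b1 * x) ** 2 for w, x, y in zip(ws, xs, ys)) / sw
    return b0, b1, max(math.sqrt(var), 1e-3)  # sd floor avoids degenerate fits


def em_mixture_regression(xs, ys, init: List[Params], pi: float = 0.5, n_iter: int = 100):
    """EM for a two-component mixture of linear regressions (hypothetical helper)."""
    comps = list(init)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 0.
        r = []
        for x, y in zip(xs, ys):
            d0 = pi * normal_pdf(y - comps[0][0] - comps[0][1] * x, comps[0][2])
            d1 = (1 - pi) * normal_pdf(y - comps[1][0] - comps[1][1] * x, comps[1][2])
            r.append(d0 / (d0 + d1) if d0 + d1 > 0 else 0.5)
        # M-step: update the mixing proportion and refit each line with its weights.
        pi = sum(r) / len(r)
        comps = [weighted_line(xs, ys, r), weighted_line(xs, ys, [1 - ri for ri in r])]
    return pi, comps


# Toy data: two parallel lines (intercepts 1 and 10, slope 0.5) with +/-0.1 noise.
xs = [float(x) for x in range(10)] * 2
noise = [0.1 if x % 2 else -0.1 for x in range(10)]
ys = [1.0 + 0.5 * x + e for x, e in zip(range(10), noise)] + \
     [10.0 + 0.5 * x + e for x, e in zip(range(10), noise)]
pi, comps = em_mixture_regression(xs, ys, init=[(0.0, 0.5, 1.0), (9.0, 0.5, 1.0)])
```

On this toy data EM separates the two lines, recovering intercepts near 1 and 10, slopes near 0.5 and a mixing proportion of 0.5. In a covariate-effect study like the one described, a test for age or gender would then compare fitted coefficients against their uncertainty, which this sketch does not compute.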

Testing for COVID-19: Some Statistical Issues

Grace Y. Yi, Wenqing He, Dennis K. J. Lin, et al. (4 authors)

https://doi.org/10.6339/21-JDS993

Pub. online: 7 May 2021 Type: Data Science In Action

Journal: Journal of Data Science Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 243–252

Abstract

The swift spread of the novel coronavirus is largely attributed to stealthy transmission, in which infected patients may be asymptomatic or exhibit only flu-like symptoms in the early stage. Undetected transmission presents a major challenge for containing the virus and poses a serious threat to the public. An urgent question concerns testing for the coronavirus. In this paper, we evaluate the situation from a statistical viewpoint by discussing the accuracy of test procedures, and we stress the importance of interpreting test results rationally.
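One standard reason test results need careful interpretation is that a test's sensitivity and specificity do not directly tell you how likely a positive result is to be correct. That probability also depends on how common infection is in the tested population. The sketch below applies Bayes' theorem to compute the positive and negative predictive values (PPV and NPV). The function name and the sensitivity, specificity and prevalence values are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a diagnostic test via Bayes' theorem."""
    tp = sensitivity * prevalence              # P(test positive and infected)
    fp = (1 - specificity) * (1 - prevalence)  # P(test positive and not infected)
    fn = (1 - sensitivity) * prevalence        # P(test negative and infected)
    tn = specificity * (1 - prevalence)        # P(test negative and not infected)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics at 1% prevalence.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.01)
```

With these illustrative numbers, PPV is 0.009 / 0.0585 ≈ 0.154. In other words, only about 15% of positive results would be true infections, even though the test itself looks accurate. NPV, by contrast, is about 0.999.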

Rejoinder: “Evaluate the Risk of Resumption of Business for the States of New York, New Jersey and Connecticut via a Pre-Symptomatic and Asymptomatic Transmission Model of COVID-19”