Journal of Data Science

BDNNSurv: Bayesian Deep Neural Networks for Survival Analysis Using Pseudo Values

Dai Feng, Lili Zhao

https://doi.org/10.6339/21-JDS1018

Pub. online: 13 Aug 2021 Type: Statistical Data Science

Journal: Journal of Data Science Volume 19, Issue 4 (2021), pp. 542–554

Abstract

There has been increasing interest in modeling survival data using deep learning methods in medical research. In this paper, we propose a Bayesian hierarchical deep neural network model for the modeling and prediction of survival data. Compared with previously studied methods, the new proposal provides not only point estimates of survival probability but also quantification of the corresponding uncertainty, which can be of crucial importance in predictive modeling and subsequent decision making. The favorable statistical properties of the point and uncertainty estimates are demonstrated through simulation studies and real data analysis. Python code implementing the proposed approach is provided.
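The pseudo values in the title are commonly jackknife pseudo-observations: for a fixed time $t^*$, subject $i$ receives $\hat\theta_i = n\hat S(t^*) - (n-1)\hat S^{(-i)}(t^*)$, where $\hat S$ is the Kaplan-Meier estimate and $\hat S^{(-i)}$ is the same estimate with subject $i$ left out. The plain-Python sketch below computes them under the assumption that this standard Kaplan-Meier construction is the one meant; it is not the authors' released code, and the Bayesian network that would be fit to these values is omitted.

```python
from typing import List, Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int], t_star: float) -> float:
    """Kaplan-Meier estimate of S(t_star); events[i] is 1 for an observed death, 0 if censored."""
    surv = 1.0
    # Distinct observed event times up to t_star, in increasing order.
    event_times = sorted({t for t, d in zip(times, events) if d == 1 and t <= t_star})
    for u in event_times:
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, d in zip(times, events) if t == u and d == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times: Sequence[float], events: Sequence[int], t_star: float) -> List[float]:
    """Jackknife pseudo-observations n*S(t*) - (n-1)*S_{-i}(t*), one per subject."""
    n = len(times)
    full = kaplan_meier(times, events, t_star)
    values = []
    for i in range(n):
        # Re-estimate S(t*) with subject i left out.
        loo_times = [t for j, t in enumerate(times) if j != i]
        loo_events = [d for j, d in enumerate(events) if j != i]
        values.append(n * full - (n - 1) * kaplan_meier(loo_times, loo_events, t_star))
    return values


# Example: pseudo-values at t* = 2.5 for five subjects, one censored at time 2.
print(pseudo_values([1, 2, 3, 4, 5], [1, 0, 1, 1, 1], 2.5))
```

With no censoring, each pseudo-value reduces exactly to the indicator $I(T_i > t^*)$, which makes a convenient sanity check; with censoring, the values become continuous responses that a regression model can be fit to.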

Shape-Restricted Regression Splines with R Package splines2

Wenjie Wang, Jun Yan

https://doi.org/10.6339/21-JDS1020

Pub. online: 12 Aug 2021 Type: Computing In Data Science

Journal: Journal of Data Science Volume 19, Issue 3 (2021), pp. 498–517

Abstract

Splines are important tools for the flexible modeling of curves and surfaces in regression analyses. Functions for constructing spline basis functions are available in R through the base package splines. When the curves to be modeled have known characteristics in monotonicity or curvature, more efficient statistical inference is possible with shape-restricted splines. Such splines, however, are not available in the R package splines. The package splines2 provides easy-to-use shape-restricted spline basis functions, along with their derivatives and integrals, which are important tools in many inference scenarios. It also provides additional splines and features that are not available in the splines package, such as periodic splines and generalized Bernstein polynomials. The usage of the functions is illustrated with shape-restricted regression, recurrent event data analysis, and extreme-value copulas.
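To make the shape-restriction idea concrete: I-splines are integrals of M-splines, so any nonnegative combination of I-spline basis functions is nondecreasing. The lowest-order I-splines (integrals of piecewise-constant M-splines) are simply ramps between adjacent knots. The sketch below is hand-rolled Python and does not use splines2 (an R package whose `iSpline()` builds higher-order bases); it fits a monotone curve by least squares with a free intercept and nonnegative coefficients, using projected coordinate descent as an assumed, deliberately simple solver.

```python
from typing import List, Sequence, Tuple


def ramp_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Lowest-order I-spline basis: each function rises linearly from 0 to 1
    between consecutive knots (the integral of a normalized indicator M-spline)."""
    return [min(max((x - lo) / (hi - lo), 0.0), 1.0) for lo, hi in zip(knots[:-1], knots[1:])]


def fit_monotone(xs: Sequence[float], ys: Sequence[float], knots: Sequence[float],
                 n_iter: int = 500) -> Tuple[float, List[float]]:
    """Least squares with a free intercept and nonnegative basis coefficients.
    Nonnegativity of the coefficients makes the fitted curve nondecreasing."""
    basis = [ramp_basis(x, knots) for x in xs]
    n, k = len(xs), len(knots) - 1
    intercept = sum(ys) / n
    coef = [0.0] * k
    resid = [y - intercept for y in ys]
    for _ in range(n_iter):
        # The intercept is unconstrained: shift it by the mean residual.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
        for j in range(k):
            col = [row[j] for row in basis]
            ss = sum(c * c for c in col)
            if ss == 0.0:
                continue
            # Exact coordinate minimizer, then projection onto coef[j] >= 0.
            new = max(0.0, coef[j] + sum(c * r for c, r in zip(col, resid)) / ss)
            delta = new - coef[j]
            resid = [r - delta * c for r, c in zip(resid, col)]
            coef[j] = new
    return intercept, coef


def predict(x: float, intercept: float, coef: Sequence[float], knots: Sequence[float]) -> float:
    return intercept + sum(b * c for b, c in zip(ramp_basis(x, knots), coef))
```

Even when the data dip locally (for example `ys = [0, 1, 0.5, 2, 3]`), the fitted curve cannot decrease, which is the efficiency gain the abstract refers to when monotonicity is known in advance.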

Sign-based Shrinkage Based on an Asymmetric LASSO Penalty

Eric S. Kawaguchi, Burcu F. Darst, Kan Wang, et al. (4 authors)

https://doi.org/10.6339/21-JDS1015

Pub. online: 2 Jun 2021 Type: Statistical Data Science

Journal: Journal of Data Science Volume 19, Issue 3 (2021), pp. 429–449

Abstract

Penalized regression provides an automated approach to perform simultaneous variable selection and parameter estimation and is a popular method for analyzing high-dimensional data. Since the conception of the LASSO in the mid-to-late 1990s, extensive research has been done to improve penalized regression. The LASSO, and several of its variations, performs penalization symmetrically around zero. Thus, coefficients of the same magnitude are shrunk equally regardless of the direction of the effect. To the best of our knowledge, sign-based shrinkage, i.e., preferential shrinkage based on the sign of the coefficients, has yet to be explored under the LASSO framework. We propose a generalization of the LASSO, the asymmetric LASSO, that performs sign-based shrinkage. Our method is motivated by placing an asymmetric Laplace prior on the regression coefficients, rather than a symmetric Laplace prior. This corresponds to an asymmetric $\ell_1$ penalty under the penalized regression framework. In doing so, preferential shrinkage can be performed through an auxiliary tuning parameter that controls the degree of asymmetry. Our numerical studies indicate that the asymmetric LASSO performs better than the LASSO when effect sizes are sign skewed. Furthermore, in the presence of positively skewed effects, the asymmetric LASSO is comparable to the non-negative LASSO without the need to place an a priori constraint on the effect estimates, and it outperforms the non-negative LASSO when negative effects are also present in the model. A real data example using breast cancer gene expression data from The Cancer Genome Atlas is also provided, where the asymmetric LASSO identifies two potentially novel gene expressions associated with BRCA1, with a minor improvement in prediction performance over the LASSO and the non-negative LASSO.
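An asymmetric $\ell_1$ penalty can be written as $\lambda_+ \sum_j \max(\beta_j, 0) + \lambda_- \sum_j \max(-\beta_j, 0)$, so the one-dimensional subproblem $\min_b \tfrac12 (b - z)^2 + \lambda_+ \max(b, 0) + \lambda_- \max(-b, 0)$ has a closed-form asymmetric soft-thresholding solution. The sketch below assumes this parameterization with two separate penalty levels, `lam_pos` and `lam_neg` (the paper instead ties them through an auxiliary asymmetry parameter whose exact form is not given here), and plugs the threshold into ordinary coordinate descent without an intercept.

```python
from typing import List, Sequence


def asym_soft_threshold(z: float, lam_pos: float, lam_neg: float) -> float:
    """argmin_b 0.5*(b - z)**2 + lam_pos*max(b, 0) + lam_neg*max(-b, 0)."""
    if z > lam_pos:
        return z - lam_pos
    if z < -lam_neg:
        return z + lam_neg
    return 0.0


def asym_lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float],
                  lam_pos: float, lam_neg: float, n_iter: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)||y - Xb||^2 plus the asymmetric l1 penalty."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            a = sum(c * c for c in col) / n
            if a == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j added back).
            z = sum(c * r for c, r in zip(col, resid)) / n + a * beta[j]
            new = asym_soft_threshold(z, lam_pos, lam_neg) / a
            delta = new - beta[j]
            resid = [r - delta * c for r, c in zip(resid, col)]
            beta[j] = new
    return beta
```

Setting `lam_pos == lam_neg` recovers the usual LASSO soft-thresholding; making `lam_neg` larger shrinks a negative effect more than a positive effect of the same magnitude, which is the sign-based shrinkage described above.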

Mutstats: An Ultra-fast Computational Method to Determine Clonal Status of Somatic Mutations

Dehua Bi, Subhajit Sengupta, Tianjian Zhou, et al. (4 authors)

https://doi.org/10.6339/21-JDS1016

Pub. online: 1 Jun 2021 Type: Data Science In Action

Journal: Journal of Data Science Volume 19, Issue 3 (2021), pp. 465–484

Abstract

A tumor cell population is a mixture of heterogeneous cell subpopulations, known as subclones. Identifying the clonal status of mutations, i.e., whether a mutation occurs in all tumor cells or in only a subset of them, is crucial for understanding tumor progression and developing personalized treatment strategies. We make three major contributions in this paper: (1) we summarize terminologies in the literature based on a unified mathematical representation of subclones; (2) we develop a simulation algorithm to generate hypothetical sequencing data that are akin to real data; and (3) we present an ultra-fast computational method, Mutstats, to infer the clonal status of somatic mutations from tumor sequencing data. The inference is based on a Gaussian mixture model for mutation multiplicities. To validate Mutstats, we evaluate its performance on simulated datasets as well as on two breast carcinoma samples from The Cancer Genome Atlas project.
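The abstract does not spell out the Mutstats estimator, so the following is only a generic sketch of two ingredients it mentions. First, an observed variant allele fraction (VAF) is converted to an estimated multiplicity by inverting the common relation $\mathrm{E}[\mathrm{VAF}] = p\,m / (p\,C + 2(1-p))$, where $p$ is tumor purity, $C$ is the total tumor copy number at the locus, $m$ is the number of mutated copies per tumor cell, and normal cells are assumed diploid. Second, a one-dimensional Gaussian mixture is fit to these values by EM. The function names, the fixed number of components, and the initialization are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def estimated_multiplicity(vaf: float, purity: float, tumor_cn: float) -> float:
    """Invert E[VAF] = p*m / (p*C + 2*(1 - p)) for m, assuming diploid normal cells."""
    return vaf * (purity * tumor_cn + 2.0 * (1.0 - purity)) / purity


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_1d(xs: Sequence[float], means: Sequence[float], n_iter: int = 100,
           min_var: float = 1e-4) -> Tuple[List[float], List[float], List[float]]:
    """EM for a 1-D Gaussian mixture; `means` gives the initial component means."""
    k, n = len(means), len(xs)
    mu = list(means)
    overall = sum(xs) / n
    var = [max(sum((x - overall) ** 2 for x in xs) / n, min_var)] * k
    w = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for x in xs:
            dens = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return w, mu, var
```

Under this relation, a mutation carried by only a fraction of tumor cells yields an estimated multiplicity below its per-cell copy count, which is the intuition for separating clonal from subclonal mutations by clustering multiplicities.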

Random Machines: A Bagged-Weighted Support Vector Model with Free Kernel Choice

Anderson Ara, Mateus Maia, Francisco Louzada, et al. (4 authors)

https://doi.org/10.6339/21-JDS1014

Pub. online: 1 Jun 2021 Type: Statistical Data Science

Journal: Journal of Data Science Volume 19, Issue 3 (2021), pp. 409–428

Abstract

Improving statistical learning models to increase efficiency in solving classification or regression problems is a goal pursued by the scientific community. In particular, the support vector machine model has become one of the most successful algorithms for this task. Despite the strong predictive capacity of the support vector approach, its performance relies on the selection of the model's hyperparameters, such as the kernel function. Traditional procedures for choosing the kernel function are, in general, computationally expensive and can become infeasible for certain datasets. In this paper, we propose a novel framework for kernel function selection called Random Machines. In simulation scenarios and real-data benchmarks, the proposed framework improved accuracy and reduced computational time.
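The abstract does not give the Random Machines formulas. The minimal Python sketch below shows the shape of a bagged-weighted scheme with random kernel choice, assuming (i) each bootstrap model draws its kernel with probability proportional to the log-odds of that kernel's validation accuracy and (ii) bootstrap models vote with weights proportional to $1/(1 - \text{OOB accuracy})^2$. Both formulas are assumptions made for illustration. Training the support vector machines is omitted, so the sketch starts from accuracies and predictions that are already computed.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def kernel_probabilities(val_acc: Dict[str, float]) -> Dict[str, float]:
    """Assumed scheme: sampling probability proportional to log-odds of validation accuracy."""
    eps = 1e-6
    scores = {}
    for name, acc in val_acc.items():
        # Clip so the log-odds are positive and finite; near-chance kernels get ~0 probability.
        acc = min(max(acc, 0.5 + eps), 1.0 - eps)
        scores[name] = math.log(acc / (1.0 - acc))
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def sample_kernels(probs: Dict[str, float], n_models: int, seed: int = 0) -> List[str]:
    """Draw one kernel per bootstrap model according to `probs`."""
    rng = random.Random(seed)
    names = list(probs)
    return rng.choices(names, weights=[probs[name] for name in names], k=n_models)


def weighted_vote(predictions: Sequence[str], oob_acc: Sequence[float]) -> str:
    """Combine one test point's predictions; assumed weights are 1/(1 - OOB accuracy)^2."""
    votes: Dict[str, float] = defaultdict(float)
    for pred, acc in zip(predictions, oob_acc):
        votes[pred] += 1.0 / max(1.0 - acc, 1e-6) ** 2
    return max(votes, key=votes.get)
```

The point of sampling kernels rather than tuning a single one is that the expensive kernel search is replaced by a cheap, accuracy-guided draw, while the weighted vote lets the better-performing bootstrap models dominate.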

Time Series Regression Models for COVID-19 Deaths

Marinho G. Andrade, Jorge A. Achcar, Katiane S. Conceição, et al. (4 authors)

https://doi.org/10.6339/21-JDS991

Pub. online: 7 May 2021 Type: Data Science In Action

Journal: Journal of Data Science Volume 19, Issue 2 (2021), pp. 269–292

Abstract

This article develops nonlinear functional forms for modeling count time series of daily deaths due to COVID-19. Our models explain the mean levels of the time series while accounting for time-varying variances. A Bayesian approach using Markov chain Monte Carlo (MCMC) is adopted for the analysis, inference, and forecasting of the time series under the proposed models. Applications are shown for time series of death counts from several countries affected by the pandemic.
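As a minimal illustration of Bayesian MCMC for a count time series, the sketch below assumes Poisson daily death counts whose log-mean is quadratic in rescaled time $s \in [0, 1]$, which yields a rise-and-fall curve, and samples the three coefficients with random-walk Metropolis under flat priors. This functional form is a hypothetical stand-in: it is not one of the paper's models, and it ignores the time-varying variances the paper accounts for.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_lik(theta: Sequence[float], counts: Sequence[int]) -> float:
    """Poisson log-likelihood with log mu_t = a + b*s + c*s^2, where s = t / (T - 1)."""
    a, b, c = theta
    T = len(counts)
    ll = 0.0
    for t, y in enumerate(counts):
        s = t / (T - 1)
        log_mu = a + b * s + c * s * s
        ll += y * log_mu - math.exp(log_mu) - math.lgamma(y + 1)
    return ll


def metropolis(counts: Sequence[int], n_iter: int = 20000, step: float = 0.05,
               seed: int = 1) -> List[Tuple[float, ...]]:
    """Random-walk Metropolis under flat priors; returns the full chain (no burn-in removed)."""
    rng = random.Random(seed)
    # Start from a flat curve at the mean count.
    theta = [math.log(max(sum(counts) / len(counts), 1e-3)), 0.0, 0.0]
    cur = log_lik(theta, counts)
    chain = []
    for _ in range(n_iter):
        prop = [v + rng.gauss(0.0, step) for v in theta]
        new = log_lik(prop, counts)
        # Accept with probability min(1, exp(new - cur)).
        if new >= cur or rng.random() < math.exp(new - cur):
            theta, cur = prop, new
        chain.append(tuple(theta))
    return chain
```

Forecasts would come from evaluating the sampled curves at $s > 1$; in practice the early part of the chain should be discarded as burn-in and convergence checked before summarizing the posterior.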

Assessment of Effects of Age and Gender on the Incubation Period of COVID-19 with a Mixture Regression Model

Siming Zheng, Jing Qin, Yong Zhou

https://doi.org/10.6339/21-JDS992

Pub. online: 7 May 2021 Type: Statistical Data Science

Journal: Journal of Data Science Volume 19, Issue 2 (2021), pp. 253–268

Abstract

Following the outbreak of COVID-19, various containment measures have been taken, including the use of quarantine. At present, the quarantine period is the same for everyone, since it is implicitly assumed that the incubation period distribution of COVID-19 is the same regardless of age or gender. To test the effects of age and gender on the incubation period of COVID-19, a novel two-component mixture regression model is proposed. An expectation-maximization (EM) algorithm is adopted to estimate the parameters of interest, and simulation results show that the proposed method outperforms the simple regression method and is robust. The proposed method is applied to a Zhejiang COVID-19 dataset, and age and gender are found to have no statistically significant effect on the incubation period of COVID-19, which indicates that the quarantine measure currently in operation is reasonable.
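The specific mixture regression model is not given in the abstract. As a generic stand-in, the sketch below fits a two-component mixture of Gaussian linear regressions with a single covariate by EM: the E-step computes each observation's posterior component probabilities, and the M-step refits each component's line by weighted least squares and updates its variance. The single covariate, the function names, and the initialization are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def wls_line(xs: Sequence[float], ys: Sequence[float], ws: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares fit of y = b0 + b1*x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx if sxx > 0 else 0.0
    return my - b1 * mx, b1


def mix_reg_em(xs: Sequence[float], ys: Sequence[float], init: Sequence[Tuple[float, float]],
               n_iter: int = 200, min_var: float = 1e-6):
    """EM for a two-component mixture of linear regressions y = b0_j + b1_j*x + N(0, var_j)."""
    params: List[Tuple[float, float]] = list(init)
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibilities from each component's Gaussian residual density.
        resp = []
        for x, y in zip(xs, ys):
            dens = []
            for j in range(2):
                r = y - params[j][0] - params[j][1] * x
                dens.append(pi[j] * math.exp(-0.5 * r * r / var[j]) / math.sqrt(2 * math.pi * var[j]))
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: mixing proportions, weighted regression lines, residual variances.
        for j in range(2):
            ws = [r[j] for r in resp]
            nj = sum(ws)
            if nj < 1e-12:
                continue
            pi[j] = nj / len(xs)
            b0, b1 = wls_line(xs, ys, ws)
            params[j] = (b0, b1)
            var[j] = max(sum(w * (y - b0 - b1 * x) ** 2 for w, x, y in zip(ws, xs, ys)) / nj, min_var)
    return pi, params, var
```

In the incubation-period setting the covariate could be age or a gender indicator; testing whether its coefficients are zero would additionally require standard errors, which this sketch does not compute.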

Testing for COVID-19: Some Statistical Issues

Grace Y. Yi, Wenqing He, Dennis K. J. Lin, et al. (4 authors)

https://doi.org/10.6339/21-JDS993

Pub. online: 7 May 2021 Type: Data Science In Action

Journal: Journal of Data Science Volume 19, Issue 2 (2021), pp. 243–252

Abstract

The swift spread of the novel coronavirus is largely attributed to its stealthy transmission, in which infected patients may be asymptomatic or exhibit only flu-like symptoms in the early stage. Undetected transmission presents a major challenge to containing the virus and poses a grave threat to the public. An urgent question concerns testing for the coronavirus. In this paper, we evaluate the situation from a statistical viewpoint by discussing the accuracy of test procedures, and we stress the importance of interpreting test results rationally.
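Interpreting a test result rationally comes down to Bayes' theorem: the probability of infection given a positive result depends not only on the test's sensitivity and specificity but also on the prevalence in the tested population. The helper below computes the positive and negative predictive values; the numbers in the usage example are hypothetical, not taken from the paper.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (P(infected | positive), P(not infected | negative)) via Bayes' theorem."""
    tp = sensitivity * prevalence                  # true positives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    tn = specificity * (1.0 - prevalence)          # true negatives
    fn = (1.0 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)


print(predictive_values(0.90, 0.95, 0.01))
print(predictive_values(0.90, 0.95, 0.20))
```

With sensitivity 0.90 and specificity 0.95, a positive result implies only about a 15% chance of infection at 1% prevalence, but about 82% at 20% prevalence, which is why the same test result can warrant very different conclusions in different populations.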

Rejoinder: “Evaluate the Risk of Resumption of Business for the States of New York, New Jersey and Connecticut via a Pre-Symptomatic and Asymptomatic Transmission Model of COVID-19”

Ting Tian, Jianbin Tan, Yukang Jiang, et al. (5 authors)

https://doi.org/10.6339/21-JDS994REJ

Pub. online: 7 May 2021 Type: Discussion

Journal: Journal of Data Science Volume 19, Issue 2 (2021), pp. 215–218

Discussion of “Evaluate the Risk of Resumption of Business for the States of New York, New Jersey and Connecticut via a Pre-Symptomatic and Asymptomatic Transmission Model of COVID-19”

Chuanrong Zhang, Xinba Li

https://doi.org/10.6339/21-JDS994E

Pub. online: 7 May 2021 Type: Discussion

Journal: Journal of Data Science Volume 19, Issue 2 (2021), pp. 210–214



Online ISSN: 1683-8602
Print ISSN: 1680-743X


Contact us

JDS@ruc.edu.cn
No. 59 Zhongguancun Street, Haidian District Beijing, 100872, P.R. China