An extensive literature addresses the analysis of correlated survival data. Subjects within a cluster share common characteristics, e.g., genetic and environmental factors, so their time-to-event outcomes are correlated. The frailty model under the proportional hazards assumption has been widely applied to clustered survival outcomes. However, its predictive performance can be unsatisfactory when the risk factors have complicated effects, e.g., nonlinear effects and interactions. To address these issues, we propose a neural network frailty Cox model that replaces the linear risk function with the output of a feed-forward neural network. Estimation is based on a quasi-likelihood obtained by Laplace approximation. A simulation study suggests that the proposed method performs best among the methods compared. The method is applied to predicting clustered time-to-failure outcomes within kidney transplantation facilities, using national kidney transplant registry data from the U.S. Organ Procurement and Transplantation Network. All computer programs are available at https://github.com/rivenzhou/deep_learning_clustered.
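As a rough illustration of the kind of objective such a model involves, the sketch below evaluates a penalized negative log partial likelihood in which the linear predictor is replaced by an arbitrary network output plus a cluster-level random effect. It assumes a Gaussian frailty with a known variance `theta` and no tied event times, and it omits the log-determinant term that a Laplace approximation would add, so it should not be read as the authors' estimation procedure. The function name and all argument names are hypothetical.

```python
import numpy as np

def penalized_neg_log_partial_lik(risk, cluster, time, event, b, theta):
    """Penalized negative log partial likelihood (assumes no tied times).

    risk    : network output f(x_i) for each subject (hypothetical input)
    cluster : integer cluster index of each subject
    time    : observed follow-up times
    event   : 1 if the event was observed, 0 if censored
    b       : one random effect per cluster (assumed Gaussian frailty)
    theta   : assumed known variance of the random effects
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    b = np.asarray(b, dtype=float)
    cluster = np.asarray(cluster, dtype=int)

    # Subject-level predictor: network output plus the cluster effect.
    eta = risk + b[cluster]

    # Sort by decreasing time so that a running sum covers each risk set.
    order = np.argsort(-time)
    eta_sorted = eta[order]
    event_sorted = event[order]

    # log sum_{j: t_j >= t_i} exp(eta_j), computed stably.
    log_risk_set = np.logaddexp.accumulate(eta_sorted)

    log_partial_lik = np.sum(event_sorted * (eta_sorted - log_risk_set))
    penalty = np.sum(b ** 2) / (2.0 * theta)
    return -log_partial_lik + penalty
```

In practice the `risk` values would come from a feed-forward network and the whole expression would be minimized jointly over the network weights and `b` with an automatic differentiation library; the NumPy version here only shows how the pieces fit together.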
Deep neural networks have a wide range of applications in data science. This paper reviews neural network modeling algorithms and their applications in both supervised and unsupervised learning. Key examples include: (i) binary classification and (ii) nonparametric regression function estimation, both implemented with feedforward neural networks (FNN); (iii) sequential data prediction using long short-term memory (LSTM) networks; and (iv) image classification using convolutional neural networks (CNN). All implementations are provided in MATLAB, making these methods accessible to statisticians and data scientists to support learning and practical application.
There is a great deal of prior knowledge about gene function and regulation in the form of annotations or prior results that, if directly integrated into individual prognostic or diagnostic studies, could improve predictive performance. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically only used post-analysis to aid in the interpretation of any findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of “meta features” to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation we show that the proposed method outperforms standard ridge regression and competing methods that integrate prior information, in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and breast cancer mortality based on gene expression features.
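To make the recasting step concrete, here is a minimal sketch under stated assumptions. It assumes the two-level objective has the form 0.5·||y − Xβ||² + 0.5·λ₁||β − Zα||² + 0.5·λ₂||α||², where Z holds the meta features; this particular form is an assumption, not taken from the abstract. Substituting γ = β − Zα turns it into a single ridge problem on the design matrix [X, XZ] with a separate penalty for each block. The sketch solves that problem in closed form rather than by cyclic coordinate descent, and the function and argument names are hypothetical.

```python
import numpy as np

def fit_two_level_ridge(X, y, Z, lam1, lam2):
    """Solve an assumed two-level ridge problem via a single-level ridge.

    X    : (n, p) feature matrix
    y    : (n,) outcome
    Z    : (p, q) meta-feature matrix (one row per feature)
    lam1 : penalty on beta - Z @ alpha
    lam2 : penalty on alpha
    """
    n, p = X.shape
    q = Z.shape[1]

    # With gamma = beta - Z @ alpha, we have X @ beta = X @ gamma + (X @ Z) @ alpha.
    design = np.hstack([X, X @ Z])

    # Block-diagonal ridge penalty: lam1 on gamma, lam2 on alpha.
    penalty = np.diag(np.concatenate([np.full(p, lam1), np.full(q, lam2)]))

    # Closed-form ridge solution of the stacked single-level problem.
    coef = np.linalg.solve(design.T @ design + penalty, design.T @ y)
    gamma, alpha = coef[:p], coef[p:]

    # Map back to the original first-level coefficients.
    beta = gamma + Z @ alpha
    return beta, alpha
```

When the meta features carry no signal, the estimate of `alpha` is shrunk toward zero and `beta` reduces to an ordinary ridge estimate with penalty `lam1`, which is consistent with the abstract's remark that uninformative meta features cost nothing in performance.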
Pub. online: 22 Feb 2021
Type: Computing In Data Science
Journal: Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 293–313
Abstract
The COVID-19 (COrona VIrus Disease 2019) pandemic has had profound global consequences for health, the economy, society, behavior, and almost every other major aspect of human life. It is therefore important to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of an artificial-intelligence-enhanced COVID-19 analysis (AICov for short), which provides an integrative deep learning framework for COVID-19 forecasting with population covariates, some of which may serve as putative risk factors. We have integrated multiple strategies into AICov, including deep learning strategies based on Long Short-Term Memory (LSTM) and event modeling. To demonstrate our approach, we have introduced a framework that integrates population covariates from multiple sources. Thus, AICov includes not only data on COVID-19 cases and deaths but, more importantly, the population’s socioeconomic, health, and behavioral risk factors at their specific locations. The compiled data are fed into AICov, and integrating these data yields improved prediction compared with a model that uses only case and death data. Because we use deep learning, our models adapt over time as they learn from past data.
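One common way to feed location-level covariates to an LSTM alongside a case series is to repeat the static covariates at every time step of each input window. The sketch below illustrates only that data-preparation step and is not taken from AICov; the function name, argument names, and windowing scheme are assumptions. It produces arrays in the (samples, time steps, features) layout that LSTM layers conventionally expect.

```python
import numpy as np

def make_windows(cases, covariates, window):
    """Build LSTM-style inputs from one location's series and covariates.

    cases      : (T,) daily counts for one location (hypothetical input)
    covariates : (k,) static population covariates for that location
    window     : number of past days used to predict the next day
    Returns X of shape (T - window, window, 1 + k) and y of shape (T - window,).
    """
    cases = np.asarray(cases, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    n_samples = len(cases) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")

    # Sliding windows of past counts: shape (n_samples, window).
    past = np.stack([cases[i:i + window] for i in range(n_samples)])

    # Repeat the static covariates at every step of every window.
    static = np.broadcast_to(covariates, (n_samples, window, covariates.size))

    # Feature 0 is the count; the remaining features are the covariates.
    X = np.concatenate([past[:, :, None], static], axis=2)

    # Target: the count on the day immediately after each window.
    y = cases[window:]
    return X, y
```

Dropping the covariate columns from `X` gives the baseline that uses only case data, which is the comparison the abstract describes.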
Climate change is widely recognized as one of the most challenging, urgent, and complex problems facing humanity. There is growing interest in understanding and quantifying climate change. We analyze the climate trend in Canada using Canadian monthly surface air temperature data, which are longitudinal in nature and span a long time period. Analyzing such data is challenging because of the modeling complexity and the associated computational burden. In this paper, we divide this type of longitudinal data into time blocks, conduct multivariate regression, and use a vine copula model to account for the dependence among the multivariate error terms. This vine copula model allows separate specification of the within-block and between-block dependence structures and offers great flexibility in modeling complex association structures. To relieve the computational burden and concentrate on the structure of interest, we construct composite likelihood functions that leave the connecting structure between time blocks unspecified. We discuss different estimation procedures and issues regarding model selection and prediction. We explore the prediction performance of our vine copula model through extensive simulation studies and provide an analysis of the Canadian climate dataset.