There are many methods of scoring the importance of variables in prediction of a response but not much is known about their accuracy. This paper partially fills the gap by introducing a new method based on the GUIDE algorithm and comparing it with 11 existing methods. For data without missing values, eight methods are shown to give biased scores that are too high or too low, depending on the type of variables (ordinal, binary or nominal) and whether or not they are dependent on other variables, even when all variables are independent of the response. Among the remaining four methods, only GUIDE continues to give unbiased scores if there are missing data values. It does this with a self-calibrating bias-correction step that is applicable to data with and without missing values. GUIDE also provides threshold scores for differentiating important from unimportant variables with 95 and 99 percent confidence. Correlations of the scores with the predictive power of the methods are studied in three real data sets. For many methods, correlations with marginal predictive power are much higher than with conditional predictive power.
The COVID-19 (COrona VIrus Disease 2019) pandemic has had profound global consequences for health, the economy, society, behavior, and almost every other major aspect of human life. It is therefore important to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of an artificial intelligence enhanced COVID-19 analysis (AICov for short), an integrative deep learning framework for COVID-19 forecasting with population covariates, some of which may serve as putative risk factors. AICov integrates multiple strategies, including deep learning based on Long Short-Term Memory (LSTM) and event modeling. To demonstrate our approach, we introduce a framework that integrates population covariates from multiple sources. Thus, AICov includes not only data on COVID-19 cases and deaths but, more importantly, the population’s socioeconomic, health, and behavioral risk factors at their specific locations. The compiled data are fed into AICov, and integrating them yields improved prediction compared with a model that uses only case and death data. Because we use deep learning, our models adapt over time as they learn from past data.
Climate change is widely recognized as one of the most challenging, urgent and complex problems facing humanity. There is rising interest in understanding and quantifying climate change. We analyze the climate trend in Canada using Canadian monthly surface air temperature data, which are longitudinal in nature and span a long time period. Analysis of such data is challenging due to the complexity of modeling and the associated computational burden. In this paper, we divide this type of longitudinal data into time blocks, conduct multivariate regression and utilize a vine copula model to account for the dependence among the multivariate error terms. This vine copula model allows separate specification of within-block and between-block dependence structures and has great flexibility in modeling complex association structures. To relieve the computational burden and concentrate on the structure of interest, we construct composite likelihood functions, which leave the connecting structure between time blocks unspecified. We discuss different estimation procedures and issues regarding model selection and prediction. We explore the prediction performance of our vine copula model through extensive simulation studies. An analysis of the Canadian climate dataset is provided.
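As a rough illustration of the blocking and regression steps described in the abstract above, the following minimal sketch splits a monthly series into consecutive time blocks and fits a separate linear trend for each within-block position, returning the matrix of residuals. The block length of 12 months, the single trend covariate (the block index), and the function names `split_into_blocks` and `blockwise_trend_fit` are assumptions made for illustration; the abstract does not specify them, and the vine copula step for the error dependence is not shown.

```python
from typing import List, Sequence, Tuple


def split_into_blocks(y: Sequence[float], block_len: int) -> List[List[float]]:
    """Cut a series into consecutive blocks of equal length (any tail is dropped)."""
    n_blocks = len(y) // block_len
    return [list(y[b * block_len:(b + 1) * block_len]) for b in range(n_blocks)]


def blockwise_trend_fit(
    blocks: List[List[float]],
) -> Tuple[List[float], List[float], List[List[float]]]:
    """Multivariate regression with one shared covariate (assumed: block index).

    Each within-block position is treated as one response variable and is
    regressed on the block index by ordinary least squares. Returns the
    intercepts, slopes, and the residual matrix (one row per block).
    Requires at least two blocks.
    """
    n = len(blocks)
    block_len = len(blocks[0])
    x = list(range(n))
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)

    intercepts, slopes = [], []
    for m in range(block_len):
        col = [blocks[b][m] for b in range(n)]
        y_mean = sum(col) / n
        sxy = sum((x[b] - x_mean) * (col[b] - y_mean) for b in range(n))
        slope = sxy / sxx
        slopes.append(slope)
        intercepts.append(y_mean - slope * x_mean)

    # Rows of this matrix play the role of the multivariate error terms.
    residuals = [
        [blocks[b][m] - intercepts[m] - slopes[m] * x[b] for m in range(block_len)]
        for b in range(n)
    ]
    return intercepts, slopes, residuals
```

In this sketch, each row of the residual matrix is one block's error vector; a dependence model such as a vine copula would then be fitted to those rows.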
We develop a health informatics toolbox that enables timely analysis and evaluation of the time-course dynamics of a range of infectious disease epidemics. As a case study, we examine the novel coronavirus (COVID-19) epidemic using publicly available data from the China CDC. The toolbox is built upon a hierarchical epidemiological model in which two observed time series of daily proportions of infected and removed cases are generated from the underlying infection dynamics governed by a Markov Susceptible-Infectious-Removed (SIR) infectious disease process. We extend the SIR model to incorporate various types of time-varying quarantine protocols, including government-level ‘macro’ isolation policies and community-level ‘micro’ social distancing (e.g. self-isolation and self-quarantine) measures. We develop a calibration procedure for underreported infected cases. The toolbox provides forecasts, in both online and offline forms, and simulates the overall dynamics of the epidemic. An R software package is made available to the public, and examples of its use are illustrated. Some possible extensions of our novel epidemiological models are discussed.
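To make the idea of an SIR process with a time-varying quarantine protocol concrete, here is a minimal deterministic sketch in Python. It is not the authors' R package or their hierarchical model: the transmission modifier `pi(t)`, the Euler discretization, the function names, and all parameter values are assumptions chosen for illustration. The compartments are proportions of the population, and `pi(t)` scales the transmission rate down while quarantine measures are in force.

```python
from typing import Callable, List, Tuple


def simulate_sir(
    beta: float,
    gamma: float,
    pi: Callable[[float], float],
    s0: float,
    i0: float,
    days: int,
    steps_per_day: int = 10,
) -> List[Tuple[float, float, float]]:
    """Euler integration of an SIR model with a transmission modifier pi(t).

    dS/dt = -beta * pi(t) * S * I
    dI/dt =  beta * pi(t) * S * I - gamma * I
    dR/dt =  gamma * I

    Returns one (S, I, R) tuple of proportions per day, including day 0.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    dt = 1.0 / steps_per_day
    path = [(s, i, r)]
    for day in range(days):
        for k in range(steps_per_day):
            t = day + k * dt
            new_infections = beta * pi(t) * s * i * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        path.append((s, i, r))
    return path


def no_quarantine(t: float) -> float:
    """Hypothetical modifier: transmission is never reduced."""
    return 1.0


def step_quarantine(t: float) -> float:
    """Hypothetical modifier: transmission drops to 40% from day 10 onward."""
    return 1.0 if t < 10 else 0.4


if __name__ == "__main__":
    free = simulate_sir(0.5, 0.1, no_quarantine, 0.999, 0.001, days=100)
    held = simulate_sir(0.5, 0.1, step_quarantine, 0.999, 0.001, days=100)
    print("peak infected, no quarantine:", max(i for _, i, _ in free))
    print("peak infected, step quarantine:", max(i for _, i, _ in held))
```

Comparing the two runs shows how a step-function modifier lowers the peak proportion infected. Other protocols, such as a smoothly decaying modifier, can be tried by passing a different function for `pi`.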