Data Visualization and Descriptive Analysis for Understanding Epidemiological Characteristics of COVID-19: A Case Study of a Dataset from January 22, 2020 to March 29, 2020
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal:Journal of Data Science
Volume 18, Issue 5 (2020): Special Issue S1 in Chinese (with abstract in English), pp. 907–921
Abstract
The Corona Virus Disease 2019 (COVID-19) emerged in Wuhan, China in December 2019. To control the epidemic, the Chinese government adopted several public health measures. To study the influence of these measures on the transmissibility of COVID-19 in the city of Wuhan and other cities in Hubei province, China, we establish generalized semi-varying coefficient models for the number of newly diagnosed cases and estimate the varying coefficients of the covariates by the spline method. Since the epidemic was most severe in Wuhan, we fit separate models for Wuhan and the remaining 16 cities in Hubei. Estimators for the incubation periods, the real-time transmission rates, and the real-time reproduction numbers were obtained. The results demonstrate that the changes in the real-time transmission rate in Wuhan and other cities in Hubei are almost simultaneous. Further, public health interventions such as restriction of traffic, adjustment of the diagnostic standard, deployment of medical resources, and improvement of nucleic acid testing capacity had positive effects on reducing the transmission of COVID-19.
Abstract: Simulation studies are important statistical tools used to investigate the performance, properties and adequacy of statistical models. The simulation of right censored time-to-event data involves the generation of two independent survival distributions, where the first distribution represents the uncensored survival times and the second distribution represents the censoring mechanism. In this brief report we discuss how we can make it so that the percentage of censored data is previously defined. The described method was used to generate data from a Weibull distribution, but it can be adapted to any other lifetime distribution. We further presented an R code function for generating random samples, considering the proposed approach.
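The idea of fixing the censoring percentage in advance can be illustrated with a minimal Python sketch. It is not the authors' R function, and it rests on a simplifying assumption that is not stated in the abstract: the censoring times follow a Weibull distribution with the same shape as the event times. Under that assumption, for event times T with shape k and scale λ and censoring times C with shape k and scale θ, P(C < T) = θ^(-k) / (θ^(-k) + λ^(-k)), so a target censoring proportion p is obtained with θ = λ((1 - p)/p)^(1/k). The function names `censoring_scale` and `simulate_censored_weibull` are hypothetical.

```python
import math
import random


def censoring_scale(shape, scale, p_censored):
    """Scale of the censoring Weibull that gives P(C < T) = p_censored.

    Assumption (ours, not the paper's): censoring times share the shape
    parameter of the event times, which yields a closed-form solution.
    """
    if not 0.0 < p_censored < 1.0:
        raise ValueError("p_censored must be strictly between 0 and 1")
    return scale * ((1.0 - p_censored) / p_censored) ** (1.0 / shape)


def simulate_censored_weibull(n, shape, scale, p_censored, seed=None):
    """Return n pairs (observed_time, event_indicator).

    event_indicator is 1 when the event was observed and 0 when the
    observation was right censored.
    """
    rng = random.Random(seed)
    theta = censoring_scale(shape, scale, p_censored)
    sample = []
    for _ in range(n):
        # random.weibullvariate takes (scale, shape), in that order.
        t = rng.weibullvariate(scale, shape)  # uncensored survival time
        c = rng.weibullvariate(theta, shape)  # independent censoring time
        sample.append((min(t, c), 1 if t <= c else 0))
    return sample


if __name__ == "__main__":
    data = simulate_censored_weibull(n=10000, shape=1.5, scale=2.0,
                                     p_censored=0.3, seed=2020)
    observed_censoring = 1.0 - sum(d for _, d in data) / len(data)
    print(f"target 0.30, observed {observed_censoring:.3f}")
```

For other censoring mechanisms, such as uniform censoring, the equation P(C < T) = p generally has no closed-form solution and would need to be solved numerically for the censoring parameter.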
Pub. online: 4 Aug 2022 | Type: Research Article | Open Access
Journal:Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 483–495
Abstract
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus, which was declared a global pandemic by the World Health Organization on March 11, 2020. In this work, we conduct a cross-sectional study to investigate how the infection fatality rate (IFR) of COVID-19 may be associated with possible geographical or demographical features of the infected population. We employ a multiple index model in combination with sliced inverse regression to characterize the relationship between the IFR and possible risk factors. To select features associated with the IFR, we utilize an adaptive Lasso penalized sliced inverse regression method, which achieves variable selection and sufficient dimension reduction simultaneously, with unimportant features removed automatically. We apply the proposed method in a cross-sectional study of COVID-19 data obtained at two time points of the outbreak.
Pub. online: 22 Feb 2021 | Type: Computing In Data Science
Journal:Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 293–313
Abstract
The COVID-19 (COrona VIrus Disease 2019) pandemic has had profound global consequences for the health, economic, social, behavioral, and almost every other major aspect of human life. Therefore, it is of great importance to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of an artificial intelligence enhanced COVID-19 analysis (in short AICov), which provides an integrative deep learning framework for COVID-19 forecasting with population covariates, some of which may serve as putative risk factors. We have integrated multiple different strategies into AICov, including the ability to use deep learning strategies based on Long Short-Term Memory (LSTM) and event modeling. To demonstrate our approach, we have introduced a framework that integrates population covariates from multiple sources. Thus, AICov not only includes data on COVID-19 cases and deaths but, more importantly, the population’s socioeconomic, health, and behavioral risk factors at their specific locations. The compiled data are fed into AICov, and integrating them into our model yields improved prediction compared with a model that uses only case and death data. Because we use deep learning, our models adapt over time as they learn from past data.