Data Visualization and Descriptive Analysis for Understanding Epidemiological Characteristics of COVID-19: A Case Study of a Dataset from January 22, 2020 to March 29, 2020
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 526–535
Pub. online: 4 August 2022
Type: Research Article
Open Access
Abstract
COVID-19 is a disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was first reported to be spreading among people in December 2019. Understanding epidemiological
features of COVID-19 is important for the ongoing global efforts to contain the virus. As a
complement to the available work, in this article we analyze the Kaggle novel coronavirus dataset
of 3397 patients dated from January 22, 2020 to March 29, 2020. We employ semiparametric
and nonparametric survival models as well as text mining and data visualization techniques to
examine the clinical manifestations and epidemiological features of COVID-19. Our analysis
shows that: (i) the median incubation time is about 5 days and older people tend to have a
longer incubation period; (ii) the median time for infected people to recover is about 20 days,
and the recovery time is significantly associated with age but not gender; (iii) the fatality rate
is higher for older infected patients than for younger patients.
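
To illustrate how a median time such as the incubation or recovery time can be read off a nonparametric survival estimate, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the authors' code. The function names `kaplan_meier` and `median_time` and the toy durations are hypothetical and are not drawn from the Kaggle dataset. The semiparametric models mentioned in the abstract are not shown here.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate as a list of (time, survival) pairs.

    durations: time to event or to censoring for each patient (e.g. days).
    observed:  1 if the event was observed, 0 if the record was censored.
    """
    # Number of observed events at each distinct time.
    events = Counter(t for t, d in zip(durations, observed) if d)
    # Number of patients leaving the risk set (event or censoring) at each time.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_time(curve):
    """Smallest time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # The curve never reaches 0.5 (heavy censoring).


# Hypothetical toy data: days until the event for 10 patients,
# two of whom are censored (observed = 0).
toy_durations = [2, 3, 4, 5, 5, 6, 7, 9, 12, 14]
toy_observed = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]

toy_curve = kaplan_meier(toy_durations, toy_observed)
toy_median = median_time(toy_curve)
print(toy_median)
```

Censored records stay in the risk set until their censoring time and contribute no event, which is why the estimate differs from a plain sample median when follow-up is incomplete.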