Journal of Data Science

Methods, Challenges, and Practical Issues of COVID-19 Projection: A Data Science Perspective
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 219–242
Myungjin Kim, Zhiling Gu, Shan Yu, et al. (5 authors)

https://doi.org/10.6339/21-JDS1013
Pub. online: 27 April 2021      Type: Philosophies Of Data Science     

Received: 21 April 2021
Accepted: 22 April 2021
Published: 27 April 2021

Abstract

The coronavirus disease 2019 (COVID-19) pandemic, caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has placed epidemic modeling at the center of public policymaking. Predicting the severity and speed of transmission of COVID-19 is crucial to managing resources and developing strategies to deal with this epidemic. Based on the available data from current and previous outbreaks, many efforts have been made to develop epidemiological models, including statistical models, computer simulations, and mathematical representations of the virus and its impacts. Despite the usefulness of these models, modeling and forecasting the spread of COVID-19 remains a challenge. In this article, we give an overview of the unique features and issues of COVID-19 data and how they impact epidemic modeling and projection. In addition, we illustrate how various models could be connected to each other. Moreover, we provide new data science perspectives on the challenges of COVID-19 forecasting, from data collection, curation, and validation to the limitations of models and the uncertainty of the forecast. Finally, we discuss some data science practices that are crucial to more robust and accurate epidemic forecasting.



Copyright
© 2021 The Author(s)
This is a free to read article.

Keywords
COVID-19 disease spread epidemic models forecast uncertainty


Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
