<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn>
<issn pub-type="ppub">1680-743X</issn>
<issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1002</article-id>
<article-id pub-id-type="doi">10.6339/21-JDS1002</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Statistical Data Science</subject></subj-group></article-categories>
<title-group>
<article-title>Estimating the Number of Infected Cases in COVID-19 Pandemic</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Yan</surname><given-names>Donghui</given-names></name><email xlink:href="mailto:dyan@umassd.edu">dyan@umassd.edu</email><xref ref-type="aff" rid="j_jds1002_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Xu</surname><given-names>Ying</given-names></name><xref ref-type="aff" rid="j_jds1002_aff_002">2</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname><given-names>Pei</given-names></name><xref ref-type="aff" rid="j_jds1002_aff_003">3</xref>
</contrib>
<aff id="j_jds1002_aff_001"><label>1</label>Mathematics and Data Science, <institution>University of Massachusetts</institution>, Dartmouth, MA, <country>USA</country></aff>
<aff id="j_jds1002_aff_002"><label>2</label><institution>Indigo Agriculture Inc</institution>, Boston, MA, <country>USA</country></aff>
<aff id="j_jds1002_aff_003"><label>3</label><institution>Icahn School of Medicine at Mount Sinai</institution>, NYC, NY, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author. Email: <ext-link ext-link-type="uri" xlink:href="mailto:dyan@umassd.edu">dyan@umassd.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>23</day><month>2</month><year>2021</year></pub-date><volume>19</volume><issue>2</issue><fpage>348</fpage><lpage>364</lpage><supplementary-material id="S1" content-type="archive" xlink:href="jds1002_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>The online supplementary material provides all R scripts and datasets used to produce the figures and results reported in the paper. The R scripts are placed in the main directory of a .zip archive, along with a dataset of the 2020 US population and a README document that briefly describes the scripts and datasets. Data collected as of Apr 20, 2020 and Aug 31, 2020 are placed in separate subdirectories; each US state has its own time series dataset containing information such as reported case numbers and death tolls.</p>
</caption>
</supplementary-material>
<history>
<date date-type="received"><day>7</day><month>12</month><year>2020</year></date>
<date date-type="accepted"><day>10</day><month>1</month><year>2021</year></date>
</history>
<permissions><copyright-statement>2021 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>It is widely acknowledged that the reported numbers of COVID-19 infections were incomplete. We propose a structured approach that distinguishes two kinds of unreported cases: those that are reflected later in the number of confirmed cases, and those with mild or no symptoms that are never captured by any reporting system. The number of infected cases in the US is estimated to be 220.54% of that reported as of Apr 20, 2020. This implies an overall infection ratio of 0.53% and a case mortality rate of 2.85%, which is close to the 3.4% suggested by WHO in March 2020.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>number of unreported cases</kwd>
<kwd>population match</kwd>
</kwd-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1002_reflist_001">
<title>References</title>
<ref id="j_jds1002_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Backer</surname> <given-names>JA</given-names></string-name>, <string-name><surname>Klinkenberg</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wallinga</surname> <given-names>J</given-names></string-name> (<year>2020</year>). <article-title>Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China, 20–28 January 2020</article-title>. <source>Eurosurveillance</source>, <volume>25</volume>(<issue>5</issue>): <elocation-id>2000062</elocation-id>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_002">
<mixed-citation publication-type="other"> <string-name><surname>Baquero</surname> <given-names>C</given-names></string-name>, <string-name><surname>Casari</surname> <given-names>P</given-names></string-name>, <string-name><surname>Anta</surname> <given-names>AF</given-names></string-name>, <string-name><surname>Frey</surname> <given-names>D</given-names></string-name>, <string-name><surname>Garcia-Agundez</surname> <given-names>A</given-names></string-name>, <string-name><surname>Georgiou</surname> <given-names>C</given-names></string-name>, <etal>et al.</etal> (2020). Measuring icebergs: Using different methods to estimate the number of COVID-19 cases in Portugal and Spain. medRxiv preprint: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1101/2020.04.20.20073056" xlink:type="simple">https://doi.org/10.1101/2020.04.20.20073056</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Bi</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Mei</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zou</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Epidemiology and transmission of COVID-19 in Shenzhen China: Analysis of 391 cases and 1286 of their close contacts</article-title>. medRxiv preprint: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1101/2020.03.03.20028423" xlink:type="simple">https://doi.org/10.1101/2020.03.03.20028423</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Bohk-Ewald</surname> <given-names>C</given-names></string-name>, <string-name><surname>Dudel</surname> <given-names>C</given-names></string-name>, <string-name><surname>Myrskyla</surname> <given-names>M</given-names></string-name> (<year>2020</year>). <article-title>A demographic scaling model for estimating the total number of COVID-19 infections</article-title>. medRxiv preprint: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1101/2020.04.23.20077719" xlink:type="simple">https://doi.org/10.1101/2020.04.23.20077719</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_005">
<mixed-citation publication-type="other"> <string-name><surname>Bottcher</surname> <given-names>L</given-names></string-name>, <string-name><surname>Xia</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chou</surname> <given-names>T</given-names></string-name> (2020). Why estimating population-based case fatality rates during epidemics may be misleading. medRxiv preprint: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1101/2020.03.26.20044693" xlink:type="simple">https://doi.org/10.1101/2020.03.26.20044693</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Byambasuren</surname> <given-names>O</given-names></string-name>, <string-name><surname>Bell</surname> <given-names>MCK</given-names></string-name>, <string-name><surname>Clark</surname> <given-names>J</given-names></string-name>, <string-name><surname>McLaws</surname> <given-names>M</given-names></string-name>, <string-name><surname>Glasziou</surname> <given-names>P</given-names></string-name> (<year>2020</year>). <article-title>Estimating the extent of asymptomatic COVID-19 and its potential for community transmission: Systematic review and meta-analysis</article-title>. <source>Journal of the Association of Medical Microbiology and Infectious Disease</source>, <volume>5</volume>(<issue>4</issue>): <fpage>223</fpage>–<lpage>234</lpage>. doi: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3138/jammi-2020-0030" xlink:type="simple">https://doi.org/10.3138/jammi-2020-0030</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_007">
<mixed-citation publication-type="book"> <string-name><surname>Easley</surname> <given-names>D</given-names></string-name>, <string-name><surname>Kleinberg</surname> <given-names>J</given-names></string-name> (<year>2010</year>). <source>Networks, Crowds, and Markets: Reasoning about a Highly Connected World</source>. <publisher-name>Cambridge University Press</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_008">
<mixed-citation publication-type="other"> <string-name><surname>Gupta</surname> <given-names>S</given-names></string-name>, <string-name><surname>Shankar</surname> <given-names>R</given-names></string-name> (2020). Estimating the number of COVID-19 infections in Indian hot-spots using fatality data. arXiv preprint: <uri>https://arxiv.org/abs/arXiv:2004.04025</uri>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_009">
<mixed-citation publication-type="other"> <string-name><surname>Jagodnik</surname> <given-names>KM</given-names></string-name>, <string-name><surname>Ray</surname> <given-names>F</given-names></string-name>, <string-name><surname>Giorgi</surname> <given-names>FM</given-names></string-name>, <string-name><surname>Lachmann</surname> <given-names>A</given-names></string-name> (2020). Correcting under-reported COVID-19 case numbers: estimating the true scale of the pandemic. medRxiv preprint: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1101/2020.03.14.20036178" xlink:type="simple">https://doi.org/10.1101/2020.03.14.20036178</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Kaplan</surname> <given-names>EL</given-names></string-name>, <string-name><surname>Meier</surname> <given-names>P</given-names></string-name> (<year>1958</year>). <article-title>Nonparametric estimation from incomplete observations</article-title>. <source>Journal of the American Statistical Association</source>, <volume>53</volume>(<issue>282</issue>): <fpage>457</fpage>–<lpage>481</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_011">
<mixed-citation publication-type="journal"> <string-name><surname>Lauer</surname> <given-names>SA</given-names></string-name>, <string-name><surname>Grantz</surname> <given-names>K</given-names></string-name>, <string-name><surname>Bi</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Jones</surname> <given-names>FK</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Meredith</surname> <given-names>H</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application</article-title>. <source>Annals of Internal Medicine</source>, <volume>172</volume>(<issue>9</issue>): <fpage>577</fpage>–<lpage>582</lpage>. doi: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.7326/M20-0504" xlink:type="simple">https://doi.org/10.7326/M20-0504</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_012">
<mixed-citation publication-type="journal"> <string-name><surname>Linton</surname> <given-names>NM</given-names></string-name>, <string-name><surname>Kobayashi</surname> <given-names>T</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Hayashi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Akhmetzhanov</surname> <given-names>AR</given-names></string-name>, <string-name><surname>Jung</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data</article-title>. <source>Journal of Clinical Medicine</source>, <volume>9</volume>(<issue>2</issue>): <fpage>538</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_013">
<mixed-citation publication-type="book"> <string-name><surname>Little</surname> <given-names>RJ</given-names></string-name>, <string-name><surname>Rubin</surname> <given-names>D</given-names></string-name> (<year>2002</year>). <source>Statistical Analysis with Missing Data</source>. <publisher-name>Wiley</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_014">
<mixed-citation publication-type="journal"> <string-name><surname>McAloon</surname> <given-names>C</given-names></string-name>, <string-name><surname>Collins</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hunt</surname> <given-names>K</given-names></string-name>, <string-name><surname>Barber</surname> <given-names>A</given-names></string-name>, <string-name><surname>Byrne</surname> <given-names>A</given-names></string-name>, <string-name><surname>Butler</surname> <given-names>F</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Incubation period of COVID-19: a rapid systematic review and meta-analysis of observational research</article-title>. <source>British Medical Journal Open</source>, <volume>10</volume>(<issue>8</issue>): <fpage>1</fpage>–<lpage>9</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Prem</surname> <given-names>K</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Russell</surname> <given-names>T</given-names></string-name>, <string-name><surname>Kucharski</surname> <given-names>A</given-names></string-name>, <string-name><surname>Eggo</surname> <given-names>R</given-names></string-name>, <string-name><surname>Davies</surname> <given-names>N</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study</article-title>. <source>The Lancet Public Health</source>, <volume>5</volume>(<issue>5</issue>): <fpage>261</fpage>–<lpage>270</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_016">
<mixed-citation publication-type="journal"> <string-name><surname>Richterich</surname> <given-names>P</given-names></string-name> (<year>2020</year>). <article-title>Severe underestimation of COVID-19 case numbers: effect of epidemic growth rate and test restrictions</article-title>. medRxiv preprint: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1101/2020.04.13.20064220" xlink:type="simple">https://doi.org/10.1101/2020.04.13.20064220</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_017">
<mixed-citation publication-type="other"> <string-name><surname>Tian</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>H</given-names></string-name> (2020). Evaluate the timing of resumption of business for the states of New York, New Jersey, and California via a pre-symptomatic and asymptomatic transmission model of COVID-19. medRxiv preprint: <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1101/2020.05.16.20103747" xlink:type="simple">https://doi.org/10.1101/2020.05.16.20103747</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_018">
<mixed-citation publication-type="other"> WHO (2020). Coronavirus (COVID-19) Mortality Rate. <uri>https://www.worldometers.info/coronavirus/coronavirus-death-rate/</uri>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_019">
<mixed-citation publication-type="other"> Wikipedia (2020). COVID-19 pandemic in the United States. <uri>https://en.wikipedia.org/wiki/COVID-19_pandemic_in_the_United_States</uri>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Yan</surname> <given-names>D</given-names></string-name>, <string-name><surname>Li</surname> <given-names>C</given-names></string-name>, <string-name><surname>Cong</surname> <given-names>N</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>P</given-names></string-name> (<year>2019</year>). <article-title>A structured approach to the analysis of remote sensing images</article-title>. <source>International Journal of Remote Sensing</source>, <volume>40</volume>(<issue>20</issue>): <fpage>7874</fpage>–<lpage>7897</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Litvinova</surname> <given-names>M</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China</article-title>. <source>Science</source>, <volume>368</volume>: <fpage>1481</fpage>–<lpage>1486</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1002_ref_022">
<mixed-citation publication-type="other"> <string-name><surname>Zhou</surname> <given-names>T</given-names></string-name>, <string-name><surname>Ji</surname> <given-names>Y</given-names></string-name> (2020). Semiparametric Bayesian inference for the transmission dynamics of COVID-19 with a state-space model. arXiv preprint: <uri>https://arxiv.org/abs/arXiv:2006.05581</uri>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
