<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JDS</journal-id>
      <journal-title-group>
        <journal-title>Journal of Data Science</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1680-743X</issn>
      <issn pub-type="ppub">1680-743X</issn>
      <publisher>
        <publisher-name>SOSRUC</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">JULY8</article-id>
      <article-id pub-id-type="doi">10.6339/JDS.202007_18(3).0018</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Data Visualization and Descriptive Analysis for Understanding Epidemiological Characteristics of COVID-19: A Case Study of a Dataset from January 22, 2020 to March 29, 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Charvadeh</surname>
            <given-names>Yasin Khadem</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_000"/>
        </contrib>
        <aff id="j_JDS_aff_000">Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario, Canada</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Yi</surname>
            <given-names>Grace Y</given-names>
          </name>
          <xref ref-type="aff" rid="j_JDS_aff_001"/>
        </contrib>
        <aff id="j_JDS_aff_001">Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario, Canada; Department of Computer Science, University of Western Ontario, London, Ontario, Canada</aff>
      </contrib-group>
      <volume>18</volume>
      <issue>3</issue>
      <fpage>526</fpage>
      <lpage>535</lpage>
      <permissions>
        <ali:free_to_read xmlns:ali="http://www.niso.org/schemas/ali/1.0/"/>
      </permissions>
      <abstract>
        <p>COVID-19 is a disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that was reported to spread in people in December 2019. Understanding epidemiological features of COVID-19 is important for the ongoing global efforts to contain the virus. As a complement to the available work, in this article we analyze the Kaggle novel coronavirus dataset of 3397 patients dated from January 22, 2020 to March 29, 2020. We employ semiparametric and nonparametric survival models as well as text mining and data visualization techniques to examine the clinical manifestations and epidemiological features of COVID-19. Our analysis shows that: (i) the median incubation time is about 5 days and older people tend to have a longer incubation period; (ii) the median time for infected people to recover is about 20 days, and the recovery time is significantly associated with age but not gender; (iii) the fatality rate is higher for older infected patients than for younger patients.</p>
      </abstract>
      <kwd-group>
        <label>Keywords</label>
        <kwd>incubation time</kwd>
        <kwd>recovery time</kwd>
        <kwd>risk factors</kwd>
        <kwd>survival analysis</kwd>
        <kwd>symptom onset</kwd>
        <kwd>text mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
