<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JDS</journal-id>
<journal-title-group><journal-title>Journal of Data Science</journal-title></journal-title-group>
<issn pub-type="epub">1683-8602</issn>
<issn pub-type="ppub">1680-743X</issn>
<issn-l>1680-743X</issn-l>
<publisher>
<publisher-name>School of Statistics, Renmin University of China</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">JDS1005</article-id>
<article-id pub-id-type="doi">10.6339/21-JDS1005</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Data Science in Action</subject></subj-group></article-categories>
<title-group>
<article-title>Five Critical Genes Related to Seven COVID-19 Subtypes: A Data Science Discovery</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-2615-1539</contrib-id>
<name><surname>Zhang</surname><given-names>Zhengjun</given-names></name><email xlink:href="mailto:zjz@stat.wisc.edu">zjz@stat.wisc.edu</email><xref ref-type="aff" rid="j_jds1005_aff_001">1</xref><xref ref-type="fn" rid="cor1">∗</xref>
</contrib>
<aff id="j_jds1005_aff_001"><label>1</label><institution>Department of Statistics, University of Wisconsin</institution>, Madison, WI 53706, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Email: <ext-link ext-link-type="uri" xlink:href="mailto:zjz@stat.wisc.edu">zjz@stat.wisc.edu</ext-link>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>3</day><month>2</month><year>2021</year></pub-date>
<volume>19</volume><issue>1</issue><fpage>142</fpage><lpage>150</lpage>
<supplementary-material id="S1" content-type="archive" xlink:href="jds1005_s001.zip" mimetype="application" mime-subtype="x-zip-compressed">
<caption>
<title>Supplementary Material</title>
<p>Outcome Table 1 is in a supplementary file available online. Demo MATLAB<sup>®</sup> code for solving Equation (4) is also available.</p>
</caption>
</supplementary-material>
<history>
<date date-type="received"><month>12</month><year>2020</year></date>
<date date-type="accepted"><month>1</month><year>2021</year></date>
</history>
<permissions><copyright-statement>2021 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Since the first confirmed case of COVID-19 was identified in December 2019, the number of confirmed COVID-19 cases had reached 80,675,745 and the number of deaths 1,764,185 as of December 27, 2020. Researchers are still learning about the disease, and new variants of SARS-CoV-2 continue to emerge. Identifying essential and informative genes can lead to accurate tests of whether an individual has contracted COVID-19 and can help develop highly efficient vaccines, antiviral drugs, and treatments. Identifying critical genes related to COVID-19 is therefore an urgent task for medical researchers. We conducted a competing risk analysis using the max-linear logistic regression model to analyze 126 blood samples from COVID-19-positive and COVID-19-negative patients. Our research led to a competing COVID-19 risk classifier derived from 19,472 genes and their differential expression values. The final classifier model involves only five critical genes, ABCB6, KIAA1614, MND1, SMG1, and RIPK3, and achieves 100% sensitivity and 100% specificity on the 126 samples. Given their 100% accuracy in predicting COVID-19-positive or COVID-19-negative status, these five genes can be critical in developing focused and accurate COVID-19 testing procedures, guiding second-generation vaccine development, and studying antiviral drugs and treatments. We expect that these five genes will motivate a substantial body of new COVID-19 research.</p>
</abstract>
<kwd-group>
<label>Key words</label>
<kwd>classification</kwd>
<kwd>competing risk</kwd>
<kwd>COVID-19 test</kwd>
<kwd>COVID-19 treatment</kwd>
<kwd>COVID-19 vaccine</kwd>
<kwd>gene-gene interaction</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source xlink:href="https://doi.org/10.13039/100000001">NSF</funding-source>
<award-id>NSF-DMS-2012298</award-id>
</award-group>
<funding-statement>The work was partially supported by NSF-DMS-2012298 (NSF). </funding-statement>
</funding-group>
</article-meta>
</front>
<back>
<ref-list id="j_jds1005_reflist_001">
<title>References</title>
<ref id="j_jds1005_ref_001">
<mixed-citation publication-type="journal"> <string-name><surname>Andersen</surname> <given-names>K</given-names></string-name>, <string-name><surname>Rambaut</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lipkin</surname> <given-names>W</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>The proximal origin of SARS-CoV-2</article-title>. <source>Nature Medicine</source>, <volume>26</volume>: <fpage>450</fpage>–<lpage>452</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_002">
<mixed-citation publication-type="journal"> <string-name><surname>Cao</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2020</year>). <article-title>New extreme value theory for maxima of maxima</article-title>. <source>Statistical Theory and Related Fields</source>. Forthcoming, <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/24754269.2020.1846115" xlink:type="simple">https://doi.org/10.1080/24754269.2020.1846115</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_003">
<mixed-citation publication-type="journal"> <string-name><surname>Cui</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Chan</surname> <given-names>V</given-names></string-name> (<year>2020</year>). <article-title>Max-linear regression models with regularization</article-title>. <source>Journal of Econometrics</source>. Forthcoming, <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jeconom.2020.07.017" xlink:type="simple">https://doi.org/10.1016/j.jeconom.2020.07.017</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_004">
<mixed-citation publication-type="journal"> <string-name><surname>Cui</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2018</year>). <article-title>Max-linear competing factor models</article-title>. <source>Journal of Business &amp; Economic Statistics</source>, <volume>36</volume>(<issue>1</issue>): <fpage>62</fpage>–<lpage>74</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_005">
<mixed-citation publication-type="book"> <string-name><surname>Fan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Li</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>CH</given-names></string-name>, <string-name><surname>Zou</surname> <given-names>H</given-names></string-name> (<year>2020</year>). <source>Statistical Foundations of Data Science</source>. <publisher-name>Chapman and Hall/CRC</publisher-name>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_006">
<mixed-citation publication-type="journal"> <string-name><surname>Guglielmi</surname> <given-names>G</given-names></string-name> (<year>2020</year>). <article-title>Fast coronavirus tests: What they can and can’t do</article-title>. <source>Nature</source>, <volume>585</volume>: <fpage>496</fpage>–<lpage>498</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_007">
<mixed-citation publication-type="journal"> <string-name><surname>Lu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Niu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>H</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding</article-title>. <source>The Lancet</source>, <volume>395</volume>: <fpage>565</fpage>–<lpage>574</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_008">
<mixed-citation publication-type="journal"> <string-name><surname>Malinowski</surname> <given-names>A</given-names></string-name>, <string-name><surname>Schlather</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2016</year>). <article-title>Intrinsically weighted means and non-ergodic marked point processes</article-title>. <source>Annals of the Institute of Statistical Mathematics</source>, <volume>68</volume>(<issue>1</issue>): <fpage>1</fpage>–<lpage>24</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_009">
<mixed-citation publication-type="journal"> <string-name><surname>Mick</surname> <given-names>E</given-names></string-name>, <string-name><surname>Kamm</surname> <given-names>J</given-names></string-name>, <string-name><surname>Pisco</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ratnasiri</surname> <given-names>K</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Upper airway gene expression reveals suppressed immune responses to SARS-CoV-2 compared with other respiratory viruses</article-title>. <source>Nature Communications</source>, <volume>11</volume>: <fpage>5854</fpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_010">
<mixed-citation publication-type="journal"> <string-name><surname>Overmyer</surname> <given-names>KA</given-names></string-name>, <string-name><surname>Shishkova</surname> <given-names>E</given-names></string-name>, <string-name><surname>Miller</surname> <given-names>IJ</given-names></string-name>, <string-name><surname>Balnis</surname> <given-names>J</given-names></string-name>, <string-name><surname>Bernstein</surname> <given-names>MN</given-names></string-name>, <string-name><surname>Peters-Clarke</surname> <given-names>TM</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Large-scale multi-omic analysis of COVID-19 severity</article-title>. <source>Cell Systems</source>, <volume>12</volume>(<issue>1</issue>): <fpage>23</fpage>–<lpage>40</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.cels.2020.10.003" xlink:type="simple">https://doi.org/10.1016/j.cels.2020.10.003</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_011">
<mixed-citation publication-type="other"> <string-name><surname>Rowland</surname> <given-names>C</given-names></string-name> (<year>2020</year>). Doctors and nurses want more data before championing vaccines to end the pandemic: Health systems are launching bids to assure their medical workers that vaccines will be safe and effective. <italic>CNN</italic>, November 21, 2020 at 6:00 a.m. CST.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_012">
<mixed-citation publication-type="other"> <string-name><surname>Teng</surname> <given-names>HY</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2020</year>). Absolute and relative treatment effects in clinical trials: Models and applications in COVID-19 treatments. <italic>Manuscript submitted</italic>, University of Wisconsin.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_013">
<mixed-citation publication-type="journal"> <collab>The RECOVERY Collaborative Group</collab> (<year>2020</year>). <article-title>Effect of hydroxychloroquine in hospitalized patients with COVID-19</article-title>. <source>The New England Journal of Medicine</source>, <volume>383</volume>(<issue>21</issue>): <fpage>2030</fpage>–<lpage>2040</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_014">
<mixed-citation publication-type="journal"> <collab>The Severe Covid-19 GWAS Group</collab> (<year>2020</year>). <article-title>Genomewide association study of severe COVID-19 with respiratory failure</article-title>. <source>The New England Journal of Medicine</source>, <volume>383</volume>(<issue>16</issue>): <fpage>1522</fpage>–<lpage>1534</lpage>. <comment>PMID: 32558485</comment>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_015">
<mixed-citation publication-type="journal"> <string-name><surname>Xie</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Rathouz</surname> <given-names>PJ</given-names></string-name>, <string-name><surname>Barrett</surname> <given-names>B</given-names></string-name> (<year>2019</year>). <article-title>Multivariate semi-continuous proportionally constrained two-part fixed effects models and applications</article-title>. <source>Statistical Methods in Medical Research</source>, <volume>28</volume>: <fpage>3516</fpage>–<lpage>3533</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_016">
<mixed-citation publication-type="other"> <string-name><surname>Xu</surname> <given-names>Y</given-names></string-name> (<year>2019</year>). Regression models with max-linear structure. <italic>PhD Dissertation</italic>, University of Wisconsin.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_017">
<mixed-citation publication-type="journal"> <string-name><surname>Yu</surname> <given-names>WB</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>GD</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Corlett</surname> <given-names>RT</given-names></string-name> (<year>2020</year>). <article-title>Decoding the evolution and transmissions of the novel pneumonia coronavirus (SARS-CoV-2/HCoV-19) using whole genomic data</article-title>. <source>Zoological Research</source>, <volume>41</volume>(<issue>3</issue>): <fpage>247</fpage>–<lpage>257</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_018">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Tie</surname> <given-names>X</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Bevins</surname> <given-names>NB</given-names></string-name>, <etal>et al.</etal> (<year>2020</year>). <article-title>Diagnosis of coronavirus disease 2019 pneumonia by using chest radiography: Value of artificial intelligence</article-title>. <source>Radiology</source>, <volume>298</volume>(<issue>2</issue>): <fpage>E88</fpage>–<lpage>E97</lpage>. <comment>Published Online: September 24, 2020</comment>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_019">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2005</year>). <article-title>A new class of tail-dependent time series models and its applications in financial time series</article-title>. <source>Advances in Econometrics</source>, <volume>20</volume>(<issue>B</issue>): <fpage>323</fpage>–<lpage>358</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_020">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2008</year>). <article-title>Quotient correlation: A sample based alternative to Pearson’s correlation</article-title>. <source>The Annals of Statistics</source>, <volume>36</volume>(<issue>2</issue>): <fpage>1007</fpage>–<lpage>1030</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_021">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name> (<year>2020</year>). <article-title>On studying extreme values and systematic risks with nonlinear time series models and tail dependence measures (with discussions)</article-title>. <source>Statistical Theory and Related Fields</source>, Forthcoming, <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/24754269.2020.1856590" xlink:type="simple">https://doi.org/10.1080/24754269.2020.1856590</ext-link>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_022">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>X</given-names></string-name> (<year>2011</year>). <article-title>Asymptotic independence of correlation coefficients with application to testing hypothesis of independence</article-title>. <source>Electronic Journal of Statistics</source>, <volume>5</volume>: <fpage>342</fpage>–<lpage>372</lpage>.</mixed-citation>
</ref>
<ref id="j_jds1005_ref_023">
<mixed-citation publication-type="journal"> <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Cui</surname> <given-names>Q</given-names></string-name> (<year>2017</year>). <article-title>Random threshold driven tail dependence measures with application to precipitation data analysis</article-title>. <source>Statistica Sinica</source>, <volume>27</volume>(<issue>2</issue>): <fpage>685</fpage>–<lpage>709</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
