Journal of Data Science

Five Critical Genes Related to Seven COVID-19 Subtypes: A Data Science Discovery
Volume 19, Issue 1 (2021), pp. 142–150
Zhengjun Zhang

https://doi.org/10.6339/21-JDS1005
Type: Data Science In Action
Received: 1 December 2020
Accepted: 1 January 2021
Published online: 3 February 2021

Abstract

Since the first confirmed case of COVID-19 was identified in December 2019, the total number of COVID-19 cases has reached 80,675,745 and the number of deaths 1,764,185, as of December 27, 2020. Researchers are still learning about the disease, and new variants of SARS-CoV-2 continue to emerge. Identifying essential and informative genes can lead to accurate tests of whether an individual has contracted COVID-19 and can help develop highly efficient vaccines, antiviral drugs, and treatments. As a result, identifying critical genes related to COVID-19 has become an urgent task for medical researchers. We conducted a competing risk analysis using the max-linear logistic regression model on 126 blood samples from COVID-19-positive and COVID-19-negative patients. The analysis produced a competing-risk COVID-19 classifier derived from 19,472 genes and their differential expression values. The final classifier involves only five critical genes, ABCB6, KIAA1614, MND1, SMG1, and RIPK3, and achieves 100% sensitivity and 100% specificity on the 126 samples. Given their 100% accuracy in predicting COVID-19-positive or COVID-19-negative status in these samples, the five genes can be critical in developing proper, focused, and accurate COVID-19 testing procedures, guiding second-generation vaccine development, and studying antiviral drugs and treatments. These five genes are expected to motivate numerous new COVID-19 studies.
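
As a rough illustration of the kind of classifier the abstract describes, the sketch below scores a sample with a max-linear logistic rule and then computes sensitivity and specificity. It assumes the positive-class probability is a logistic function of the maximum over several linear components. The function names, the placeholder gene names (`g1`, `g2`, `g3`), the coefficients, and the toy samples are all hypothetical; this is not the paper's fitted model, its Equation (4), or its data.

```python
import math


def max_linear_logit_prob(x, components):
    """Positive-class probability under an assumed max-linear logistic form.

    x: dict mapping gene name -> expression value.
    components: list of (intercept, {gene: coefficient}) pairs, one per
        competing component. All values used below are hypothetical.
    """
    scores = [b0 + sum(coef * x[g] for g, coef in betas.items())
              for b0, betas in components]
    z = max(scores)                    # the largest competing score decides
    return 1.0 / (1.0 + math.exp(-z))  # logistic link


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical two-component model on made-up genes (illustration only).
components = [
    (-1.0, {"g1": 2.0}),
    (0.5, {"g2": -1.0, "g3": 1.0}),
]

# Made-up samples and labels; not taken from the study.
samples = [
    {"g1": 1.5, "g2": 0.0, "g3": 0.0},
    {"g1": 0.0, "g2": 2.0, "g3": 0.0},
    {"g1": 0.0, "g2": 0.0, "g3": 1.0},
    {"g1": 0.2, "g2": 1.0, "g3": 0.0},
]
labels = [1, 0, 1, 0]

# Classify as positive when the probability exceeds 0.5.
preds = [1 if max_linear_logit_prob(s, components) > 0.5 else 0
         for s in samples]
sens, spec = sensitivity_specificity(labels, preds)
print(preds, sens, spec)
```

Taking the maximum means a sample is flagged as positive as soon as any one component gives a high score, which is one way to read the "competing risk" idea; how the published model estimates its components is not covered by this sketch.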

Supplementary material

Outcome Table 1 is provided in a supplementary file available online. Matlab® demo code for solving Equation (4) is also available.



Copyright
© 2021 The Author(s).
This is a free to read article.

Keywords
classification, competing risk, COVID-19 test, COVID-19 treatment, COVID-19 vaccine, gene-gene interaction

Funding
The work was partially supported by NSF-DMS-2012298 (NSF).


Journal of data science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
