Five Critical Genes Related to Seven COVID-19 Subtypes: A Data Science Discovery
Volume 19, Issue 1 (2021), pp. 142–150
Pub. online: 3 February 2021
Type: Data Science In Action
Received
1 December 2020
Accepted
1 January 2021
Published
3 February 2021
Abstract
Since the first confirmed case of COVID-19 was identified in December 2019, the total number of COVID-19 cases has reached 80,675,745, with 1,764,185 deaths as of December 27, 2020. Researchers are still learning about the disease, and new variants of SARS-CoV-2 continue to emerge. For medical treatment, essential and informative genes can lead to accurate tests of whether an individual has contracted COVID-19 and can help develop highly efficient vaccines, antiviral drugs, and treatments. As a result, identifying critical genes related to COVID-19 has become an urgent task for medical researchers. We conducted a competing risk analysis using the max-linear logistic regression model to analyze 126 blood samples from COVID-19-positive and COVID-19-negative patients. Our research led to a competing COVID-19 risk classifier derived from 19,472 genes and their differential expression values. The final classifier involves only five critical genes, ABCB6, KIAA1614, MND1, SMG1, and RIPK3, and achieves 100% sensitivity and 100% specificity on the 126 samples. Given their 100% accuracy in predicting COVID-19-positive or COVID-19-negative status in these samples, the five genes can be critical in developing proper, focused, and accurate COVID-19 testing procedures, guiding second-generation vaccine development, and studying antiviral drugs and treatments. We expect these five genes to motivate numerous new COVID-19 studies.
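To make the terms in the abstract concrete, the following minimal Python sketch shows one way a competing (max-linear) logistic classifier and its sensitivity and specificity could be computed. It is an illustration under stated assumptions, not the authors' implementation: the functional form (a logistic link applied to the maximum of several linear predictors), the function names, the gene labels `gene_a` and `gene_b`, and all coefficients and expression values are hypothetical toy choices, and the paper's Equation (4) is not reproduced here.

```python
import math

def max_linear_score(x, components):
    """Largest linear predictor among the competing components.

    Each component is (intercept, {gene_name: coefficient}); this form is an
    assumption made for illustration only.
    """
    return max(
        intercept + sum(coef * x[gene] for gene, coef in weights.items())
        for intercept, weights in components
    )

def predict_positive(x, components, threshold=0.5):
    """Apply a logistic link to the max-linear score and threshold it."""
    prob = 1.0 / (1.0 + math.exp(-max_linear_score(x, components)))
    return prob > threshold

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); assumes both classes are present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy model with two competing components (not fitted to any data).
components = [(-1.0, {"gene_a": 2.0}), (0.5, {"gene_b": -1.5})]

# Hypothetical expression values and labels (True = COVID-19 positive).
samples = [
    {"gene_a": 1.0, "gene_b": 1.0},
    {"gene_a": 0.0, "gene_b": -1.0},
    {"gene_a": 0.0, "gene_b": 1.0},
    {"gene_a": 0.2, "gene_b": 0.8},
]
labels = [True, True, False, False]

predictions = [predict_positive(x, components) for x in samples]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy setup a sample is flagged positive as soon as any one component's linear predictor is large, which is the sense in which the components "compete". Sensitivity is the fraction of true positives correctly flagged and specificity is the fraction of true negatives correctly cleared.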
Supplementary material
Outcome Table 1 is in a supplementary file available online. A Matlab® demo code for solving Equation (4) is also available.
References
Cao W, Zhang Z (2020). New extreme value theory for maxima of maxima. Statistical Theory and Related Fields. Forthcoming, https://doi.org/10.1080/24754269.2020.1846115.
Cui Q, Xu Y, Zhang Z, Chan V (2020). Max-linear regression models with regularization. Journal of Econometrics. Forthcoming, https://doi.org/10.1016/j.jeconom.2020.07.017.
Overmyer KA, Shishkova E, Miller IJ, Balnis J, Bernstein MN, Peters-Clarke TM, et al. (2020). Large-scale multi-omic analysis of COVID-19 severity. Cell Systems, 12(1): 23–40. https://doi.org/10.1016/j.cels.2020.10.003.
Zhang Z (2020). On studying extreme values and systematic risks with nonlinear time series models and tail dependence measures (with discussions). Statistical Theory and Related Fields. Forthcoming, https://doi.org/10.1080/24754269.2020.1856590.