Statistical Challenges in the Analysis of Sequence and Structure Data for the COVID-19 Spike Protein
Volume 19, Issue 2 (2021), pp. 314–333
Pub. online: 22 February 2021
Type: COVID-19 Special Issue
Received
31 December 2020
Accepted
18 January 2021
Published
22 February 2021
Abstract
As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein’s 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.
Supplementary material
The processed data, R code, and instructions for reproducing the results in this paper are provided in a supplementary .zip file.