Pub. online: 4 Aug 2022
Type: Research Article
Open Access
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 550–580
Abstract
The COVID-19 pandemic has triggered an explosion of activity in the search for cures, including vaccines against SARS-CoV-2 infection. As of April 30, 2020, there are at least 102 COVID-19 vaccine development programs worldwide: the majority are in preclinical development, five are in phase I trials, and three are in phase I/II trials. Experts caution against rushing COVID-19 vaccine development, not only because knowledge about SARS-CoV-2 is lacking (albeit rapidly accumulating), but also because vaccine development is a complex, lengthy process with its own rules and timelines. Clinical trials are critically important in vaccine development, usually starting with small-scale phase I trials and gradually moving to the next phases (II and III) after the primary objectives are met. This paper provides an overview of design considerations for vaccine clinical trials, with a special focus on COVID-19 vaccine development. Given the current pandemic setting and the unique features of vaccine development, our recommendations from a statistical design perspective for COVID-19 vaccine trials include: (1) novel trial designs (e.g., master protocols) to expedite the simultaneous evaluation of multiple candidate vaccines or vaccine doses, (2) human challenge studies to accelerate clinical development, (3) adaptive design strategies (e.g., group sequential designs) for early termination due to futility, efficacy, and/or safety, (4) extensive modeling and simulation to characterize and establish long-term efficacy based on early-phase or short-term follow-up data, (5) safety evaluation as one of the primary focuses throughout all phases of clinical trials, (6) leveraging real-world data and evidence in vaccine trial design and analysis to establish vaccine effectiveness, and (7) global collaboration to form a joint development effort for more efficient use of resources and expertise and for data sharing.
Pub. online: 4 Aug 2022
Type: Research Article
Open Access
Journal: Journal of Data Science
Volume 18, Issue 3 (2020): Special issue: Data Science in Action in Response to the Outbreak of COVID-19, pp. 511–525
Abstract
Proteins play a key role in facilitating the infectiousness of the 2019 novel coronavirus. A specific spike protein enables this virus to bind to human cells, and a thorough understanding of its 3-dimensional structure is therefore critical for developing effective therapeutic interventions. However, its structure may continue to evolve over time as a result of mutations. In this paper, we use a data science perspective to study the potential structural impacts of ongoing mutations in its amino acid sequence. To do so, we identify a key segment of the protein and apply a sequential Monte Carlo sampling method to detect possible changes to the space of low-energy conformations for different amino acid sequences. Such computational approaches can further our understanding of this protein structure and complement laboratory efforts.
Researchers and public officials tend to agree that until a vaccine is readily available, stopping SARS-CoV-2 transmission is the name of the game. Testing is the key to preventing the spread, especially by asymptomatic individuals. With testing capacity restricted, group testing is an appealing alternative for comprehensive screening and has recently received FDA emergency authorization. This technique tests pools of individual samples rather than each sample separately, thereby often requiring fewer testing resources while potentially providing a several-fold speedup. We approach group testing from a data science perspective and offer two contributions. First, we provide an extensive empirical comparison of modern group testing techniques based on simulated data. Second, we propose a simple one-round method based on $\ell_1$-norm sparse recovery, which outperforms current state-of-the-art approaches at certain disease prevalence rates.
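To illustrate why pooling can save tests, the following is a minimal sketch of classical two-stage (Dorfman) pooling. It is not the one-round $\ell_1$-norm method proposed in the abstract above. It assumes a perfectly accurate test, independent infections at a common prevalence $p$, and no dilution effect in pooled samples. Under those assumptions, a pool of size $k$ costs one test, plus $k$ individual retests whenever the pool is positive, so the expected number of tests per person is $1/k + 1 - (1-p)^k$. The function names and the `max_pool_size` parameter are hypothetical and chosen only for this example.

```python
def dorfman_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per individual under two-stage (Dorfman) pooling.

    Assumes a perfect test and independent infections (illustrative only).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        # A pool of one is just individual testing.
        return 1.0
    # Probability that every sample in the pool is negative.
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus one retest per person
    # whenever the pool comes back positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence: float, max_pool_size: int = 32) -> int:
    """Pool size in 1..max_pool_size minimizing expected tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: dorfman_tests_per_person(prevalence, k),
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.50):
        k = best_pool_size(p)
        cost = dorfman_tests_per_person(p, k)
        print(f"prevalence={p:.2f}  pool size={k}  tests/person={cost:.3f}")
```

In this simplified model the savings shrink as prevalence rises, and at high prevalence individual testing (pool size 1) is already the cheapest option. This is consistent with the abstract's remark that the relative performance of group testing methods depends on the disease prevalence rate.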
Pub. online: 22 Feb 2021
Type: COVID-19 Special Issue
Journal: Journal of Data Science
Volume 19, Issue 2 (2021): Special issue: Continued Data Science Contributions to COVID-19 Pandemic, pp. 314–333
Abstract
As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein’s 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.