Mutstats: An Ultra-fast Computational Method to Determine Clonal Status of Somatic Mutations
Volume 19, Issue 3 (2021), pp. 465–484
Pub. online: 1 June 2021
Type: Data Science In Action
Received: 23 January 2021
Accepted: 16 May 2021
Published: 1 June 2021
Abstract
A tumor cell population is a mixture of heterogeneous cell subpopulations, known as subclones. Identifying the clonal status of mutations, i.e., whether a mutation occurs in all tumor cells or only in a subset of them, is crucial for understanding tumor progression and developing personalized treatment strategies. We make three major contributions in this paper: (1) we summarize the terminology in the literature using a unified mathematical representation of subclones; (2) we develop a simulation algorithm to generate hypothetical sequencing data that resemble real data; and (3) we present an ultra-fast computational method, Mutstats, to infer the clonal status of somatic mutations from tumor sequencing data. The inference is based on a Gaussian mixture model for mutation multiplicities. To validate Mutstats, we evaluate its performance on simulated datasets as well as on two breast carcinoma samples from The Cancer Genome Atlas project.
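To make the idea of a Gaussian mixture over mutation multiplicities concrete, the following is a minimal sketch and not the Mutstats implementation. It assumes the commonly used conversion from variant allele fraction (VAF) to multiplicity given tumor purity and copy number, fits a two-component one-dimensional mixture by plain EM, and treats the component with the higher mean as clonal. All function names, the toy purity and copy-number values, and the two-component choice are hypothetical and chosen only for illustration.

```python
import math
import random


def multiplicity(vaf, purity, tumor_cn, normal_cn=2):
    """Estimate mutation multiplicity from a VAF (assumed standard conversion)."""
    return vaf * (purity * tumor_cn + (1.0 - purity) * normal_cn) / purity


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM; returns (weights, means, sds)."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sds = [max((hi - lo) / 4.0, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weights, means, and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, sds


def classify(values, weights, means, sds):
    """Label each value by its more probable component (higher mean = clonal, by assumption)."""
    clonal_k = 0 if means[0] >= means[1] else 1
    labels = []
    for x in values:
        p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
        labels.append("clonal" if p[clonal_k] >= p[1 - clonal_k] else "subclonal")
    return labels


# Toy data (hypothetical): purity 0.8, diploid tumor, 60 clonal and 40 subclonal mutations.
random.seed(0)
purity, tumor_cn = 0.8, 2
vafs = [random.gauss(0.40, 0.02) for _ in range(60)]   # clonal: expected VAF 0.40
vafs += [random.gauss(0.16, 0.02) for _ in range(40)]  # subclonal: expected VAF 0.16
mults = [multiplicity(v, purity, tumor_cn) for v in vafs]

weights, means, sds = fit_two_component_gmm(mults)
labels = classify(mults, weights, means, sds)
print("component means:", [round(m, 2) for m in sorted(means)])
print("clonal calls:", labels.count("clonal"), "of", len(labels))
```

In this toy setting the two groups are well separated, so a two-component fit is enough; real data would also require choosing the number of components and handling mutations in regions with copy-number changes, which this sketch does not attempt.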
Supplementary material
We include an Appendix on the Bayesian model used by the PyClone method. In addition, the simulation data can be obtained from https://compgenome.shinyapps.io/tumorsim. Finally, the code for the Mutstats method and the real data used in this analysis can be found on the author's GitHub page, https://github.com/edwardbi/Mutstats.