EMixed: Probabilistic Multi-Omics Cellular Deconvolution of Bulk Omics Data
Pub. online: 26 February 2025
Type: Statistical Data Science
Open Access
† These authors contributed equally to this work.
Received
1 October 2024
Accepted
24 January 2025
Published
26 February 2025
Abstract
Cellular deconvolution is a key approach to deciphering the complex cellular makeup of tissues by inferring the composition of cell types from bulk data. Traditionally, deconvolution methods have focused on a single molecular modality, relying either on RNA sequencing (RNA-seq) to capture gene expression or on DNA methylation (DNAm) to reveal epigenetic profiles. While these single-modality approaches have provided important insights, they often lack the depth needed to fully understand the intricacies of cellular compositions, especially in complex tissues. To address these limitations, we introduce EMixed, a versatile framework designed for both single-modality and multi-omics cellular deconvolution. EMixed models raw RNA counts and DNAm counts or frequencies via allocation models that assign RNA transcripts and DNAm reads to cell types, and uses an expectation-maximization (EM) algorithm to estimate parameters. Benchmarking results demonstrate that EMixed significantly outperforms existing methods across both single-modality and multi-modality applications, underscoring the broad utility of this approach in enhancing our understanding of cellular heterogeneity.
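To make the allocation idea concrete, the following is a minimal sketch of an EM update for a single modality. It is not the EMixed model or its R API. It assumes a fixed reference in which each cell type has a known probability profile over features, and it treats every bulk read as drawn from one latent cell type. The function name em_deconvolve, its arguments, and the toy numbers are all hypothetical.

```python
def em_deconvolve(counts, profiles, n_iter=500, tol=1e-10):
    """Estimate cell-type proportions from bulk counts by EM (illustrative only).

    counts   -- list of G non-negative bulk counts, one per feature (e.g. gene)
    profiles -- list of K reference profiles; profiles[k][g] is the assumed
                probability that a read from cell type k falls on feature g
    Returns a list of K proportions that sum to 1.
    """
    n_types = len(profiles)
    pi = [1.0 / n_types] * n_types  # start from uniform proportions
    for _ in range(n_iter):
        allocated = [0.0] * n_types
        for g, y in enumerate(counts):
            # E-step: posterior probability that a read at feature g
            # came from each cell type, given the current proportions.
            weights = [pi[k] * profiles[k][g] for k in range(n_types)]
            norm = sum(weights)
            if y == 0 or norm == 0.0:
                continue  # nothing to allocate for this feature
            for k in range(n_types):
                allocated[k] += y * weights[k] / norm
        total = sum(allocated)
        if total == 0.0:
            raise ValueError("no reads could be allocated to any cell type")
        # M-step: new proportions are the expected share of allocated reads.
        new_pi = [a / total for a in allocated]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Toy example: two hypothetical cell types over three features.
profiles = [[0.7, 0.2, 0.1],
            [0.1, 0.3, 0.6]]
# Counts equal to the expected counts of a 25% / 75% mixture of 1000 reads.
counts = [250, 275, 475]
estimate = em_deconvolve(counts, profiles)
```

On this toy input the estimate settles close to 0.25 and 0.75. The proportions in this sketch are shares of reads rather than shares of cells, so a real analysis would still need to account for differences in per-cell RNA content, and a multi-omics version would need a joint likelihood across modalities.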
Supplementary material
Supplementary Material
The R package EMixed is publicly hosted on GitHub (https://github.com/manqicai/EMixed).