High-dimensional Confounding in Causal Mediation: A Comparison Study of Double Machine Learning and Regularized Partial Correlation Network
Pub. online: 3 March 2025
Type: Statistical Data Science
Open Access
Received: 1 October 2024
Accepted: 9 January 2025
Published: 3 March 2025
Abstract
In causal mediation analysis, interest lies in the direct and indirect pathways from an exposure to an outcome. In observational studies, a large number of baseline characteristics are collected as potential confounders to mitigate selection bias, and their number may approach or exceed the sample size. Flexible machine learning approaches are therefore promising for selecting a subset of relevant confounders, combined with estimation based on the efficient influence function to avoid overfitting. Among the various confounder selection strategies, two are attracting growing attention. One is the popular debiased, or double, machine learning (DML); the other is penalized partial correlation, obtained by fitting a Gaussian graphical network model between the confounders and the response variable. Nonetheless, for causal mediation analysis with high-dimensional confounders, it remains unclear which confounder selection strategy is best. We therefore use a motivating study on the human microbiome, in which the dimensions of the mediator and the confounders approach or exceed the sample size, to compare possible combinations of confounder selection methods. By deriving the multiply robust causal direct and indirect effects across various hypotheses, our comprehensive illustrations offer methodological implications on how confounder selection affects estimation of the final causal target parameter, while generating causal insights that help demystify the “gut-brain axis”. Our results highlight the practicality and necessity of the discussed methods, which not only guide real-world applications for practitioners but also motivate future advances on this crucial topic in the era of big data.