Journal of Data Science

High-dimensional Confounding in Causal Mediation: A Comparison Study of Double Machine Learning and Regularized Partial Correlation Network
Ming Chen, Tanya T. Nguyen, Jinyuan Liu

https://doi.org/10.6339/25-JDS1169
Pub. online: 3 March 2025      Type: Statistical Data Science      Open Access

Received: 1 October 2024
Accepted: 9 January 2025
Published: 3 March 2025

Abstract

In causal mediation analysis, interest centers on the direct and indirect pathways from an exposure to an outcome variable. In observational studies, large numbers of baseline characteristics are collected as potential confounders to mitigate selection bias, and their number may approach or exceed the sample size. Flexible machine learning approaches are therefore promising for selecting a subset of relevant confounders, combined with estimation based on the efficient influence function to avoid overfitting. Among the various confounder selection strategies, two have attracted growing attention: the popular debiased, or double, machine learning (DML), and the penalized partial correlation obtained by fitting a Gaussian graphical network model between the confounders and the response variable. However, when causal mediation analyses involve high-dimensional confounders, it remains unclear which confounder selection strategy is best. We therefore use a motivating study of the human microbiome, in which the dimensions of the mediators and confounders approach or exceed the sample size, to compare possible combinations of confounder selection methods. By deriving multiply robust estimates of the causal direct and indirect effects under various hypotheses, our illustrations show how confounder selection affects the estimation of the final causal target parameters, while generating causal insights into the “gut-brain axis”. Our results highlight the practicality and necessity of the discussed methods, which can guide practitioners in real-world applications and motivate future advances on this crucial topic in the era of big data.
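The DML route named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Assuming a binary exposure, numeric mediators and confounders, and scikit-learn as the learner library, it averages cross-fitted efficient-influence-function scores for the mean potential outcomes E[Y(a, M(b))], in the weighting form used for DML mediation analysis by Farbmacher et al. (2022), and then forms the total, natural direct (NDE) and natural indirect (NIE) effects. The function names `dml_mediation`, `_pr` and `simulate_toy`, the use of cross-validated lasso for the outcome regressions and L2-penalized logistic regression for the propensity scores, the trimming constant 0.01, and reusing the training folds for the nested regression are all illustrative assumptions. The toy data are synthetic and unrelated to the microbiome study.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.model_selection import KFold


def _pr(a, p1, eps=0.01):
    """P(D = a | .) from P(D = 1 | .), trimmed to [eps, 1 - eps] (assumed trimming rule)."""
    p = p1 if a == 1 else 1.0 - p1
    return np.clip(p, eps, 1.0 - eps)


def dml_mediation(y, d, m, x, n_folds=5, seed=0):
    """Hypothetical helper: cross-fitted DML estimates of the total, natural direct (NDE)
    and natural indirect (NIE) effects of a binary exposure d on y through m, given x."""
    n = len(y)
    mx = np.column_stack([m, x])
    psi = {(a, b): np.zeros(n) for a in (0, 1) for b in (0, 1)}  # scores for E[Y(a, M(b))]
    for train, test in KFold(n_splits=n_folds, shuffle=True, random_state=seed).split(x):
        # Nuisance models are fitted on the training folds and evaluated on the held-out fold.
        p_x = LogisticRegressionCV(cv=3, max_iter=2000).fit(x[train], d[train]).predict_proba(x[test])[:, 1]
        p_mx = LogisticRegressionCV(cv=3, max_iter=2000).fit(mx[train], d[train]).predict_proba(mx[test])[:, 1]
        dt, yt = d[test], y[test]
        for a in (0, 1):
            b = 1 - a
            tr_a, tr_b = train[d[train] == a], train[d[train] == b]
            # E[Y(a, M(a))]: augmented inverse probability weighting with E[Y | D=a, X].
            mu_x = LassoCV(cv=3).fit(x[tr_a], y[tr_a]).predict(x[test])
            psi[(a, a)][test] = (dt == a) / _pr(a, p_x) * (yt - mu_x) + mu_x
            # E[Y(a, M(b))]: mu_a(M, X) = E[Y | D=a, M, X] and the nested regression
            # omega(X) = E[mu_a(M, X) | D=b, X], reusing the training folds (a simplification).
            mu_fit = LassoCV(cv=3).fit(mx[tr_a], y[tr_a])
            mu_mx = mu_fit.predict(mx[test])
            omega = LassoCV(cv=3).fit(x[tr_b], mu_fit.predict(mx[tr_b])).predict(x[test])
            # Weight I(D=a) P(b|M,X) / (P(a|M,X) P(b|X)) replaces the mediator density ratio.
            w = (dt == a) * _pr(b, p_mx) / (_pr(a, p_mx) * _pr(b, p_x))
            psi[(a, b)][test] = w * (yt - mu_mx) + (dt == b) / _pr(b, p_x) * (mu_mx - omega) + omega
    theta = {k: v.mean() for k, v in psi.items()}
    return {
        "total": theta[(1, 1)] - theta[(0, 0)],
        "nde": theta[(1, 0)] - theta[(0, 0)],
        "nie": theta[(1, 1)] - theta[(1, 0)],
    }


def simulate_toy(n=3000, p=50, seed=1):
    """Synthetic linear data for illustration only: true NDE = 1.0, NIE = 0.8 * 1.5 = 1.2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, p))
    d = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1]))))
    m = 0.8 * d + 0.5 * x[:, 0] + rng.normal(size=n)
    y = 1.0 * d + 1.5 * m + 0.5 * x[:, 0] + 0.5 * x[:, 2] + rng.normal(size=n)
    return y, d, m.reshape(-1, 1), x
```

For example, `dml_mediation(*simulate_toy())` estimates the effects on synthetic data whose true NDE is 1.0 and NIE is 1.2. Because the penalized nuisance fits enter only through orthogonal scores, their regularization bias affects the effect estimates only at second order under the usual rate conditions, which is the motivation for DML. A confounder screen, such as the partial-correlation network approach, could be applied to `x` beforehand; that step is not shown here.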

Supplementary material

Contains Figures 2, 3, 4, and 5.



Copyright
2025 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
efficient influence functions; gut-brain axis; multiply robust; Neyman orthogonality; regularization bias

Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
