Causal Discovery for Observational Sciences Using Supervised Machine Learning
Volume 21, Issue 2 (2023): Special Issue: Symposium Data Science and Statistics 2022, pp. 255–280
Pub. online: 13 March 2023
Type: Computing in Data Science
Open Access
Received: 16 July 2022
Accepted: 1 February 2023
Published: 13 March 2023
Abstract
Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct discovery methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error trade-off is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: a causal model with many missing causal relations entails overly strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data, considering several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative, and less sensitive to sample size than existing procedures. We furthermore provide an application to real epidemiological data. We use random subsampling to investigate real-data performance on small samples and again find that SLdisco is less sensitive to sample size and hence seems to make better use of the information available in small datasets.
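The pipeline sketched in the abstract can be illustrated with a small, hypothetical R example: simulate random DAGs and Gaussian data from them, use correlation matrices as inputs, and train a neural network to predict a (simplified) representation of the equivalence class. The sketch below assumes the pcalg and keras R packages, predicts only skeleton edges rather than full CPDAGs, and is not the authors' implementation; their actual replication code is provided in the supplementary material.

    ## Minimal, hypothetical sketch of the SLdisco idea (not the authors' code):
    ## learn a mapping from correlation matrices to causal-structure labels.
    library(pcalg)   # randomDAG(), rmvDAG(), dag2cpdag()
    library(keras)   # simple feed-forward network

    p <- 5            # number of variables (model size)
    n <- 100          # sample size per simulated dataset
    n_train <- 1000   # number of simulated training datasets

    make_example <- function() {
      dag <- randomDAG(p, prob = 0.3)           # random sparse weighted DAG
      dat <- rmvDAG(n, dag)                     # Gaussian data generated from it
      x <- cor(dat)[upper.tri(diag(p))]         # input: correlation matrix (upper triangle)
      A <- as(dag2cpdag(dag), "matrix") != 0    # CPDAG = equivalence class of the DAG
      y <- as.numeric((A | t(A))[upper.tri(A)]) # simplified target: skeleton edges only
      list(x = x, y = y)
    }

    examples <- replicate(n_train, make_example(), simplify = FALSE)
    X <- do.call(rbind, lapply(examples, `[[`, "x"))
    Y <- do.call(rbind, lapply(examples, `[[`, "y"))

    model <- keras_model_sequential() %>%
      layer_dense(units = 64, activation = "relu", input_shape = ncol(X)) %>%
      layer_dense(units = ncol(Y), activation = "sigmoid")
    model %>% compile(optimizer = "adam", loss = "binary_crossentropy")
    model %>% fit(X, Y, epochs = 10, verbose = 0)

At prediction time, the fitted network's edge probabilities can be thresholded conservatively before reporting an edge as absent, which mirrors the error trade-off emphasized in the abstract.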
Supplementary material
The following supplementary materials are available:
Appendices:
A: Terminology and notation
B: Details about data simulation
C: Details about the neural network
D: Results: Extra figures
E: Application: Extra figures
Application data: Correlation matrices from the Metropolit data application and estimated adjacency matrices.
GES simulation study results: Estimated adjacency matrices from the GES applications for the simulation study (estimated using TETRAD).
Neural network models: Trained neural network models from the simulation study (.h5 files).
Replication code: R code for replicating the simulation study and the application.