Causal Discovery for Observational Sciences Using Supervised Machine Learning
Volume 21, Issue 2 (2023): Special Issue: Symposium Data Science and Statistics 2022, pp. 255–280
Pub. online: 13 March 2023
Type: Computing in Data Science
Open Access
Received: 16 July 2022
Accepted: 1 February 2023
Published: 13 March 2023
Abstract
Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct discovery methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error trade-off is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: a causal model with many missing causal relations entails overly strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data, considering several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative, and less sensitive to sample size than existing procedures. We furthermore provide an application to real epidemiological data. We use random subsampling to investigate real-data performance on small samples and again find that SLdisco is less sensitive to sample size and hence seems to make better use of the information available in small datasets.
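The pipeline sketched in the abstract can be illustrated with a small, hypothetical R example: simulate random DAGs and Gaussian data from them, use correlation matrices as inputs, and train a neural network to predict a (simplified) representation of the equivalence class. The sketch below assumes the pcalg and keras R packages, predicts only skeleton edges rather than full CPDAGs, and is not the authors' implementation; their actual replication code is provided in the supplementary material.

    ## Minimal, hypothetical sketch of the SLdisco idea (not the authors' code):
    ## learn a mapping from correlation matrices to causal-structure labels.
    library(pcalg)   # randomDAG(), rmvDAG(), dag2cpdag()
    library(keras)   # simple feed-forward network

    p <- 5            # number of variables (model size)
    n <- 100          # sample size per simulated dataset
    n_train <- 1000   # number of simulated training datasets

    make_example <- function() {
      dag <- randomDAG(p, prob = 0.3)           # random sparse weighted DAG
      dat <- rmvDAG(n, dag)                     # Gaussian data generated from it
      x <- cor(dat)[upper.tri(diag(p))]         # input: correlation matrix (upper triangle)
      A <- as(dag2cpdag(dag), "matrix") != 0    # CPDAG = equivalence class of the DAG
      y <- as.numeric((A | t(A))[upper.tri(A)]) # simplified target: skeleton edges only
      list(x = x, y = y)
    }

    examples <- replicate(n_train, make_example(), simplify = FALSE)
    X <- do.call(rbind, lapply(examples, `[[`, "x"))
    Y <- do.call(rbind, lapply(examples, `[[`, "y"))

    model <- keras_model_sequential() %>%
      layer_dense(units = 64, activation = "relu", input_shape = ncol(X)) %>%
      layer_dense(units = ncol(Y), activation = "sigmoid")
    model %>% compile(optimizer = "adam", loss = "binary_crossentropy")
    model %>% fit(X, Y, epochs = 10, verbose = 0)

At prediction time, the fitted network's edge probabilities can be thresholded conservatively before reporting an edge as absent, which mirrors the error trade-off emphasized in the abstract.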
Supplementary material
The following supplementary materials are available:
Appendices:
A: Terminology and notation
B: Details about data simulation
C: Details about the neural network
D: Results: Extra figures
E: Application: Extra figures
Application data: Correlation matrices from the Metropolit data application and estimated adjacency matrices.
GES simulation study results: Estimated adjacency matrices from the GES applications for the simulation study (estimated using TETRAD).
Neural network models: Trained neural network models from the simulation study (.h5 files).
Replication code: R code for replicating the simulation study and the application.