Supplementary Material

JDS

Journal of Data Science

ISSN: 1683-8602, 1680-743X

School of Statistics, Renmin University of China

JDS1088

10.6339/23-JDS1088

Computing in Data Science

Causal Discovery for Observational Sciences Using Supervised Machine Learning

Anne Helby Petersen1∗, Joseph Ramsey2, Claus Thorn Ekstrøm1, Peter Spirtes2

1Section of Biostatistics, Department of Public Health, University of Copenhagen, Denmark
2Department of Philosophy, Carnegie Mellon University, USA

∗Corresponding author. Email: ahpe@sund.ku.dk.

Published 13 March 2023. Volume 21, Issue 2, pages 255–280.

Supplementary Material

The following supplementary materials are available:

Appendices: Terminology and notation; Details about data simulation; Details about the neural network; Results: Extra figures; Application: Extra figures.

Application data: Correlation matrices from the Metropolit data application and estimated adjacency matrices.

GES simulation study results: Estimated adjacency matrices from the GES applications for the simulation study (estimated using TETRAD).

Neural network models: Trained neural network models from the simulation study (.h5 files).

Replication code: R code for replicating the simulation study and the application.

Received 16 July 2022; accepted 1 February 2023.

© 2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China. Open access article under the CC BY license.

Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct discovery methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not be a realistic representation of real-life data generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error trade-off is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: a causal model with many missing causal relations entails overly strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data, considering several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative, and less sensitive to sample size than existing procedures. We furthermore provide a real epidemiological data application. We use random subsampling to investigate real data performance on small samples and again find that SLdisco is less sensitive to sample size and hence seems to better utilize the information available in small datasets.
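To make the idea of learning a mapping from data to causal structure concrete, the following is a minimal sketch of how one simulated training pair could be produced. It is not the authors' implementation (their replication code is in R). It assumes, as the supplementary file list suggests, that the model input is a correlation matrix and the target an adjacency matrix. The function name `simulate_pair`, the edge probability, and the weight range are hypothetical choices, and the label here is only the undirected skeleton of the graph, a simplification of the equivalence class the abstract refers to.

```python
import math
import random


def simulate_pair(p=4, n=200, edge_prob=0.5, seed=0):
    """Return (correlation matrix, skeleton adjacency) for one random linear Gaussian DAG.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    # Random DAG: variables are taken in causal order, so edges only go i -> j with i < j.
    weights = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if rng.random() < edge_prob:
                weights[i][j] = rng.choice([-1, 1]) * rng.uniform(0.5, 1.5)

    # Each variable is a weighted sum of its parents plus standard Gaussian noise.
    data = []
    for _ in range(n):
        x = [0.0] * p
        for j in range(p):
            x[j] = sum(weights[i][j] * x[i] for i in range(j)) + rng.gauss(0.0, 1.0)
        data.append(x)

    # Pearson correlation matrix of the simulated sample (the model input).
    means = [sum(row[j] for row in data) / n for j in range(p)]

    def cov(a, b):
        return sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)

    sds = [math.sqrt(cov(j, j)) for j in range(p)]
    corr = [[cov(a, b) / (sds[a] * sds[b]) for b in range(p)] for a in range(p)]

    # Simplified label: symmetric 0/1 skeleton (adjacent or not), ignoring edge orientation.
    skeleton = [
        [int(weights[a][b] != 0.0 or weights[b][a] != 0.0) for b in range(p)]
        for a in range(p)
    ]
    return corr, skeleton


if __name__ == "__main__":
    corr, skeleton = simulate_pair(p=4, n=200, seed=1)
    for row in skeleton:
        print(row)
```

Repeating this for many random graphs would give a labeled dataset on which a supervised learner could be trained. A faithful version would replace the skeleton with the adjacency matrix of the graph's equivalence class.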

Keywords: causal learning; causality; deep learning; neural network; observational data; structure learning


This work was funded by the Independent Research Fund Denmark (grant 8020-00031B) and the National Institutes of Health (grant AWD000044521364471).
