Journal of Data Science

Causal Discovery for Observational Sciences Using Supervised Machine Learning
Anne Helby Petersen, Joseph Ramsey, Claus Thorn Ekstrøm, et al. (4 authors)
https://doi.org/10.6339/23-JDS1088
Pub. online: 13 March 2023      Type: Computing In Data Science      Open Access

Received: 16 July 2022
Accepted: 1 February 2023
Published: 13 March 2023

Abstract

Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct discovery methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data-generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error trade-off is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: a causal model with many missing causal relations entails overly strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data, considering several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative, and less sensitive to sample size than existing procedures. We furthermore provide a real epidemiological data application, using random subsampling to investigate real-data performance on small samples. Here we again find that SLdisco is less sensitive to sample size and hence seems to better utilize the information available in small datasets.
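The core recipe described in the abstract, train a supervised model on simulated (data, graph) pairs so that it learns a map from observational summaries to graph structure, can be illustrated with a deliberately minimal sketch. The sketch below is not the paper's method: it assumes tiny linear-Gaussian models with a known variable ordering, and replaces the paper's neural network with a per-edge logistic regression trained in plain NumPy. All sizes and names here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the supervised-discovery idea: learn a map from
# correlation matrices of simulated Gaussian data to adjacency matrices.
# Hypothetical sizes; a per-edge logistic model stands in for the paper's
# neural network.
rng = np.random.default_rng(0)
p, n_train, n_obs = 4, 500, 200  # variables, training graphs, samples each

def simulate_instance():
    """Random upper-triangular DAG -> linear-Gaussian data -> (features, labels)."""
    A = np.triu(rng.random((p, p)) < 0.4, k=1).astype(float)  # DAG adjacency
    W = A * rng.uniform(0.5, 1.5, (p, p))                     # edge weights
    X = np.zeros((n_obs, p))
    for j in range(p):  # columns are already in topological order
        X[:, j] = X @ W[:, j] + rng.normal(size=n_obs)
    iu = np.triu_indices(p, k=1)
    C = np.corrcoef(X, rowvar=False)
    return C[iu], A[iu]  # features: pairwise correlations; labels: edges

Xf, Yf = map(np.array, zip(*(simulate_instance() for _ in range(n_train))))

# Per-edge logistic regression trained by gradient descent on the
# simulated training pairs.
d = Xf.shape[1]
Wm, bm = np.zeros((d, d)), np.zeros(d)
for _ in range(500):
    P = 1.0 / (1.0 + np.exp(-(Xf @ Wm + bm)))  # predicted edge probabilities
    G = P - Yf                                 # logistic-loss gradient signal
    Wm -= Xf.T @ G / n_train
    bm -= G.mean(axis=0)

# Skeleton recovery on fresh simulated instances
Xt, Yt = map(np.array, zip(*(simulate_instance() for _ in range(200))))
pred = (1.0 / (1.0 + np.exp(-(Xt @ Wm + bm)))) > 0.5
acc = float((pred == Yt).mean())
print("edge-wise accuracy on held-out graphs:", round(acc, 2))
```

Note the strong simplification: because every simulated DAG is upper-triangular in a fixed order, the sketch sidesteps the identifiability questions that SLdisco handles by predicting equivalence classes (CPDAGs) rather than single DAGs.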

Supplementary Material

The following supplementary materials are available:
  • Appendices: A: Terminology and notation; B: Details about data simulation; C: Details about the neural network; D: Results: extra figures; E: Application: extra figures.
  • Application data: Correlation matrices from the Metropolit data application and estimated adjacency matrices.
  • GES simulation study results: Estimated adjacency matrices from the GES applications for the simulation study (estimated using TETRAD).
  • Neural network models: Trained neural network models from the simulation study (.h5 files).
  • Replication code: R code for replicating the simulation study and the application.

References

 
Allaire J, Chollet F (2021). keras: R Interface to ‘Keras’. R package version 2.4.0.
 
Chickering DM (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3: 507–554.
 
Colombo D, Maathuis MH, et al. (2014). Order-independent constraint-based causal structure learning. Journal of Machine Learning Research, 15(1): 3741–3782.
 
De Stavola BL, Nitsch D, dos Santos Silva I, McCormack V, Hardy R, Mann V, et al. (2006). Statistical issues in life course epidemiology. American Journal of Epidemiology, 163(1): 84–96. https://doi.org/10.1093/aje/kwj003.
 
Goudet O, Kalainathan D, Caillou P, Guyon I, Lopez-Paz D, Sebag M (2018). Learning functional causal models with generative neural networks. In: Explainable and Interpretable Models in Computer Vision and Machine Learning (HJ Escalante, S Escalera, I Guyon, X Baró, Y Güçlütürk, U Güçlü, M van Gerven, R van Lier, eds.), 39–80. Springer.
 
Greenland S, Pearl J, Robins JM (1999). Causal diagrams for epidemiologic research. Epidemiology, 10(1): 37–48. https://doi.org/10.1097/00001648-199901000-00008.
 
Hornik K, Stinchcombe M, White H (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5): 359–366. https://doi.org/10.1016/0893-6080(89)90020-8.
 
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint: https://arxiv.org/abs/1704.04861.
 
Kalisch M, Bühlmann P (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 8(3): 613–636.
 
Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P (2012). Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11): 1–26. https://doi.org/10.18637/jss.v047.i11.
 
Ke NR, Chiappa S, Wang J, Bornschein J, Weber T, Goyal A, et al. (2022). Learning to induce causal structure. arXiv preprint: https://arxiv.org/abs/2204.04875.
 
Li H, Xiao Q, Tian J (2020). Supervised whole DAG causal discovery. arXiv preprint: https://arxiv.org/abs/2006.04697.
 
Meek C (1995). Causal inference and causal explanation with background knowledge. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, 403–410. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
 
Ogarrio JM, Spirtes P, Ramsey J (2016). A hybrid causal search algorithm for latent variable models. In: Conference on Probabilistic Graphical Models, 368–379. PMLR.
 
Osler M, Andersen AMN, Lund R, Batty GD, Hougaard CØ, Damsgaard MT, et al. (2004). Revitalising the Metropolit 1953 Danish male birth cohort: Background, aims and design. Paediatric and Perinatal Epidemiology, 18(5): 385–394. https://doi.org/10.1111/j.1365-3016.2004.00584.x.
 
Osler M, Lund R, Kriegbaum M, Christensen U, Andersen AMN (2006). Cohort profile: The Metropolit 1953 Danish male birth cohort. International Journal of Epidemiology, 35(3): 541–545. https://doi.org/10.1093/ije/dyi300.
 
Pearl J (2009). Causality. Cambridge University Press.
 
Perkovic E (2020). Identifying causal effects in maximally oriented partially directed acyclic graphs. In: Conference on Uncertainty in Artificial Intelligence, 530–539. PMLR.
 
Peters J, Janzing D, Schölkopf B (2017). Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press.
 
Petersen AH (2022). causalDisco: Tools for Causal Discovery on Observational Data. R package version 0.9.1.
 
Petersen AH, Osler M, Ekstrøm CT (2021). Data-driven model building for life-course epidemiology. American Journal of Epidemiology, 190(9): 1898–1907. https://doi.org/10.1093/aje/kwab087.
 
Ramsey J, Glymour M, Sanchez-Romero R, Glymour C (2017). A million variables and more: The fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 3(2): 121–129. https://doi.org/10.1007/s41060-016-0032-z.
 
Ramsey JD, Zhang K, Glymour M, Romero RS, Huang B, Ebert-Uphoff I, et al. (2018). TETRAD: A toolbox for causal discovery. In: 8th International Workshop on Climate Informatics.
 
Richardson T, Spirtes P (2002). Ancestral graph Markov models. The Annals of Statistics, 30(4): 962–1030. https://doi.org/10.1214/aos/1031689015.
 
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1): 61–80. https://doi.org/10.1109/TNN.2008.2005605.
 
Shah RD, Peters J (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics, 48(3): 1514–1538. https://doi.org/10.1214/19-AOS1857.
 
Spirtes P, Glymour C (1991). An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1): 62–72. https://doi.org/10.1177/089443939100900106.
 
Spirtes P, Glymour CN, Scheines R, Heckerman D (2000). Causation, Prediction, and Search. MIT press.
 
Von Stumm S, Plomin R (2015). Socioeconomic status and the growth of intelligence from infancy through adolescence. Intelligence, 48: 30–36. https://doi.org/10.1016/j.intell.2014.10.002.
 
Xu C, Xu W (2021). Causal structure learning with one-dimensional convolutional neural networks. IEEE Access, 9: 162147–162155. https://doi.org/10.1109/ACCESS.2021.3133496.
 
Yu Y, Chen J, Gao T, Yu M (2019). DAG-GNN: DAG structure learning with graph neural networks. In: International Conference on Machine Learning, 7154–7163. PMLR.
 
Zheng X, Aragam B, Ravikumar P, Xing EP (2018). DAGs with NO TEARS: Continuous optimization for structure learning. arXiv preprint: https://arxiv.org/abs/1803.01422.
 
Zheng X, Dan C, Aragam B, Ravikumar P, Xing E (2020). Learning sparse nonparametric DAGs. In: International Conference on Artificial Intelligence and Statistics, 3414–3425. PMLR.


Copyright
2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
causal learning, causality, deep learning, neural network, observational data, structure learning

Funding
This work was funded by the Independent Research Fund Denmark (grant 8020-00031B) and the National Institutes of Health (grant AWD000044521364471).



  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
