Supplementary Material

JDS

Journal of Data Science

1683-86021680-743X

1680-743X

School of Statistics, Renmin University of China

JDS1097

10.6339/23-JDS1097

Statistical Data Science

FROSTY: A High-Dimensional Scale-Free Bayesian Network Learning Method

Bang

Joshua

joshuaybang@gmail.com1∗ Oh

Sang-Yun

2 1Department of Statistics and Applied Probability, University of California, Santa Barbara, Santa Barbara, CA, USA 2Lawrence Berkeley National Lab, Berkeley, CA, USA

∗Corresponding author. Email: joshuaybang@gmail.com.

2023

1432023

212354367

Supplementary Material

Supplementary material includes proofs of the theorems and the Python code for simulation.

3172022732023

2023 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.

2023

Open access article under the CC BY license.

We propose a scalable Bayesian network learning algorithm based on sparse Cholesky decomposition. Our approach only requires observational data and user-specified confidence level as inputs and can estimate networks with thousands of variables. The computational complexity of the proposed method is O ( p 3 ) for a graph with p vertices. Extensive numerical experiments illustrate the usefulness of our method with promising results. In simulation, the initial step in our approach also improves an alternative Bayesian network structure estimation method that uses an undirected graph as an input.

Keywords Bayesian networks robust selection sparse Cholesky decomposition structure learning

References

Abramson

, Brown

, Edwards

, Murphy

, Winkler

(1996). Hailfinder: A Bayesian system for forecasting severe weather. International Journal of Forecasting, 12(1): 57–71. https://doi.org/10.1016/0169-2070(95)00664-8

Amestoy

, Davis

, Duff

(1996). An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4): 886–905. https://doi.org/10.1137/S0895479894278952

Barabási

, Albert

(1999). Emergence of scaling in random networks. Science, 286(5439): 509–512. https://doi.org/10.1126/science.286.5439.509

Cisneros-Velarde

, Petersen

, Oh

(2020). Distributionally robust formulation and model selection for the graphical lasso. In: International Conference on Artificial Intelligence and Statistics, 756–765, PMLR.

Friedman

, Hastie

, Tibshirani

(2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3): 432–441. https://doi.org/10.1093/biostatistics/kxm045

Friedman

(2004). Inferring cellular networks using probabilistic graphical models. Science, 303(5659): 799–805. https://doi.org/10.1126/science.1094068

Goodfellow

, Bengio

, Courville

, Bengio

(2016). Deep learning, volume 1. MIT press, Cambridge.

Hastie

, Tibshirani

, Friedman

(2009). The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer.

Heckerman

, Nathwani

(1992). An evaluation of the diagnostic accuracy of pathfinder. Computers and Biomedical Research, 25(1): 56–74. https://doi.org/10.1016/0010-4809(92)90035-9

Huang

, Acero

, Hon

, Reddy

(2001). Spoken language processing: A guide to theory, algorithm, and system development. Prentice hall PTR.

Luo

, Savakis

, Singhal

(2005). A Bayesian network-based framework for semantic image understanding. Pattern recognition, 38(6): 919–934. https://doi.org/10.1016/j.patcog.2004.11.001

Pearl

(2014). Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier.

Pourahmadi

(1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86(3): 677–690. https://doi.org/10.1093/biomet/86.3.677

Raskutti

, Uhler

(2018). Learning directed acyclic graph models based on sparsest permutations. Stat, 7(1): e183. https://doi.org/10.1002/sta4.183

Robinson

(1977). Counting unlabeled acyclic digraphs. In: Combinatorial Mathematics V: Proceedings of the Fifth Australian Conference, Held at the Royal Melbourne Institute of Technology, August 24–26, 1976, 28–43. Springer.

Rose

(1970). Symmetric elimination on sparse positive definite systems and the potential flow network problem, Ph.D. thesis, Harvard University.

Rose

(1972). A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. In: Graph theory and computing, 183–217. Elsevier.

Rue

, Held

(2010). Discrete spatial variation. In: Handbook of spatial statistics, 171–200.

Sachs

, Perez

, Pe’er

, Lauffenburger

, Nolan

(2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721): 523–529. https://doi.org/10.1126/science.1105809

Sioutis

, Condotta

(2014). Tackling large qualitative spatial networks of scale-free-like structure. In: Artificial Intelligence: Methods and Applications: 8th Hellenic Conference on AI, SETN 2014, Ioannina, Greece, May 15–17, 2014. Proceedings 8, 178–191. Springer.

Spirtes

, Glymour

, Scheines

, Heckerman

(2000). Causation, prediction, and search. MIT press.

Squires

, Amaniampong

, Uhler

(2020). Efficient permutation discovery in causal dags. arXiv preprint: https://arxiv.org/abs/2011.03610.

Tran

, Cisneros-Velarde

, Oh

, Petersen

(2022). Family-wise error rate control in Gaussian graphical model selection via distributionally robust optimization. Stat, 11(1): e477, Wiley Online Library.

Van de Geer

, Bühlmann

, et al. (2013). ℓ 0

-penalized maximum likelihood for sparse directed acyclic graphs. Annals of Statistics, 41(2): 536–567.

Verma

, Pearl

(2022). Equivalence and synthesis of causal models. In: Probabilistic and Causal Inference: The Works of Judea Pearl. 221–236.

Wainwright

, Jordan

(2008). Graphical models, exponential families, and variational inference. Now Publishers Inc.

Yannakakis

(1981). Computing the minimum fill-in is np-complete. SIAM Journal on Algebraic Discrete Methods, 2(1): 77–79. https://doi.org/10.1137/0602010

IEEE Transactions on Pattern Analysis and Machine Intelligence