Learning Markov equivalence classes of directed acyclic graphs: an objective Bayes approach. (English) Zbl 1407.62189

Summary: A Markov equivalence class contains all the Directed Acyclic Graphs (DAGs) that encode the same conditional independencies, and is represented by a Completed Partially Directed Acyclic Graph (CPDAG), also called an Essential Graph (EG). We approach the problem of model selection among noncausal sparse Gaussian DAGs by directly scoring EGs, using an objective Bayes method. Specifically, we construct objective priors for model selection based on the Fractional Bayes Factor, which lead to a closed-form expression for the marginal likelihood of an EG. Next, we propose a Markov Chain Monte Carlo (MCMC) strategy that explores the space of EGs under sparsity constraints, and we illustrate the performance of our method in simulation studies and on a real dataset. Our method provides a coherent quantification of inferential uncertainty, requires minimal prior specification, and is competitive with alternative state-of-the-art algorithms in learning the structure of the data-generating EG.
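To make the notion of a Markov equivalence class concrete, the classical characterization due to Verma and Pearl states that two DAGs are Markov equivalent if and only if they have the same skeleton and the same v-structures (colliders a → c ← b with a and b non-adjacent). The sketch below checks this criterion; it assumes each DAG is given as a dictionary mapping every node to its set of parents, and the function names are hypothetical. It only illustrates the equivalence relation and is not the objective Bayes scoring or the MCMC search proposed in the paper.

```python
from itertools import combinations


def skeleton(dag):
    """Undirected edge set of a DAG given as {node: set_of_parents}."""
    return {frozenset((p, c)) for c, parents in dag.items() for p in parents}


def v_structures(dag):
    """Triples (a, c, b) with a -> c <- b and a, b non-adjacent (a < b)."""
    skel = skeleton(dag)
    found = set()
    for c, parents in dag.items():
        # Sorting makes each unordered parent pair appear in a canonical order.
        for a, b in combinations(sorted(parents), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found


def markov_equivalent(g1, g2):
    """Verma-Pearl criterion: same nodes, same skeleton, same v-structures.

    Assumes both inputs are acyclic; acyclicity is not checked here.
    """
    return (
        set(g1) == set(g2)
        and skeleton(g1) == skeleton(g2)
        and v_structures(g1) == v_structures(g2)
    )


chain = {"X": set(), "Y": {"X"}, "Z": {"Y"}}          # X -> Y -> Z
fork = {"X": {"Y"}, "Y": set(), "Z": {"Y"}}           # X <- Y -> Z
collider = {"X": set(), "Y": {"X", "Z"}, "Z": set()}  # X -> Y <- Z

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain and the fork share a skeleton and have no v-structures, so they belong to the same equivalence class (whose EG is the undirected path X − Y − Z), while the collider forms a class of its own.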


MSC:
62H12 Estimation in multivariate analysis
62F15 Bayesian inference
05C20 Directed graphs (digraphs), tournaments
62H99 Multivariate analysis


Software: glasso; TETRAD