Learning Bayesian network structures using weakest mutual-information-first strategy. (English) Zbl 1468.68165

Summary: In Bayesian network structure learning, the quality of the directed graph learned by the constraint-based approaches can be greatly affected by the order of choosing variable pairs and the order of selecting condition sets for testing conditional independence. Inspired by the strong connection between the degree of mutual information shared by two variables and their conditional independence, we introduce the M-ordering concept, where a matrix is precomputed from the observational data with variables ordered increasingly by their respective degree of mutual information with the target variable under concern. Given the M-ordering matrix, we propose a strategy called Weakest Mutual-Information-First Strategy (WMIF), which is integrated into the PC-algorithm in two aspects: an MI-based edge removal strategy, and an MI-based condition set generation strategy. The MI-based edge removal strategy is to always select the variable pair with the weakest mutual information to test their conditional independence; the condition set generation strategy is to construct a conditioning set where variables bearing a weaker degree of mutual information with the target variable are always considered first. We prove that the weakest MI-based edge removal strategy is sound, and our PC-MI algorithm, a PC variant empowered by the WMIF strategy, is order-independent. Moreover, in PC algorithms, the number of conditional independence tests increases exponentially with the number of random variables; we show that the WMIF strategy can effectively reduce the complexity (bounded by \(o(| \mathbf{V} |(2^{|\operatorname{adj}(X)|} - \frac{| \mathbf{V} |^2}{2})))\). We have conducted experiments with both low-dimensional and high-dimensional data sets, and the results indicate that PC-MI outperforms the state-of-the-art approaches. More importantly, the order-agnostic property of PC-MI can be extremely useful when it is hard to prescribe a meaningful variable ordering as needed in some other PC algorithms.


68T05 Learning and adaptive systems in artificial intelligence
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
62H22 Probabilistic graphical models
Full Text: DOI


[1] Aliferis, C. F.; Statnikov, A.; Tsamardinos, I.; Mani, S.; Koutsoukos, X. D., Local causal and Markov blanket induction for causal discovery and feature selection for classification part i: algorithms and empirical evaluation, J. Mach. Learn. Res., 11, Jan, 171-234 (2010) · Zbl 1242.68197
[2] Alonso-Barba, J. I.; de la Ossa, L.; Regnier-Coudert, O.; McCall, J.; Gámez, J. A.; Puerta, J. M., Ant colony and surrogate tree-structured models for orderings-based Bayesian network learning, (Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (2015), ACM), 543-550
[3] Amirkhani, H.; Rahmati, M.; Lucas, P. J.; Hommersom, A., Exploiting experts’ knowledge for structure learning of Bayesian networks, IEEE Trans. Pattern Anal. Mach. Intell., 39, 11, 2154-2170 (2017)
[4] Cano, A.; Gómez-Olmedo, M.; Moral, S., A score based ranking of the edges for the pc algorithm, (Proceedings of the Fourth European Workshop on Probabilistic Graphical Models (2008)), 41-48
[5] Chen, X.-W.; Anantha, G.; Lin, X., Improving Bayesian network structure learning with mutual information-based node ordering in the k2 algorithm, IEEE Trans. Knowl. Data Eng., 20, 5, 628-640 (2008)
[6] Chickering, D. M.; Dan, G.; Heckerman, D., Learning Bayesian networks is np-hard, Networks, 112, 2, 121-130 (1994)
[7] Chickering, D. M.; Meek, C., Selective greedy equivalence search: finding optimal Bayesian networks using a polynomial number of score evaluations, arXiv preprint
[8] Colombo, D.; Maathuis, M. H., Order-independent constraint-based causal structure learning, J. Mach. Learn. Res., 15, 1, 3741-3782 (2014)
[9] Colombo, D.; Maathuis, M. H.; Kalisch, M.; Richardson, T. S., Learning high-dimensional directed acyclic graphs with latent and selection variables, Ann. Stat., 294-321 (2012) · Zbl 1246.62131
[10] Cover, T. M.; Thomas, J. A., Elements of Information Theory (2012), John Wiley & Sons
[11] Cover, T. M.; Thomas, J. A., Elements of Information Theory, Wiley Series in Telecommunications and Signal Processing (2017)
[12] de Campos, C. P.; Scanagatta, M.; Corani, G.; Zaffalon, M., Entropy-based pruning for learning Bayesian networks using bic, Artif. Intell., 260, 42-50 (2018) · Zbl 1445.68189
[13] Gao, T.; Fadnis, K.; Campbell, M., Local-to-global Bayesian network structure learning, (International Conference on Machine Learning (2017)), 1193-1202
[14] Gao, T.; Ji, Q., Efficient score-based Markov blanket discovery, Int. J. Approx. Reason., 80, 277-293 (2017) · Zbl 1401.68263
[15] Ghassami, A.; Salehkaleybar, S.; Kiyavash, N.; Zhang, K., Learning causal structures using regression invariance, (Advances in Neural Information Processing Systems (2017)), 3011-3021
[16] Gheisari, S.; Meybodi, M. R., Bnc-pso: structure learning of Bayesian networks by particle swarm optimization, Inf. Sci., 348, 272-289 (2016) · Zbl 1398.68435
[17] Harris, N.; Drton, M., Pc algorithm for nonparanormal graphical models, J. Mach. Learn. Res., 14, 1, 3365-3383 (2013) · Zbl 1318.62197
[18] Kalisch, M.; Bühlmann, P., Estimating high-dimensional directed acyclic graphs with the pc-algorithm, J. Mach. Learn. Res., 8, Mar, 613-636 (2007) · Zbl 1222.68229
[19] Kalisch, M.; Fellinghauer, B. A.; Grill, E.; Maathuis, M. H.; Mansmann, U.; Bühlmann, P.; Stucki, G., Understanding human functioning using graphical models, BMC Med. Res. Methodol., 10, 1, 14 (2010)
[20] Koller, D.; Friedman, N., Probabilistic Graphical Models: Principles and Techniques (2009), MIT Press
[21] Lee, S.; Honavar, V., On learning causal models from relational data, (AAAI (2016)), 3263-3270
[22] Leray, P.; Francois, O., BNT Structure Learning Package: Documentation and Experiments (2004), Universitè et INSA de Rouen, Laboratoire PSI, Tech. Rep.
[23] Lerner, B.; Afek, M.; Bojmel, R., Adaptive thresholding in structure learning of a Bayesian network, (IJCAI (2013)), 1458-1464
[24] Maier, M. E.; Taylor, B. J.; Oktay, H.; Jensen, D. D., Learning causal models of relational domains, (AAAI (2010))
[25] Nandy, P.; Hauser, A.; Maathuis, M. H., High-dimensional consistency in score-based and hybrid structure learning, arXiv preprint · Zbl 1411.62144
[26] Neapolitan, R. E., Probabilistic Reasoning in Expert Systems: Theory and Algorithms (1990), Wiley: Wiley New York
[27] O’Gorman, B.; Perdomo-Ortiz, A.; Babbush, R.; Aspuru-Guzik, A.; Smelyanskiy, V., Bayesian network structure learning using quantum annealing, Eur. Phys. J. Spec. Top., 224, 1, 163-188 (2015)
[28] Pearl, J., Causality: Models, Reasoning, and Inference (2000), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0959.68116
[29] Spirtes, P.; Glymour, C. N.; Scheines, R., Causation, Prediction, and Search (2000), MIT Press
[30] Spirtes, P.; Zhang, K., Causal discovery and inference: concepts and recent methodological advances, (Applied Informatics, vol. 3 (2016), SpringerOpen), 3
[31] Stekhoven, D. J.; Moraes, I.; Sveinbjörnsson, G.; Hennig, L.; Maathuis, M. H.; Bühlmann, P., Causal stability ranking, Bioinformatics, 28, 21, 2819-2823 (2012)
[32] Tsamardinos, L.; Brown, L. E.; Aliferis, C. F.; Moore, A. W., The max-min hill-climbing Bayesian network structure learning algorithm, Mach. Learn., 65, 1, 31-78 (2006)
[33] Yu, K.; Li, J.; Liu, L., A review on algorithms for constraint-based causal discovery, arXiv preprint
[34] Yuan, C.; Malone, B., Learning optimal Bayesian networks: a shortest path perspective, J. Artif. Intell. Res., 48, 23-65 (2013) · Zbl 1361.68182
[35] Zhang, H.; Zhou, S.; Zhang, K.; Guan, J., Causal discovery using regression-based conditional independence tests, (AAAI (2017)), 1250-1256
[36] Zhang, K.; Gong, M.; Ramsey, J.; Batmanghelich, K.; Spirtes, P.; Glymour, C., Causal discovery in the presence of measurement error: identifiability conditions, arXiv preprint
[37] Zhang, K.; Huang, B.; Zhang, J.; Glymour, C.; Schölkopf, B., Causal discovery from nonstationary/heterogeneous data: skeleton estimation and orientation determination, (IJCAI: Proceedings of the Conference, vol. 2017 (2017), NIH Public Access), 1347
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.