×

Learning tractable Bayesian networks in the space of elimination orders. (English) Zbl 1478.68280

Summary: The computational complexity of inference is now one of the most relevant topics in the field of Bayesian networks. Although the literature contains approaches that learn Bayesian networks from high dimensional datasets, traditional methods do not bound the inference complexity of the learned models, often producing models where exact inference is intractable. This paper focuses on learning tractable Bayesian networks from data. To address this problem, we propose strategies for learning Bayesian networks in the space of elimination orders. In this manner, we can efficiently bound the inference complexity of the networks during the learning process. Searching in the combined space of directed acyclic graphs and elimination orders can be extremely computationally demanding. We demonstrate that one type of elimination trees, which we define as valid, can be used as an equivalence class of directed acyclic graphs and elimination orders, removing redundancy. We propose methods for incrementally compiling local changes made to directed acyclic graphs in elimination trees and for searching for elimination trees of low width. Using these methods, we can move through the space of valid elimination trees in polynomial time with respect to the number of network variables and in linear time with respect to treewidth. Experimental results show that our approach successfully bounds the inference complexity of the learned models, while it is competitive with other state-of-the-art methods in terms of fitting to data.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H22 Probabilistic graphical models
68Q25 Analysis of algorithms and problem complexity

Software:

ComputeTW
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (1988), Morgan Kaufmann
[2] Koller, D.; Friedman, N., Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning (2009), The MIT Press · Zbl 1183.68483
[3] Bielza, C.; Larrañaga, P., Discrete Bayesian network classifiers: a survey, ACM Comput. Surv., 47, 1 (2014), Article 5 · Zbl 1322.68147
[4] Peña, J. M.; Lozano, J. A.; Larrañaga, P., Learning Bayesian networks for clustering by means of constructive induction, Pattern Recognit. Lett., 20, 1219-1230 (1999)
[5] Pham, D. T.; Ruz, G. A., Unsupervised training of Bayesian networks for data clustering, (Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 465 (2009), The Royal Society), 2927-2948 · Zbl 1186.68387
[6] Heckerman, D.; Geiger, D.; Chickering, D. M., Learning Bayesian networks: the combination of knowledge and statistical data, Mach. Learn., 20, 3, 197-243 (1995) · Zbl 0831.68096
[7] Cooper, G. F.; Herskovits, E., A Bayesian method for constructing Bayesian belief networks from databases, (Proceedings of the 7th Conference on Uncertainty in Artificial Intelligence (1991), Morgan Kaufmann Publishers Inc.), 86-94
[8] Cooper, G. F.; Herskovits, E., A Bayesian method for the induction of probabilistic networks from data, Mach. Learn., 9, 4, 309-347 (1992) · Zbl 0766.68109
[9] Akaike, H., A new look at the statistical model identification, IEEE Trans. Autom. Control, 19, 6, 716-723 (1974) · Zbl 0314.62039
[10] Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 2, 461-464 (1978) · Zbl 0379.62005
[11] Bouckaert, R. R., Probabilistic network construction using the minimum description length principle, (Symbolic and Quantitative Approaches to Reasoning and Uncertainty, vol. 747 (1993), Springer), 41-48
[12] Lam, W.; Bacchus, F., Learning Bayesian belief networks: an approach based on the MDL principle, Comput. Intell., 10, 3, 269-293 (1994)
[13] Arnborg, S.; Corneil, D. G.; Proskurowski, A., Complexity of finding embeddings in a k-tree, SIAM J. Algebraic Discrete Methods, 8, 2, 277-284 (1987) · Zbl 0611.05022
[14] Grant, K.; Horsch, M. C., Methods for constructing balanced elimination trees and other recursive decompositions, Int. J. Approx. Reason., 50, 9, 1416-1424 (2009) · Zbl 1191.68677
[15] Shoikhet, K.; Geiger, D., A practical algorithm for finding optimal triangulations, (Proceedings of the 14th National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence (1997), AAAI Press), 185-190
[16] Fomin, F. V.; Villanger, Y., Treewidth computation and extremal combinatorics, Combinatorica, 32, 3, 289-308 (2012) · Zbl 1289.05447
[17] Fomin, F. V.; Kratsch, D.; Todinca, I., Exact (exponential) algorithms for treewidth and minimum fill-in, (Autom. Lang. Program., vol. 3142 (2004), Springer), 568-580 · Zbl 1099.68076
[18] Bodlaender, H. L.; Fomin, F. V.; Koster, A. M.; Kratsch, D.; Thilikos, D. M., On Exact Algorithms for Treewidth (2006), Springer · Zbl 1131.68481
[19] Markowitz, H. M., The elimination form of the inverse and its application to linear programming, Manag. Sci., 3, 3, 255-269 (1957) · Zbl 0995.90592
[20] Kjærulff, U. B., Triangulation of Graphs-Algorithms Giving Small Total State Space (1990), Tech. rep.
[21] Rose, D. J.; Tarjan, R. E.; Lueker, G. S., Algorithmic aspects of vertex elimination on graphs, SIAM J. Comput., 5, 2, 266-283 (1976) · Zbl 0353.65019
[22] Tarjan, R. E.; Yannakakis, M., Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs, SIAM J. Comput., 13, 3, 566-579 (1984) · Zbl 0545.68062
[23] Berry, A.; Blair, J. R.; Heggernes, P.; Peyton, B. W., Maximum cardinality search for computing minimal triangulations of graphs, Algorithmica, 39, 4, 287-298 (2004) · Zbl 1090.68080
[24] Kjærulff, U., Optimal decomposition of probabilistic networks by simulated annealing, Stat. Comput., 2, 1, 7-17 (1992)
[25] Larrañaga, P.; Kuijpers, C. M.; Poza, M.; Murga, R. H., Decomposing Bayesian networks: triangulation of the moral graph with genetic algorithms, Stat. Comput., 7, 1, 19-34 (1997)
[26] Clautiaux, F.; Moukrim, A.; Nègre, S.; Carlier, J., Heuristic and metaheuristic methods for computing graph treewidth, RAIRO. Rech. Opér., 38, 1, 13-26 (2004) · Zbl 1092.90065
[27] Koster, A., Frequency Assignment: Models and Algorithms (1999), Maastricht University, Ph.D. thesis
[28] Bodlaender, H. L.; Koster, A. M., Treewidth computations I. Upper bounds, Inf. Comput., 208, 3, 259-275 (2010) · Zbl 1186.68328
[29] Bodlaender, H. L., A linear time algorithm for finding tree-decompositions of small treewidth, (Proceedings of the 25th Annual ACM Symposium on Theory of Computing (1993), ACM), 226-234 · Zbl 1310.05194
[30] Kwisthout, J., Most probable explanations in Bayesian networks: complexity and tractability, Int. J. Approx. Reason., 52, 9, 1452-1469 (2011) · Zbl 1242.68332
[31] Cooper, G. F., The computational complexity of probabilistic inference using Bayesian belief networks, Artif. Intell., 42, 2, 393-405 (1990) · Zbl 0717.68080
[32] Dagum, P.; Luby, M., Approximating probabilistic inference in Bayesian belief networks is NP-hard, Artif. Intell., 60, 1, 141-153 (1993) · Zbl 0781.68105
[33] Pearl, J., Reverend Bayes on inference engines: a distributed hierarchical approach, (Proceedings of the 2nd AAAI Conference on Artificial Intelligence (1982), AAAI Press), 133-136
[34] Kim, J. H.; Pearl, J., A computational model for causal and diagnostic reasoning in inference systems, (Proceedings of the 8th International Joint Conference on Artificial Intelligence, vol. 1 (1983), Morgan Kaufmann Publishers Inc.), 190-193
[35] Shachter, R. D., Evidence absorption and propagation through evidence reversals, (Proceedings of the 5th Annual Conference on Uncertainty in Artificial Intelligence (1990), North-Holland Publishing Co.), 173-190 · Zbl 0721.68076
[36] Zhang, N. L.; Poole, D., A simple approach to Bayesian network computations, (Proceedings of the 10th Canadian Conference on Artificial Intelligence (1994)), 171-178
[37] Pearl, J., A constraint propagation approach to probabilistic reasoning, (Proceedings of the 1st Annual Conference on Uncertainty in Artificial Intelligence (1985)), 357-369
[38] Darwiche, A., Recursive conditioning, Artif. Intell., 126, 1, 5-41 (2001) · Zbl 0969.68150
[39] Shenoy, P. P.; Shafer, G., Axioms for probability and belief-function propagation, (Proceedings of the 4th Annual Conference on Uncertainty in Artificial Intelligence (1990), North-Holland Publishing Co.), 169-198
[40] Jensen, F.; Lauritzen, S.; Olsen, K., Bayesian updating in recursive graphical models by local computation, CSQ, Comput. Stat. Q., 4, 269-282 (1990) · Zbl 0715.68076
[41] Boutilier, C.; Friedman, N.; Goldszmidt, M.; Koller, D., Context-specific independence in Bayesian networks, (Proceedings of the 12th International Conference on Uncertainty in Artificial Intelligence (1996), Morgan Kaufmann Publishers Inc.), 115-123
[42] Darwiche, A., A differential approach to inference in Bayesian networks, J. ACM, 50, 3, 280-305 (2003) · Zbl 1325.68226
[43] Niepert, M.; van den Broeck, G., Tractability through exchangeability: a new perspective on efficient probabilistic inference, (Proceedings of the 28th AAAI Conference on Artificial Intelligence (2014), AAAI Press), 2467-2475
[44] Korhonen, J.; Parviainen, P., Exact learning of bounded tree-width Bayesian networks, (Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, vol. 31 (2013), Proceedings of Machine Learning Research), 370-378
[45] Berg, J.; Järvisalo, M.; Malone, B., Learning optimal bounded treewidth Bayesian networks via maximum satisfiability, (Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, vol. 33 (2014)), 86-95
[46] Nie, S.; Mauá, D. D.; de Campos, C. P.; Ji, Q., Advances in learning Bayesian networks of bounded treewidth, (Advances in Neural Information Processing Systems (2014)), 2285-2293
[47] Parviainen, P.; Shahrabi Farahani, H.; Lagergren, J., Learning bounded tree-width Bayesian networks using integer linear programming, (Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, vol. 33 (2014), Proceedings of Machine Learning Research), 751-759
[48] Elidan, G.; Gould, S., Learning bounded treewidth Bayesian networks, (Advances in Neural Information Processing Systems (2009)), 417-424
[49] Nie, S.; de Campos, C. P.; Ji, Q., Efficient learning of Bayesian networks with bounded tree-width, Int. J. Approx. Reason., 80, 412-427 (2017) · Zbl 1401.68267
[50] Scanagatta, M.; Corani, G.; de Campos, C. P.; Zaffalon, M., Learning treewidth-bounded Bayesian networks with thousands of variables, (Proceedings of the 30th International Conference on Neural Information Processing Systems (2016)), 1470-1478
[51] Scanagatta, M.; d. Campos, C. P.; Corani, G.; Zaffalon, M., Learning Bayesian networks with thousands of variables, (Proceedings of the 28th International Conference on Neural Information Processing Systems (2015), MIT Press), 1864-1872
[52] Cussens, J., Bayesian network learning with cutting planes, (Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (2011), AUAI Press), 153-160
[53] Scanagatta, M.; Corani, G.; Zaffalon, M.; Yoo, J.; Kang, U., Efficient learning of bounded-treewidth Bayesian networks from complete and incomplete data sets, Int. J. Approx. Reason., 95, 152-166 (2018) · Zbl 1444.68161
[54] Bach, F. R.; Jordan, M. I., Thin junction trees, (Advances in Neural Information Processing Systems (2001)), 569-576
[55] Karger, D.; Srebro, N., Learning Markov networks: maximum bounded tree-width graphs, (Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (2001), Society for Industrial and Applied Mathematics), 392-401 · Zbl 0987.68067
[56] Chechetka, A.; Guestrin, C., Efficient principled learning of thin junction trees, (Advances in Neural Information Processing Systems (2008)), 273-280
[57] Shahaf, D.; Guestrin, C., Learning thin junction trees via graph cuts, (Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, vol. 5 (2009), Proceedings of Machine Learning Research), 113-120
[58] Ahuja, R. K.; Magnanti, T. L.; Orlin, J. B., Network Flows: Theory, Algorithms, and Applications (1993), Prentice Hall · Zbl 1201.90001
[59] Lowd, D.; Domingos, P., Learning arithmetic circuits, (Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (2008), AUAI Press), 383-392
[60] Benjumeda, M.; Larrañaga, P.; Bielza, C., Learning Bayesian networks with low inference complexity, Progr. Artif. Intell., 5, 1, 15-26 (2015)
[61] Darwiche, A., Modeling and Reasoning with Bayesian Networks (2009), Cambridge University Press · Zbl 1231.68003
[62] Lowd, D.; Davis, J., Learning Markov network structure with decision trees, (IEEE International Conference on Data Mining (2010), IEEE), 334-343
[63] Van Haaren, J.; Davis, J., Markov network structure learning: a randomized feature generation approach, (Proceedings of the 26th AAAI Conference on Artificial Intelligence (2012)), 1148-1154
[64] Bekker, J.; Davis, J.; Choi, A.; Darwiche, A.; Broeck, G. V.d., Tractable learning for complex probability queries, (Proceedings of the 28th International Conference on Neural Information Processing Systems (2015), MIT Press), 2242-2250
[65] Holm, S., A simple sequentially rejective multiple test procedure, Scand. J. Stat., 65-70 (1979) · Zbl 0402.62058
[66] Shaffer, J. P., Modified sequentially rejective multiple test procedures, J. Am. Stat. Assoc., 81, 395, 826-831 (1986) · Zbl 0603.62087
[67] Garcia, S.; Herrera, F., An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons, J. Mach. Learn. Res., 9, 2677-2694 (2008) · Zbl 1225.68178
[68] Demšar, J., Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1-30 (2006) · Zbl 1222.68184
[69] van der Gaag, L. C.; de Waal, P. R., Multi-dimensional Bayesian network classifiers, (Proceedings of the 3rd European Workshop on Probabilistic Graphical Models (2006)), 107-114
[70] Bielza, C.; Li, G.; Larrañaga, P., Multi-dimensional classification with Bayesian networks, Int. J. Approx. Reason., 52, 6, 705-727 (2011) · Zbl 1226.68078
[71] Benjumeda, M.; Bielza, C.; Larrañnaga, P., Tractability of most probable explanations in multidimensional Bayesian network classifiers, Int. J. Approx. Reason., 93, Supplement C, 74-87 (2018) · Zbl 1452.68146
[72] Friedman, N., The Bayesian structural EM algorithm, (Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (1998), Morgan Kaufmann Publishers Inc.), 129-138
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.