
Hierarchical estimation of parameters in Bayesian networks. (English) Zbl 1507.62005

Summary: A novel approach to parameter estimation in Bayesian networks is presented. The main idea is to introduce a hyper-prior into the Multinomial-Dirichlet model traditionally used to estimate the conditional distributions of a Bayesian network. The resulting hierarchical model jointly estimates the different conditional distributions belonging to the same conditional probability table, allowing them to borrow statistical strength from one another. An analytical study of the dependence structure induced a priori by the hierarchical model is performed, and an ad hoc variational algorithm for fast and accurate inference is derived. The proposed hierarchical model yields a major improvement in classification performance with Bayesian networks compared to traditional models, and the proposed variational algorithm reduces the computational time by two orders of magnitude compared to traditional MCMC methods, with the same accuracy in parameter estimation. Moreover, motivated by a real case study, the hierarchical model is applied to the estimation of Bayesian network parameters by borrowing strength from related domains.
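The pooling effect described in the summary can be illustrated with a minimal sketch. This is not the paper's variational algorithm: the hyper-prior is replaced here by an empirical-Bayes shortcut in which the shared mean vector is moment-matched from the pooled counts, and the function names and constants (`s`, `s0`) are illustrative assumptions. The point is only to show how rows of the same conditional probability table shrink toward a common, table-wide distribution instead of being estimated independently.

```python
import numpy as np

def cpt_posterior_means(counts, s=1.0):
    """Standard Multinomial-Dirichlet estimate: each CPT row (one parent
    configuration) gets its own independent symmetric Dirichlet(s/k) prior,
    so sparsely observed rows stay close to uniform."""
    k = counts.shape[1]
    post = counts + s / k                      # independent pseudo-counts
    return post / post.sum(axis=1, keepdims=True)

def cpt_hierarchical_means(counts, s=1.0, s0=1.0):
    """Hierarchical sketch: all rows share a common mean vector estimated
    from the pooled counts (an empirical-Bayes stand-in for the paper's
    hyper-prior), so sparsely observed rows shrink toward the table-wide
    distribution rather than toward the uniform one."""
    k = counts.shape[1]
    pooled = counts.sum(axis=0) + s0 / k       # pool counts over all rows
    alpha0 = pooled / pooled.sum()             # shared mean vector
    post = counts + s * alpha0                 # shrink each row toward it
    return post / post.sum(axis=1, keepdims=True)

# Two parent configurations, one well observed and one nearly empty:
counts = np.array([[80.0, 20.0],
                   [ 1.0,  0.0]])
print(cpt_posterior_means(counts)[1])    # sparse row pulled toward uniform
print(cpt_hierarchical_means(counts)[1]) # sparse row pulled toward pooled mean
```

Under the hierarchical sketch the sparse second row borrows strength from the first, landing near the pooled proportion (about 0.8 for the first state) instead of being dragged toward 0.5 by an uninformative independent prior.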

MSC:

62-08 Computational methods for problems pertaining to statistics
62F15 Bayesian inference
68T35 Theory of languages and software systems (knowledge-based systems, expert systems, etc.) for artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

PMTK; ADVI; BayesDA; bnlearn; Stan
Full Text: DOI arXiv

References:

[1] Bagnall, A.; Cawley, G. C., On the use of default parameter settings in the empirical evaluation of classification algorithms (2017), arXiv preprint arXiv:1703.06777
[2] Bartlett, M.; Cussens, J., Integer linear programming for the Bayesian network structure learning problem, Artificial Intelligence, 244, 258-271 (2017) · Zbl 1404.68094
[3] Blei, D. M.; Kucukelbir, A.; McAuliffe, J. D., Variational inference: A review for statisticians, J. Amer. Statist. Assoc., 112, 518, 859-877 (2017)
[4] Blei, D. M.; Ng, A. Y.; Jordan, M. I., Latent Dirichlet allocation, J. Mach. Learn. Res., 3, 993-1022 (2003) · Zbl 1112.68379
[5] Bradley, A. P., The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognit., 30, 7, 1145-1159 (1997)
[6] Bucher, D., Cellina, F., Mangili, F., Raubal, M., Rudel, R., Rizzoli, A.E., Elabed, O., 2016. Exploiting fitness apps for sustainable mobility - Challenges deploying the GoEco! App. In: 4th Int. Conf. ICT Sustain. (ICT4S 2016), September 2016, pp. 89-98.
[7] de Campos, C. P.; Ji, Q., Efficient structure learning of Bayesian networks using constraints, J. Mach. Learn. Res., 12, 663-689 (2011) · Zbl 1280.68226
[8] Carpenter, B.; Gelman, A.; Hoffman, M.; Lee, D.; Goodrich, B.; Betancourt, M.; Brubaker, M.; Guo, J.; Li, P.; Riddell, A., Stan: A probabilistic programming language, J. Stat. Softw., 76, 1, 1-32 (2017)
[9] Casella, G.; Moreno, E., Assessing robustness of intrinsic tests of independence in two-way contingency tables, J. Amer. Statist. Assoc., 104, 487, 1261-1271 (2009) · Zbl 1328.62359
[10] Cellina, F., Bucher, D., Rudel, R., Raubal, M., Andrea, E., 2016. Promoting sustainable mobility styles using eco-feedback and gamification elements: Introducing the GoEco! Living lab experiment. In: 4th Eur. Conf. Behav. Energy Effic. (Behave 2016), September, pp. 8-9.
[11] Darwiche, A., Modeling and Reasoning with Bayesian Networks (2009), Cambridge University Press · Zbl 1231.68003
[12] Darwiche, A., Bayesian networks, Commun. ACM, 53, 12, 80-90 (2010)
[13] Demichelis, F.; Magni, P.; Piergiorgi, P.; Rubin, M. A.; Bellazzi, R., A hierarchical naive Bayes model for handling sample heterogeneity in classification problems: an application to tissue microarrays, BMC Bioinformatics, 7, 1, 514 (2006)
[14] Domingos, P.; Pazzani, M., On the optimality of the simple Bayesian classifier under zero-one loss, Mach. Learn., 29, 2, 103-130 (1997) · Zbl 0892.68076
[15] Fernández-Delgado, M.; Cernadas, E.; Barro, S.; Amorim, D., Do we need hundreds of classifiers to solve real world classification problems?, J. Mach. Learn. Res., 15, 1, 3133-3181 (2014) · Zbl 1319.62005
[16] Friedman, J., On bias, variance, 0/1 - loss, and the curse-of-dimensionality, Data Min. Knowl. Discov., 1, 55-77 (1997)
[17] Friedman, N.; Geiger, D.; Goldszmidt, M., Bayesian network classifiers, Mach. Learn., 29, 2/3, 131-163 (1997) · Zbl 0892.68077
[18] Gelman, A.; Carlin, J. B.; Stern, H. S.; Dunson, D. B.; Vehtari, A.; Rubin, D. B., Bayesian Data Analysis (2014), CRC press · Zbl 1279.62004
[19] Grzegorczyk, M.; Husmeier, D., A non-homogeneous dynamic Bayesian network with sequentially coupled interaction parameters for applications in systems and synthetic biology, Stat. Appl. Genet. Mol. Biol., 11, 4 (2012) · Zbl 1296.92039
[20] Jaakkola, T. S.; Jordan, M. I., Bayesian parameter estimation via variational methods, Stat. Comput., 10, 1, 25-37 (2000)
[21] Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; Saul, L. K., An introduction to variational methods for graphical models, Mach. Learn., 37, 2, 183-233 (1999) · Zbl 0945.68164
[22] Kim, D.-k.; Voelker, G.; Saul, L., A variational approximation for topic modeling of hierarchical corpora, (Proceedings of the 30th International Conference on Machine Learning, vol. 28 (2013), PMLR), 55-63
[23] Koller, D.; Friedman, N., Probabilistic Graphical Models: Principles and Techniques (2009), MIT press
[24] Kucukelbir, A.; Tran, D.; Ranganath, R.; Gelman, A.; Blei, D. M., Automatic differentiation variational inference, J. Mach. Learn. Res., 18, 1, 430-474 (2017)
[25] Leday, G. G.; de Gunst, M. C.; Kpogbezan, G. B.; Van der Vaart, A. W.; Van Wieringen, W. N.; Van de Wiel, M. A., Gene network reconstruction using global-local shrinkage priors, Ann. Appl. Stat., 11, 1, 41 (2017) · Zbl 1366.62227
[26] Malovini, A.; Barbarini, N.; Bellazzi, R.; De Michelis, F., Hierarchical Naive Bayes for genetic association studies, BMC Bioinformatics, 13, 14, S6 (2012)
[27] Murphy, K. P., Machine Learning: A Probabilistic Perspective (2012), MIT press · Zbl 1295.68003
[28] Niculescu-Mizil, A., Caruana, R., 2007. Inductive transfer for Bayesian network structure learning. In: Proc. Artificial Intelligence and Statistics, pp. 339-346.
[29] Oates, C. J.; Smith, J. Q.; Mukherjee, S.; Cussens, J., Exact estimation of multiple directed acyclic graphs, Stat. Comput., 26, 4, 797-811 (2016) · Zbl 1505.62300
[30] Pan, S. J.; Yang, Q., A survey on transfer learning, IEEE Trans. Knowl. Data Eng., 22, 10, 1345-1359 (2010)
[31] Petitjean, F.; Buntine, W.; Webb, G. I.; Zaidi, N., Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes, Mach. Learn., 107, 8, 1303-1331 (2018) · Zbl 1473.68189
[32] Scanagatta, M.; Corani, G.; de Campos, C.; Zaffalon, M., Learning treewidth-bounded Bayesian networks with thousands of variables, (Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; Garnett, R., NIPS 2016: Advances in Neural Information Processing Systems, vol. 29 (2016))
[33] Scutari, M., Learning Bayesian networks with the bnlearn R package, J. Stat. Softw., 35, 3 (2010)
[34] Teh, Y. W.; Jordan, M. I.; Beal, M. J.; Blei, D. M., Hierarchical Dirichlet processes, J. Amer. Statist. Assoc., 101, 476, 1566-1581 (2006) · Zbl 1171.62349
[35] Wainberg, M.; Alipanahi, B.; Frey, B. J., Are random forests truly the best classifiers?, J. Mach. Learn. Res., 17, 1, 3837-3841 (2016)
[36] Wainwright, M. J.; Jordan, M. I., Graphical models, exponential families, and variational inference, Found. Trends Mach. Learn., 1, 1-2, 1-305 (2008) · Zbl 1193.62107
[37] Webb, G. I.; Boughton, J. R.; Wang, Z., Not so naive Bayes: aggregating one-dependence estimators, Mach. Learn., 58, 1, 5-24 (2005) · Zbl 1075.68078
[38] Yuan, C., Malone, B., Wu, X., 2011. Learning optimal Bayesian networks using A* search. In: Proc. IJCAI - International Joint Conference on Artificial Intelligence, vol. 22, p. 2186.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible, without claiming completeness or perfect matching.