Structure learning of contextual Markov networks using marginal pseudo-likelihood. (English) Zbl 1422.62190

Summary: Markov networks are popular models for discrete multivariate systems in which the dependence structure of the variables is specified by an undirected graph. To allow for more expressive dependence structures, several generalizations of Markov networks have been proposed. Here, we consider the class of contextual Markov networks, which takes into account possible context-specific independences among pairs of variables. Structure learning of contextual Markov networks is very challenging due to the extremely large number of possible structures. One of the main challenges has been to design a score by which a structure can be assessed in terms of model fit relative to complexity, without assuming chordality. Here, we introduce the marginal pseudo-likelihood as an analytically tractable criterion for general contextual Markov networks. Our criterion is shown to yield a consistent structure estimator. Experiments demonstrate the favourable properties of our method in terms of predictive accuracy of the inferred models.
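
Illustration (not part of the summary): the general idea behind a marginal pseudo-likelihood score can be sketched for an ordinary, non-contextual discrete Markov network. The sketch assumes that the score is a product, over variables, of Dirichlet-multinomial marginal likelihoods of each variable given the joint configuration of its graph neighbours. The function name `log_mpl`, the symmetric hyperparameter `alpha`, and the toy data are hypothetical choices made for this example; the contextual criterion of the paper additionally accounts for context-specific independences and is not reproduced here.

```python
from collections import Counter, defaultdict
from math import lgamma


def log_mpl(data, neighbors, cardinalities, alpha=0.5):
    """Log marginal pseudo-likelihood of an undirected graph (illustrative sketch).

    data          : list of tuples of integer-coded discrete observations
    neighbors     : neighbors[j] is the set of neighbours of variable j
    cardinalities : cardinalities[j] is the number of outcomes of variable j
    alpha         : symmetric Dirichlet hyperparameter (an assumption of this sketch)
    """
    total = 0.0
    for j, r_j in enumerate(cardinalities):
        blanket = sorted(neighbors[j])
        # Count outcomes of variable j within each observed neighbour configuration.
        counts = defaultdict(Counter)
        for row in data:
            config = tuple(row[i] for i in blanket)
            counts[config][row[j]] += 1
        # Unobserved configurations and outcomes contribute zero on the log scale,
        # so it is enough to sum over what was actually observed.
        for outcome_counts in counts.values():
            n = sum(outcome_counts.values())
            total += lgamma(r_j * alpha) - lgamma(n + r_j * alpha)
            for n_k in outcome_counts.values():
                total += lgamma(n_k + alpha) - lgamma(alpha)
    return total


# Toy data: two binary variables that always agree.
data = [(0, 0)] * 20 + [(1, 1)] * 20
with_edge = {0: {1}, 1: {0}}
without_edge = {0: set(), 1: set()}

print(log_mpl(data, with_edge, [2, 2]))
print(log_mpl(data, without_edge, [2, 2]))
```

On this toy data the graph containing the edge receives the higher score, which matches the intuition that a structure score should reward dependences that are supported by the data while each local term remains available in closed form.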

MSC:

62H12 Estimation in multivariate analysis
62M05 Markov processes: estimation; hidden Markov models

Software:

MIM; UCI-ml
