Hinge-loss Markov random fields and probabilistic soft logic.

*(English)* Zbl 1435.68252

Summary: A fundamental challenge in developing high-impact machine learning technologies is balancing the need to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this paper, we introduce two new formalisms for modeling structured data, and show that they can both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then define HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. We introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization methods, because it uses message passing to take advantage of sparse dependency structures. We then show how to learn the parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous discrete models, but much more scalable. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.
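As a rough illustration of the idea in the summary, the following sketch shows how a single weighted logical rule can become a hinge-loss term over truth values in [0, 1], and how MAP inference then amounts to minimizing a convex energy. It assumes the Łukasiewicz relaxation of conjunction and implication; the rule, the predicate names (`Friends`, `Votes`), the weights, and the observed values are all hypothetical. The grid search is only a stand-in for the message-passing algorithm the summary mentions, which is not reproduced here.

```python
from typing import Callable, Tuple


def lukasiewicz_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def rule_distance(body: float, head: float) -> float:
    """Distance to satisfaction of `body -> head`, a hinge: max(0, body - head)."""
    return max(0.0, body - head)


def energy(votes_b: float) -> float:
    """Weighted sum of hinge-loss terms for one unknown, Votes(b).

    Hypothetical model (all names, weights, and observations are assumptions):
      weight 2.0: Friends(a, b) AND Votes(a) -> Votes(b)
      weight 0.5: NOT Votes(b)   (a weak prior pulling Votes(b) toward 0)
    """
    friends_ab, votes_a = 1.0, 0.9  # observed truth values
    rule = rule_distance(lukasiewicz_and(friends_ab, votes_a), votes_b)
    prior = votes_b  # distance to satisfaction of NOT Votes(b)
    return 2.0 * rule + 0.5 * prior


def map_by_grid(f: Callable[[float], float], steps: int = 100) -> Tuple[float, float]:
    """Minimize f over an evenly spaced grid on [0, 1] (illustrative stand-in only)."""
    candidates = [i / steps for i in range(steps + 1)]
    best = min(candidates, key=f)
    return best, f(best)


y_star, e_star = map_by_grid(energy)
```

In this toy model the energy decreases with slope -1.5 until `votes_b` reaches 0.9 and increases with slope 0.5 afterwards, so the minimizer sits at the kink of the hinge. Because every term is a hinge of a linear function, the energy is convex, which is what makes exact MAP inference tractable in this setting.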

##### MSC:

| Code | Description |
|---|---|
| 68T05 | Learning and adaptive systems in artificial intelligence |
| 62H22 | Probabilistic graphical models |
| 62H30 | Classification and discrimination; cluster analysis (statistical aspects) |
| 62M40 | Random fields; image analysis |
| 68T27 | Logic in artificial intelligence |


*S. H. Bach* et al., J. Mach. Learn. Res. 18, Paper No. 109, 67 p. (2017; Zbl 1435.68252)

