A survey on learning approaches for undirected graphical models. Application to scene object recognition. (English) Zbl 1404.68116

Summary: Probabilistic Graphical Models (PGMs) in general, and Undirected Graphical Models (UGMs) in particular, have become suitable frameworks for capturing and conveniently modeling the uncertainty inherent in a variety of problems. In real-world applications such as scene object recognition, they have become a reliable and widely used tool. The effectiveness of UGMs is tied to the particularities of the problem to be solved and, especially, to the chosen learning strategy. This paper presents a review of practical, widely used learning approaches for Conditional Random Fields (CRFs), the discriminative variant of UGMs, complemented by a thorough comparison and experimental analysis in the field of scene object recognition. This application is of particular interest given its potential for enhancing the capabilities of cognitive agents. Two state-of-the-art datasets, NYUv2 and Cornell-RGBD, containing intensity and depth imagery of indoor scenes, are used for training and testing CRFs. Results regarding success rate, computational burden, and scalability are analyzed, including the benefits of parallelization techniques for improving efficiency.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62M40 Random fields; image analysis
68T40 Artificial intelligence for robotics
68T45 Machine vision and scene understanding
