Filter-based mean-field inference for random fields with higher-order terms and product label-spaces. (English) Zbl 1328.68252

Summary: Recently, a number of cross bilateral filtering methods have been proposed for solving multi-label problems in computer vision, such as stereo, optical flow and object class segmentation, that show an order of magnitude improvement in speed over previous methods. These methods have achieved good results despite using models with only unary and/or pairwise terms. However, previous work has shown the value of using models with higher-order terms, e.g. to represent label consistency over large regions, or global co-occurrence relations. We show how these higher-order terms can be formulated such that filter-based inference remains possible. We demonstrate our techniques on joint stereo and object labelling problems, as well as object class segmentation, showing in addition for joint object-stereo labelling how our method provides an efficient approach to inference in product label-spaces. We show that we are able to speed up inference in these models by around 10–30 times with respect to competing graph-cut/move-making methods, while maintaining or improving accuracy in all cases. We show results on PascalVOC-10 for object class segmentation, and Leuven for joint object-stereo labelling.
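The filter-based inference mentioned in the summary can be illustrated with a minimal sketch. It is not the paper's method: it assumes a pairwise-only model with a Potts compatibility term, uses a plain spatial Gaussian filter as a stand-in for the cross bilateral filter, and leaves out the higher-order terms and product label-spaces entirely. The function names (`gaussian_kernel`, `blur`, `mean_field`) and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def blur(x, k):
    """Separable blur over the two spatial axes of an (H, W, L) array.
    Assumes H and W are at least len(k); borders are zero-padded."""
    for axis in (0, 1):
        x = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, x)
    return x

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_field(unary, sigma=2.0, weight=3.0, iters=5):
    """unary: (H, W, L) label costs. Returns approximate marginals (H, W, L)."""
    k = gaussian_kernel(sigma)
    center = k[len(k) // 2] ** 2           # weight a pixel gives to itself
    q = softmax(-unary)                    # initialize from the unary term
    for _ in range(iters):
        # Message passing as filtering: blur each label's marginal map,
        # then remove the pixel's own contribution.
        msg = blur(q, k) - center * q
        # Potts compatibility: label l is penalized by the mass of all other labels.
        pairwise = weight * (msg.sum(axis=-1, keepdims=True) - msg)
        q = softmax(-unary - pairwise)     # local update and normalization
    return q

# Toy problem: left half prefers label 0, right half prefers label 1,
# and one pixel in the left half weakly prefers the wrong label.
H, W = 16, 16
unary = np.zeros((H, W, 2))
unary[:, :8, 1] = 1.0
unary[:, 8:, 0] = 1.0
unary[8, 3] = [0.5, 0.0]

initial_labels = softmax(-unary).argmax(axis=-1)
q = mean_field(unary)
labels = q.argmax(axis=-1)
print(initial_labels[8, 3], labels[8, 3])
```

In this toy setup the filtered messages from the surrounding pixels outweigh the weak unary preference at the isolated pixel, so its label is pulled toward that of its neighbourhood. Each iteration costs one filtering pass per label, which is the property that makes this style of inference fast.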

MSC:

68T45 Machine vision and scene understanding
62H35 Image analysis in multivariate analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68U10 Computing methodologies for image processing

Software:

GrabCut; GeoS; PASCAL VOC
