Revisiting deep structured models for pixel-level labeling with gradient-based inference.

*(English)* Zbl 1448.68442

##### MSC:

68T45 | Machine vision and scene understanding |

68T07 | Artificial neural networks and deep learning |

68U10 | Computing methodologies for image processing |

94A08 | Image processing (compression, reconstruction, etc.) in information and communication theory |

\textit{M. Larsson} et al., SIAM J. Imaging Sci. 11, No. 4, 2610--2628 (2018; Zbl 1448.68442)

##### References:

[1] | A. Adams, J. Baek, and M. A. Davis, Fast high-dimensional filtering using the permutohedral lattice, Computer Graphics Forum, 29 (2010), pp. 753–762. |

[2] | T. Ajanthan, A. Desmaison, R. Bunel, M. Salzmann, P. H. S. Torr, and M. P. Kumar, Efficient linear programming for dense CRFs, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Washington, DC, 2017, pp. 2934–2942. |

[3] | A. Arnab, S. Jayasumana, S. Zheng, and P. H. S. Torr, Higher order conditional random fields in deep neural networks, in European Conference on Computer Vision 2016, Lecture Notes in Comput. Sci. 9906, Springer, Berlin, 2016, pp. 524–540. |

[4] | A. Arnab, S. Zheng, S. Jayasumana, B. Romera-Paredes, M. Larsson, A. Kirillov, B. Savchynskyy, C. Rother, F. Kahl, and P. H. S. Torr, Conditional random fields meet deep neural networks for semantic segmentation: Combining probabilistic graphical models with deep learning for structured prediction, IEEE Signal Process. Mag., 35 (2018), pp. 37–52. |

[5] | A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Oper. Res. Lett., 31 (2003), pp. 167–175. · Zbl 1046.90057 |

[6] | D. Belanger and A. McCallum, Structured prediction energy networks, in Proceedings of the International Conference on Machine Learning (ICML’16), ACM, New York, 2016, pp. 983–992. |

[7] | D. Belanger, B. Yang, and A. McCallum, End-to-End Learning for Structured Prediction Energy Networks, preprint, 2017. |

[8] | G. Bertasius, L. Torresani, S. X. Yu, and J. Shi, Convolutional random walk networks for semantic image segmentation, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Honolulu, HI), IEEE, Washington, DC, 2017. |

[9] | A. Blake, P. Kohli, and C. Rother, Markov Random Fields for Vision and Image Processing, MIT Press, Cambridge, MA, 2011. · Zbl 1236.68001 |

[10] | E. Borenstein and S. Ullman, Class-specific, top-down segmentation, in European Conference on Computer Vision 2002, Lecture Notes in Comput. Sci. 2351, Springer, Berlin, 2002, pp. 109–122. · Zbl 1039.68601 |

[11] | E. Boros and P. Hammer, Pseudo-Boolean optimization, Discrete Appl. Math., 123 (2002), pp. 155–225. · Zbl 1076.90032 |

[12] | L. Bottou, Y. Bengio, and Y. LeCun, Global training of document processing systems using graph transformer networks, in IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Washington, DC, 1997, pp. 489–494. |

[13] | S. Chandra and I. Kokkinos, Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs, in European Conference on Computer Vision 2016, Lecture Notes in Comput. Sci. 9911, Springer, Berlin, 2016, pp. 402–418. |

[14] | L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, in International Conference on Learning Representations, San Diego, CA, 2015. |

[15] | L. Chen, A. Schwing, A. Yuille, and R. Urtasun, Learning deep structured models, in Proceedings of the International Conference on Machine Learning (Lille, France), ACM, New York, 2015, pp. 1785–1794. |

[16] | L.-C. Chen, J. T. Barron, G. Papandreou, K. Murphy, and A. L. Yuille, Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Washington, DC, 2016, pp. 4545–4554. |

[17] | L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell., 40 (2018), pp. 834–848. |

[18] | Y. Chen and X. Ye, Projection onto a Simplex, preprint, 2011. |

[19] | A. Desmaison, R. Bunel, P. Kohli, P. H. S. Torr, and M. P. Kumar, Efficient continuous relaxations for dense CRF, in European Conference on Computer Vision 2016, Lecture Notes in Comput. Sci. 9906, Springer, Berlin, 2016, pp. 818–833. |

[20] | J. Domke, Learning graphical model parameters with approximate marginal inference, IEEE Trans. Pattern Anal. Mach. Intell., 35 (2013), pp. 2454–2467. |

[21] | M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The Pascal visual object classes challenge: A retrospective, Internat. J. Comput. Vis., 111 (2015), pp. 98–136. |

[22] | M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The Pascal visual object classes (VOC) challenge, Internat. J. Comput. Vis., 88 (2010), pp. 303–338. |

[23] | G. Ghiasi and C. Fowlkes, Laplacian reconstruction and refinement for semantic segmentation, in European Conference on Computer Vision 2016, Lecture Notes in Comput. Sci. 9906, Springer, Berlin, 2016, pp. 519–534. |

[24] | R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in IEEE Conference on Computer Vision and Pattern Recognition (Columbus, OH), IEEE, Washington, DC, 2014. |

[25] | B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik, Semantic contours from inverse detectors, in 2011 International Conference on Computer Vision (Barcelona, Spain), IEEE, Washington, DC, 2011, pp. 991–998. |

[26] | K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV), IEEE, Washington, DC, 2016. |

[27] | O. H. Jafari, O. Groth, A. Kirillov, M. Y. Yang, and C. Rother, Analyzing modular CNN architectures for joint depth prediction and semantic segmentation, in Proceedings of the IEEE International Conference on Robotics and Automation, IEEE, Washington, DC, 2017, pp. 4620–4627. |

[28] | V. Jampani, M. Kiefel, and P. V. Gehler, Learning sparse high dimensional filters: Image filtering, dense CRFs and bilateral neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Washington, DC, 2016, pp. 4452–4461. |

[29] | Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, preprint, 2014. |

[30] | A. Kirillov, D. Schlesinger, S. Zheng, B. Savchynskyy, P. Torr, and C. Rother, Joint training of generic CNN-CRF models with stochastic optimization, in Asian Conference on Computer Vision 2016, Lecture Notes in Comput. Sci. 10112, Springer, Berlin, pp. 221–236. |

[31] | D. Koller and N. Friedman, Probabilistic Graphical Models, MIT Press, Cambridge, MA, 2009. · Zbl 1183.68483 |

[32] | P. Krähenbühl and V. Koltun, Parameter learning and convergent inference for dense random fields, in Proceedings of the 30th International Conference on Machine Learning, ACM, New York, 2013, pp. 513–521. |

[33] | P. Krähenbühl and V. Koltun, Efficient inference in fully connected CRFs with Gaussian edge potentials, in Proceedings of Neural Information Processing Systems 2011, Curran Associates, Red Hook, NY, 2011, pp. 109–117. |

[34] | M. Larsson, A. Arnab, F. Kahl, S. Zheng, and P. H. S. Torr, A projected gradient descent method for CRF inference allowing end-to-end training of arbitrary pairwise potentials, in 11th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), Springer, Berlin, 2017. |

[35] | G. Lin, A. Milan, C. Shen, and I. Reid, RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation, preprint, 2016. |

[36] | G. Lin, C. Shen, A. van den Hengel, and I. Reid, Efficient piecewise training of deep structured models for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Washington, DC, 2016, pp. 3194–3203. |

[37] | T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, Microsoft COCO: Common objects in context, in Proceedings of the European Conference on Computer Vision, Lecture Notes in Comput. Sci. 8693, Springer, Berlin, 2014, pp. 740–755. |

[38] | Z. Liu, X. Li, P. Luo, C. C. Loy, and X. Tang, Semantic image segmentation via deep parsing network, in Proceedings of the International Conference on Computer Vision, IEEE, Washington, DC, 2015, pp. 1377–1385. |

[39] | J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2017), pp. 640–651. |

[40] | S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, in Proceedings of Neural Information Processing Systems 2015, MIT Press, Cambridge, MA, 2015, pp. 91–99. |

[41] | C. Rother, V. Kolmogorov, and A. Blake, “GrabCut”: Interactive foreground extraction using iterated graph cuts, ACM Trans. Graphics, 23 (2004), pp. 309–314. |

[42] | A. Schwing and R. Urtasun, Fully Connected Deep Structured Networks, preprint, 2015. |

[43] | K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, in Proceedings of the International Conference on Learning Representations, 2015, pp. 1–14. |

[44] | V. Vineet, J. Warrell, and P. Torr, Filter-based mean-field inference for random fields with higher order terms and product label-spaces, in European Conference on Computer Vision 2012, Lecture Notes in Comput. Sci. 7576, Springer, Berlin, pp. 31–44. · Zbl 1328.68252 |

[45] | P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price, and A. Yuille, Towards unified depth and semantic prediction from a single image, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2014, IEEE, Washington, DC, 2014. |

[46] | W. Wang, S. Fidler, and R. Urtasun, Proximal deep structured models, in Proceedings of Neural Information Processing Systems 2016, Curran Associates, Red Hook, NY, 2016, pp. 865–873. |

[47] | S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr, Conditional random fields as recurrent neural networks, in Proceedings of the IEEE International Conference on Computer Vision 2015, IEEE, Washington, DC, 2015, pp. 1529–1537. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.