×

Smoothed dilated convolutions for improved dense prediction. (English) Zbl 1473.68181

Summary: Dilated convolutions, also known as atrous convolutions, have been widely explored in deep convolutional neural networks (DCNNs) for various dense prediction tasks. However, dilated convolutions suffer from the gridding artifacts, which hampers the performance. In this work, we propose two simple yet effective degridding methods by studying a decomposition of dilated convolutions. Unlike existing models, which explore solutions by focusing on a block of cascaded dilated convolutional layers, our methods address the gridding artifacts by smoothing the dilated convolution itself. In addition, we point out that the two degridding approaches are intrinsically related and define separable and shared (SS) operations, which generalize the proposed methods. We further explore SS operations in view of operations on graphs and propose the SS output layer, which is able to smooth the entire DCNNs by only replacing the output layer. We evaluate our degridding methods and the SS output layer thoroughly, and visualize the smoothing effect through effective receptive field analysis. Results show that our methods degridding yield consistent improvements on the performance of dense prediction tasks, while adding negligible amounts of extra training parameters. And the SS output layer improves the performance by 3.3% and contains only 9% training parameters of the original output layer.

MSC:

68T07 Artificial neural networks and deep learning
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th \(\{\) USENIX \(\}\) symposium on operating systems design and implementation. pp 265-283
[2] Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017a) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834-848
[3] Chen LC, Papandreou G, Schroff F, Adam H (2017b) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587
[4] Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 1251-1258
[5] Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3213-3223
[6] Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems. pp 379-387
[7] Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 248-255
[8] Everingham, M.; Van Gool, L.; Williams, CK; Winn, J.; Zisserman, A., The pascal visual object classes (voc) challenge, Int J Comput Vis, 88, 2, 303-338 (2010) · doi:10.1007/s11263-009-0275-4
[9] Gao H, Wang Z, Ji S (2018) Large-scale learnable graph convolutional networks. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1416-1424
[10] Gao, H.; Yuan, H.; Wang, Z.; Ji, S., Pixel transposed convolutional networks, IEEE Trans Pattern Anal Mach Intell, 42, 5, 1218-1227 (2019)
[11] Giusti A, Cireşan DC, Masci J, Gambardella LM, Schmidhuber J (2013) Fast image scanning with deep max-pooling convolutional neural networks. In: Proceedings of the IEEE international conference on image processing. IEEE, pp 4034-4038
[12] Hamaguchi R, Fujita A, Nemoto K, Imaizumi T, Hikosaka S (2018) Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery. In: Proceedings of the IEEE winter conference on applications of computer vision. IEEE, pp 1442-1450
[13] Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: Advances in neural information processing systems. pp 1024-1034
[14] Hariharan B, Arbeláez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: Proceedings of the IEEE international conference on computer vision. IEEE, pp 991-998
[15] He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 770-778
[16] Holschneider M, Kronland-Martinet R, Morlet J, Tchamitchian P (1990) A real-time algorithm for signal analysis with the help of the wavelet transform. In: Wavelets. Springer, pp 286-297 · Zbl 0850.94021
[17] Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S et al (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 7310-7311
[18] Kalchbrenner N, Espeholt L, Simonyan K, Oord Avd, Graves A, Kavukcuoglu K (2016) Neural machine translation in linear time. arXiv preprint arXiv:1610.10099
[19] Kalchbrenner N, van den Oord A, Simonyan K, Danihelka I, Vinyals O, Graves A, Kavukcuoglu K (2017) Video pixel networks. In: Proceedings of the international conference on machine learning. pp 1771-1779
[20] Li H, Zhao R, Wang X (2014) Highly efficient forward and backward propagation of convolutional neural networks for pixelwise classification. arXiv preprint arXiv:1412.4526
[21] Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Proceedings of the European conference on computer vision. Springer, pp 740-755
[22] Liu W, Rabinovich A, Berg AC (2015) Parsenet: Looking wider to see better. arXiv preprint arXiv:1506.04579
[23] Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3431-3440
[24] Luo W, Li Y, Urtasun R, Zemel R (2016) Understanding the effective receptive field in deep convolutional neural networks. In: Advances in neural information processing systems. pp 4898-4906
[25] Mamalet F, Garcia C (2012) Simplifying convnets for fast learning. In: Proceedings of the international conference on artificial neural networks. Springer, pp 58-65
[26] Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 5115-5124
[27] Oord Avd, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499
[28] Papandreou G, Kokkinos I, Savalle PA (2015) Modeling local and global deformations in deep learning: epitomic convolution, multiple instance learning, and sliding window detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 390-399
[29] Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, Lecun Y (2014) Overfeat: Integrated recognition, localization and detection using convolutional networks. In: Proceedings of the international conference on learning representations
[30] Shensa, MJ, The discrete wavelet transform: wedding the a trous and Mallat algorithms, IEEE Trans Signal Process, 40, 10, 2464-2482 (1992) · Zbl 0825.94053 · doi:10.1109/78.157290
[31] Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) Graph attention networks. In: Proceedings of the international conference on learning representations
[32] Wang Z, Ji S (2018) Smoothed dilated convolutions for improved dense prediction. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 2486-2495
[33] Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2018) Understanding convolution for semantic segmentation. In: Proceedings of the IEEE winter conference on applications of computer vision. IEEE, pp 1451-1460
[34] Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: Proceedings of the international conference on learning representations
[35] Yu F, Koltun V, Funkhouser T (2017) Dilated residual networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 472-480
[36] Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 2881-2890
[37] Ziegler T, Fritsche M, Kuhn L, Donhauser K (2019) Efficient smoothing of dilated convolutions for image segmentation. arXiv preprint arXiv:1903.07992
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.