Inference, learning and attention mechanisms that exploit and preserve sparsity in CNNs. (English) Zbl 1477.68362

Summary: Convolutional neural networks (CNNs) are a powerful tool for pattern recognition and computer vision, but they do not scale well to higher-dimensional inputs because of the memory demands of storing and manipulating high-dimensional tensors. This work starts from the observation that higher-dimensional data, such as 3D voxel volumes, are sparsely populated. CNNs naturally lend themselves to densely sampled data, and sophisticated, massively parallel implementations are available. By contrast, existing frameworks by and large lack the ability to efficiently process sparse data. Here, we introduce a suite of tools that exploit sparsity in both the feature maps and the filter weights of a CNN, and thereby allow for significantly lower memory footprints and computation times than the conventional dense framework when processing data with a high degree of sparsity. Our scheme provides (i) an efficient GPU implementation of a convolution layer based on direct, sparse convolution, as well as sparse implementations of the ReLU and max-pooling layers; (ii) a filter step within the convolution layer, which we call attention, that prevents fill-in, i.e., the tendency of convolution to rapidly decrease sparsity, and guarantees an upper bound on the computational resources; and (iii) an adaptation of back-propagation that makes it possible to combine our approach with standard learning frameworks, while still benefiting from sparsity in the data as well as the model.
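The interplay of (i) and (ii) can be sketched in miniature: a direct sparse convolution scatters each active site into its kernel footprint, after which a top-k selection (the "attention" filter) retains only the strongest activations, bounding fill-in. The sketch below is illustrative only, not the authors' GPU implementation; the 1-D setting, the dict-based sparse map, and the function name `sparse_conv1d` are assumptions made for demonstration.

```python
def sparse_conv1d(sites, weights, k):
    """Direct sparse 1-D convolution followed by a top-k 'attention' filter.

    sites   : dict {position: activation} -- the sparse feature map
    weights : centered convolution kernel of odd length (plain list)
    k       : number of output activations to keep (bounds fill-in)
    """
    r = len(weights) // 2
    out = {}
    # Direct sparse convolution: scatter each active input site into
    # the output positions its kernel footprint touches.
    for pos, val in sites.items():
        for off in range(-r, r + 1):
            out[pos + off] = out.get(pos + off, 0.0) + val * weights[off + r]
    # 'Attention' filter: retain only the k largest-magnitude outputs,
    # so the number of active sites cannot grow beyond k per layer.
    top = sorted(out.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    return dict(top)


# Two active sites out of an (implicitly) much larger domain.
features = {0: 1.0, 10: 2.0}
kernel = [0.5, 2.0, 0.5]
result = sparse_conv1d(features, kernel, k=2)
# Without the filter the output would have 6 active sites; attention keeps 2.
```

Only the active output sites are ever touched, which is the source of the memory and runtime savings on highly sparse data; the top-k step is what turns the savings into a hard guarantee.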


68T45 Machine vision and scene understanding
68T07 Artificial neural networks and deep learning
Full Text: DOI arXiv Link


[1] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. (2016). Tensorflow: A system for large-scale machine learning. In USENIX OSDI.
[2] Alabi, T., Blanchard, J. D., Gordon, B., & Steinbach, R. (2012). Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics. 10.1145/2133803.2345676. · Zbl 1284.68637
[3] Boulch, A. (2019). Generalizing discrete convolutions for unstructured point clouds. In Eurographics 3DOR.
[4] Brock, A., Lim, T., Ritchie, J., & Weston, N. (2017). Generative and discriminative voxel modeling with convolutional neural networks. arXiv:1608.04236.
[5] Chen, LC; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, AL, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE TPAMI, 40, 4, 834-848 (2018)
[6] Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., & Shelhamer, E. (2014). CUDNN: Efficient primitives for deep learning. arXiv:1410.0759.
[7] Choy, C., Gwak, J., & Savarese, S. (2019). 4D spatio-temporal convnets: Minkowski convolutional neural networks. In CVPR.
[8] Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T., & Nießner, M. (2017). ScanNet: Richly-annotated 3d reconstructions of indoor scenes. In CVPR.
[9] Denil, M., Shakibi, B., Dinh, L., de Freitas, N., et al. (2013). Predicting parameters in deep learning. In NIPS.
[10] Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., & Fergus, R. (2014). Exploiting linear structure within convolutional networks for efficient evaluation. In NIPS.
[11] Duchi, J.; Hazan, E.; Singer, Y., Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12, 2121-2159 (2011) · Zbl 1280.68164
[12] Engelcke, M., Rao, D., Wang, D. Z., Tong, C. H., & Posner, I. (2016). Vote3Deep: Fast object detection in 3d point clouds using efficient convolutional neural networks. In ICRA.
[13] Graham, B. (2014). Spatially-sparse convolutional neural networks. arXiv:1409.6070.
[14] Graham, B., & van der Maaten, L. (2017). Submanifold sparse convolutional networks. arXiv:1706.01307.
[15] Graham, B., Engelcke, M., & van der Maaten, L. (2018). 3d semantic segmentation with submanifold sparse convolutional networks. In CVPR.
[16] Hackel, T., Usvyatsov, M., Galliani, S., Wegner, J. D., & Schindler, K. (2018). Inference, learning and attention mechanisms that exploit and preserve sparsity in CNNs. In German conference on pattern recognition (GCPR).
[17] Han, S.; Pool, J.; Tran, J.; Dally, W., Learning both weights and connections for efficient neural network, Advances in Neural Information Processing Systems, 1, 1135-1143 (2015)
[18] Häne, C., Tulsiani, S., & Malik, J. (2017). Hierarchical surface prediction for 3d object reconstruction. In 3DV.
[19] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.
[20] Huang, J., & You, S. (2016). Point cloud labeling using 3d convolutional neural network. In ICPR.
[21] Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., & Brox, T. (2017). Flownet 2.0: Evolution of optical flow estimation with deep networks. In CVPR.
[22] Jaderberg, M., Vedaldi, A., & Zisserman, A. (2014). Speeding up convolutional neural networks with low rank expansions. In BMVC.
[23] Jampani, V., Kiefel, M., & Gehler, P. V. (2016). Learning sparse high dimensional filters: Image filtering, dense CRFs and bilateral neural networks. In CVPR.
[24] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In CVPR.
[25] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In NIPS.
[26] Lai, K., Bo, L., & Fox, D. (2014). Unsupervised feature learning for 3d scene labeling. In ICRA.
[27] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86, 11, 2278-2324 (1998)
[28] Li, Y., Pirk, S., Su, H., Qi, C. R., & Guibas, L. J. (2016). FPNN: Field probing neural networks for 3d data. In NIPS.
[29] Liu, B., Wang, M., Foroosh, H., Tappen, M., & Pensky, M. (2015). Sparse convolutional neural networks. In CVPR.
[30] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR.
[31] Maturana, D., & Scherer, S. (2015). Voxnet: A 3d convolutional neural network for real-time object recognition. In IROS.
[32] Nissen, MJ; Bullemer, P., Attentional requirements of learning: Evidence from performance measures, Cognitive Psychology, 19, 1, 1-32 (1987)
[33] Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., Emer, J., Keckler, S. W., & Dally, W. J. (2017). SCNN: An accelerator for compressed-sparse convolutional neural networks. In Int’l symposium on computer architecture.
[34] Park, J., Li, S., Wen, W., Tang, P. T. P., Li, H., Chen, Y., & Dubey, P. (2017). Faster CNNs with direct sparse convolutions and guided pruning. In ICLR.
[35] Prokhorov, D., A Convolutional learning system for object classification in 3-D lidar data, IEEE Transactions on Neural Networks, 21, 5, 858-863 (2010)
[36] Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017a). PointNet: Deep learning on point sets for 3d classification and segmentation. In CVPR.
[37] Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017b). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In NIPS.
[38] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS.
[39] Riegler, G., Ulusoy, A. O., Bischof, H., & Geiger, A. (2017a). OctNetFusion: Learning depth fusion from data. In 3DV.
[40] Riegler, G., Ulusoy, A. O., & Geiger, A. (2017b). OctNet: Learning deep 3d representations at high resolutions. In CVPR.
[41] Robertson, EM, The serial reaction time task: Implicit motor skill learning?, Journal of Neuroscience, 27, 38, 10073-10075 (2007)
[42] Song, S., & Xiao, J. (2016). Deep sliding shapes for amodal 3d object detection in rgb-d images. In CVPR.
[43] Tatarchenko, M., Dosovitskiy, A., & Brox, T. (2017). Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. In ICCV.
[44] Thomas, H., Qi, C. R., Deschaud, J. E., Marcotegui, B., Goulette, F., & Guibas, L. J. (2019). KPConv: Flexible and deformable convolution for point clouds. In ICCV.
[45] Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., & Geiger, A. (2017). Sparsity invariant CNNs. In 3DV.
[46] Wang, PS; Liu, Y.; Guo, YX; Sun, CY; Tong, X., O-CNN: Octree-based convolutional neural networks for 3d shape analysis, ACM Transactions on Graphics (SIGGRAPH), 36, 4, 1-11 (2017)
[47] Wang, PS; Sun, CY; Liu, Y.; Tong, X., Adaptive O-CNN: A patch-based deep representation of 3d shapes, ACM Transactions on Graphics (SIGGRAPH Asia), 37, 6, 1-11 (2018)
[48] Wen, W., Wu, C., Wang, Y., Chen, Y., & Li, H. (2016). Learning structured sparsity in deep neural networks. In NIPS.
[49] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In CVPR.
[50] Zhou, T., Brown, M., Snavely, N., & Lowe, D. G. (2017). Unsupervised learning of depth and ego-motion from video. In CVPR.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.