
Deep neural networks with a set of node-wise varying activation functions. (English) Zbl 1468.68184

Summary: In this study, we present deep neural networks with a set of node-wise varying activation functions. The feature-learning ability of each node is affected by its activation function, and nodes with smaller indices become increasingly sensitive during training. As a result, the features learned by the nodes are sorted by node index in order of importance, such that more sensitive nodes correspond to more important features. The proposed networks therefore learn not only the input features but also the importance of those features. Nodes of lower importance can be pruned to reduce the complexity of the networks, and the pruned networks can be retrained without incurring performance losses. We validated the feature-sorting property of the proposed method using both shallow and deep networks, as well as deep networks transferred from existing networks.
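The summary does not state the exact form of the node-wise varying activation functions, so the following PyTorch sketch is only an illustration of the idea under an explicit assumption: each node k applies an activation with a node-dependent slope a_k that decreases with the index k, so lower-indexed nodes are more sensitive, and the higher-indexed (less important) nodes can be cut off before retraining. The class name NodeWiseActivationLayer, the tanh(a_k z) form, the linear decay of the slopes, and the prune helper are all introduced here for illustration and are not the authors' implementation.

# Illustrative sketch (assumed form, not the paper's method): a fully
# connected layer whose nodes use index-dependent activation slopes.
import torch
import torch.nn as nn

class NodeWiseActivationLayer(nn.Module):
    def __init__(self, in_features, out_features, min_slope=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Slopes decay linearly from 1.0 (node 0) to min_slope (last node);
        # they are fixed rather than learned, so sensitivity is tied to the index.
        self.register_buffer("slopes", torch.linspace(1.0, min_slope, out_features))

    def forward(self, x):
        # Node k applies tanh(a_k * z_k) to its pre-activation z_k.
        return torch.tanh(self.slopes * self.linear(x))

    def prune(self, keep):
        # Keep only the first `keep` nodes (assumed to carry the most important
        # features); the remaining nodes are dropped before retraining.
        pruned = NodeWiseActivationLayer(self.linear.in_features, keep)
        pruned.linear.weight.data = self.linear.weight.data[:keep].clone()
        pruned.linear.bias.data = self.linear.bias.data[:keep].clone()
        pruned.slopes = self.slopes[:keep].clone()
        return pruned

In this reading, a layer would be trained as usual, then replaced by layer.prune(keep) and the smaller network fine-tuned; the hidden width keep and the linear slope schedule are illustrative choices only.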

MSC:

68T07 Artificial neural networks and deep learning
62H25 Factor analysis and principal components; correspondence analysis
