zbMATH — the first resource for mathematics

Architecture self-attention mechanism: nonlinear optimization for neural architecture search. (English) Zbl 07312310
Summary: Neural Architecture Search (NAS) is a widely used method for automatically designing neural network architectures. It has recently drawn considerable attention because it reduces the manual labour of designing neural networks. However, existing NAS methods ignore the interrelationships among candidate architectures in the search space. As a consequence, the target architecture extracted from the search space suffers from unstable performance due to the collapse of these interrelationships. In this paper, we propose an architecture self-attention mechanism for neural architecture search (ASM-NAS) to address this problem. Specifically, the proposed architecture self-attention mechanism constructs the interrelationships among architectures by exchanging information between every pair of candidate architectures. By learning these interrelationships, it selectively emphasizes the architectures that are important to the network while suppressing unimportant ones, which provides a useful reference for architecture selection. This strategy improves the performance stability of the architecture search. In addition, the proposed method is efficient and performs architecture search with low time and space costs. Compared to other advanced NAS approaches, ASM-NAS achieves better architecture search performance on the image classification datasets CIFAR-10, CIFAR-100, Fashion-MNIST and ImageNet.
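The summary does not state how the self-attention over candidate architectures is computed. As a rough illustration only, the sketch below applies standard scaled dot-product self-attention (as in reference [30]) to a set of candidate embeddings and averages the attention each candidate receives into an importance weight. The function name `architecture_self_attention`, the embedding representation, and the column-mean importance score are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def architecture_self_attention(embeddings):
    """Scaled dot-product self-attention over candidate embeddings.

    embeddings: list of n equal-length vectors, one per candidate
    architecture (a hypothetical representation, assumed for this sketch).
    Returns (attention, importance), where attention is an n x n matrix
    whose rows sum to 1 and importance is a length-n list that sums to 1.
    """
    n = len(embeddings)
    scale = math.sqrt(len(embeddings[0]))

    # Pairwise interaction: every candidate is compared with every other one.
    scores = [
        [sum(a * b for a, b in zip(ei, ej)) / scale for ej in embeddings]
        for ei in embeddings
    ]
    attention = [softmax(row) for row in scores]

    # Assumed importance score: mean attention received by each candidate.
    importance = [
        sum(attention[i][j] for i in range(n)) / n for j in range(n)
    ]
    return attention, importance


# Three hypothetical candidates; the first two are identical.
candidates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attention, importance = architecture_self_attention(candidates)
best = max(range(len(importance)), key=importance.__getitem__)
```

Under this assumed scoring, candidates that many others attend to receive larger weights, which is one way to read "emphasizing important architectures while suppressing unimportant ones"; the actual ASM-NAS formulation may differ.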
MSC:
68Txx Artificial intelligence
References:
[1] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, In: Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[2] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
[4] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778, 2016.
[5] G. Huang, Z. Liu, L.V. Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708, 2017.
[6] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132-7141, 2018.
[7] I. Guyon, S. Gunn, M. Nikravesh, L.A. Zadeh, Feature Extraction: Foundations and Applications, vol. 207, Springer, 2008. · Zbl 1114.68059
[8] Z. Cai, W. Zhu, Multi-label feature selection via feature manifold learning and sparsity regularization, Int. J. Machine Learn. Cybernetics, 9 (2018), 1321-1334.
[9] M. Nixon, A. Aguado, Feature Extraction and Image Processing for Computer Vision, Academic press, 2019.
[10] M. Bramer, Principles of Data Mining, vol. 180, Springer, 2007. · Zbl 1116.68069
[11] W. Zhu, Relationship between generalized rough sets based on binary relation and covering, Info. Sci. 179 (2009), 210-225. · Zbl 1163.68339
[12] Z. Cai, X. Yang, T. Huang, W. Zhu, A new similarity combining reconstruction coefficient with pairwise distance for agglomerative clustering, Info. Sci. 508 (2020), 173-182.
[13] B. Zoph, Q.V. Le, Neural architecture search with reinforcement learning, arXiv preprint arXiv:1611.01578, 2016.
[14] B. Zoph, V. Vasudevan, J. Shlens, Q.V. Le, Learning transferable architectures for scalable image recognition, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697-8710, 2018.
[15] E. Real, A. Aggarwal, Y. Huang, Q.V. Le, Regularized evolution for image classifier architecture search, In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4780-4789, 2019.
[16] H. Liu, K. Simonyan, O. Vinyals, C. Fernando, K. Kavukcuoglu, Hierarchical representations for efficient architecture search, arXiv preprint arXiv:1711.00436, 2017.
[17] H. Pham, M.Y. Guan, B. Zoph, Q.V. Le, J. Dean, Efficient neural architecture search via parameter sharing, arXiv preprint arXiv:1802.03268, 2018.
[18] H. Liu, K. Simonyan, Y. Yang, Darts: Differentiable architecture search, arXiv preprint arXiv:1806.09055, 2018.
[19] S. Xie, H. Zheng, C. Liu, L. Lin, Snas: Stochastic neural architecture search, arXiv preprint arXiv:1812.09926, 2018.
[20] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, 2017.
[21] B. Baker, O. Gupta, N. Naik, R. Raskar, Designing neural network architectures using reinforcement learning, arXiv preprint arXiv:1611.02167, 2016.
[22] E. Real, S. Moore, A. Selle, S. Saxena, Y.L. Suematsu, J. Tan, Q.V. Le, A. Kurakin, Large-scale evolution of image classifiers, In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2902-2911, 2017.
[23] T. Elsken, J.H. Metzen, F. Hutter, Efficient multi-objective neural architecture search via lamarckian evolution, arXiv preprint arXiv:1804.09081, 2018. · Zbl 07049774
[24] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.J. Li, L.L. Fei, A. Yuille, J. Huang, K. Murphy, Progressive neural architecture search, In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 19-34, 2018.
[25] X. Dong, Y. Yang, Searching for a robust neural architecture in four gpu hours, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1761-1770, 2019.
[26] X. Chen, L. Xie, J. Wu, Q. Tian, Progressive differentiable architecture search: Bridging the depth gap between search and evaluation, In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1294-1303, 2019.
[27] Y. Xu, L. Xie, X. Zhang, X. Chen, G.J. Qi, Q. Tian, H. Xiong, Pc-darts: Partial channel connections for memory-efficient differentiable architecture search, arXiv preprint arXiv:1907.05737, 2019.
[28] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Machine Intell. 20 (1998), 1254-1259.
[29] M. Corbetta, G.L. Shulman, Control of goal-directed and stimulus-driven attention in the brain, Nature Rev. Neurosci. 3 (2002), 201-215.
[30] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Adv. Neural Info. Processing Sys. 30 (2017), 5998-6008.
[31] S. Woo, J. Park, J.Y. Lee, I.S. Kweon, Cbam: Convolutional block attention module, In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3-19, 2018.
[32] V. Nair, G.E. Hinton, Rectified linear units improve restricted boltzmann machines, In: Proceedings of the 27th International Conference on Machine Learning, pp. 807-814, Haifa, 2010.
[33] L. Bottou, F.E. Curtis, J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev. 60 (2018), 223-311. · Zbl 1397.65085
[34] A. Krizhevsky, Learning multiple layers of features from tiny images, Technical Report TR-2009, University of Toronto, Toronto, 2009.
[35] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, F. Li, Imagenet: A large-scale hierarchical image database, In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.
[36] X.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.