
Efficient evolutionary deep neural architecture search (NAS) by noisy network morphism mutation. (English) Zbl 07240067

Pan, Linqiang (ed.) et al., Bio-inspired computing: theories and applications. 14th international conference, BIC-TA 2019, Zhengzhou, China, November 22–25, 2019. Revised selected papers. Part II. Singapore: Springer. Commun. Comput. Inf. Sci. 1160, 497-508 (2020).
Summary: Deep learning has achieved enormous breakthroughs in the field of image recognition. However, because discovering novel neural architectures is time-consuming and error-prone, designing a network tailored to a particular task remains a challenge. Hence, many automated neural architecture search methods have been proposed to find suitable deep neural network architectures for a specific task without human experts. Nevertheless, these methods are still computationally and economically expensive, since they require vast amounts of computing resources and/or computation time. In this paper, we propose several network morphism mutation operators with extra noise, and further redesign the macro-architecture based on a classical network. The proposed methods are embedded in an evolutionary algorithm and tested on the CIFAR-10 classification task. Experimental results indicate the capability of our proposed method to discover powerful neural architectures: it achieves a classification error of 2.55% with only 4.7M parameters on CIFAR-10 within 12 GPU-hours.
For the entire collection see [Zbl 1440.68010].
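
The summary does not spell out the mutation operators themselves, but a typical ingredient of such approaches is a function-preserving widening step (in the spirit of Net2WiderNet) to which a small amount of noise is added so that duplicated units can diverge during subsequent training. The Python/NumPy sketch below is only an illustrative reading of one such "noisy network morphism" mutation under these assumptions; the names widen_with_noise, noise_std, etc. are hypothetical and not taken from the paper.

import numpy as np

def widen_with_noise(W_in, b_in, W_out, new_width, noise_std=1e-2, rng=None):
    """Net2Wider-style mutation: grow a hidden layer to new_width units.

    W_in  -- (d_in, n) incoming weights, b_in -- (n,) biases
    W_out -- (n, d_out) outgoing weights of the following layer
    Without the noise term the widened layer pair computes the same
    function as the original (replicas share inputs, outgoing weights split).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = W_in.shape[1]
    assert new_width > n, "mutation must enlarge the layer"
    # Pick existing units to replicate and build the index mapping.
    extra = rng.integers(0, n, size=new_width - n)
    mapping = np.concatenate([np.arange(n), extra])
    counts = np.bincount(mapping, minlength=n)
    # Replicated units copy their incoming weights and biases ...
    W_in_new, b_in_new = W_in[:, mapping], b_in[mapping]
    # ... and share the original outgoing weights equally, preserving the output.
    W_out_new = W_out[mapping, :] / counts[mapping][:, None]
    # Extra noise breaks the symmetry between replicas so they can specialize.
    W_in_new += rng.normal(0.0, noise_std, W_in_new.shape)
    W_out_new += rng.normal(0.0, noise_std, W_out_new.shape)
    return W_in_new, b_in_new, W_out_new

In an evolutionary search loop such as the one described in the summary, mutations of this kind would presumably be applied to offspring architectures, which are then trained briefly and selected by validation accuracy; the exact operators and schedule are those of the paper, not of this sketch.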

MSC:

68Q07 Biologically inspired models of computation (DNA computing, membrane computing, etc.)
