
A sequential quadratic Hamiltonian algorithm for training explicit RK neural networks. (English) Zbl 1480.49004

Summary: A sequential quadratic Hamiltonian (SQH) algorithm for solving nonsmooth supervised learning problems (SLPs) in the framework of residual neural networks is presented. In this framework, an SLP is interpreted as an optimal control problem, and the SQH algorithm determines a solution by means of the optimality characterization provided by a discrete version of the Pontryagin maximum principle.
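In schematic form (under assumed notation: an explicit Euler residual layer with step size \delta t, states x_k, weights/controls u_k, terminal loss \Phi and layer cost \ell; the paper's actual Runge-Kutta stages and cost terms may differ), the discrete optimal control problem and its Pontryagin characterization read

    x_{k+1} = x_k + \delta t \, f(x_k, u_k), \qquad k = 0, \dots, K-1,
    \min_u \; J(u) = \Phi(x_K) + \delta t \sum_{k=0}^{K-1} \ell(u_k),
    H_k(x_k, p_{k+1}, u_k) = p_{k+1}^{\top} f(x_k, u_k) - \ell(u_k),
    p_k = p_{k+1} + \delta t \, \partial_x H_k, \qquad p_K = -\partial_x \Phi(x_K),

where the discrete Pontryagin maximum principle requires each optimal u_k to maximize H_k(x_k, p_{k+1}, \cdot), and the SQH iteration instead maximizes the augmented Hamiltonian H_k - \epsilon \|u_k - u_k^{old}\|^2 with an adaptively chosen penalty \epsilon > 0.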
Convergence and stability of the proposed algorithm are investigated theoretically in the framework of residual neural networks with Runge-Kutta structure, and its computational performance is compared to that of the so-called extended method of successive approximations. Results of numerical experiments demonstrate the superior performance of the SQH algorithm in terms of efficiency and robustness of the training process.
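To illustrate the type of iteration referred to above, the following is a minimal NumPy sketch of a generic SQH-style training sweep: a forward pass, a discrete adjoint pass, and a layerwise (approximate) maximization of the augmented Hamiltonian with an adaptively adjusted penalty. The toy network, acceptance test, and all names are assumptions chosen for exposition and do not reproduce the authors' algorithm or experiments.

    import numpy as np

    # Toy residual network x_{k+1} = x_k + dt * tanh(U_k x_k), trained on a single
    # sample; the layer map, loss, and penalty update rule are illustrative assumptions.
    K, d, dt = 10, 4, 0.1
    rng = np.random.default_rng(0)
    x0, y = rng.standard_normal(d), rng.standard_normal(d)     # input and target
    U = [0.1 * rng.standard_normal((d, d)) for _ in range(K)]  # layer weights (controls)

    def f(x, Uk):                                   # layer map
        return np.tanh(Uk @ x)

    def forward(U):
        xs = [x0]
        for k in range(K):
            xs.append(xs[k] + dt * f(xs[k], U[k]))
        return xs

    def loss(U):
        return 0.5 * np.linalg.norm(forward(U)[-1] - y) ** 2

    def backward(xs, U):
        # discrete adjoint sweep (backpropagation through the explicit Euler layers)
        ps = [None] * (K + 1)
        ps[K] = -(xs[K] - y)
        for k in reversed(range(K)):
            s = np.cosh(U[k] @ xs[k]) ** -2         # tanh'(U_k x_k)
            ps[k] = ps[k + 1] + dt * U[k].T @ (s * ps[k + 1])
        return ps

    # SQH sweep: maximize the augmented Hamiltonian H_k - eps*||U_k - U_k_old||^2
    # layer by layer (here via its linearization, i.e. one scaled gradient step),
    # then accept the update only if the loss decreases sufficiently.
    eps, sigma, zeta, eta = 1.0, 1.5, 0.9, 1e-3
    for it in range(100):
        xs = forward(U)
        ps = backward(xs, U)
        U_new = []
        for k in range(K):
            s = np.cosh(U[k] @ xs[k]) ** -2
            gradH = np.outer(s * ps[k + 1], xs[k])       # d/dU_k of p_{k+1} . f(x_k, U_k)
            U_new.append(U[k] + gradH / (2.0 * eps))     # approximate layerwise maximizer
        tau = sum(np.sum((Un - Uo) ** 2) for Un, Uo in zip(U_new, U))
        if loss(U_new) - loss(U) <= -eta * tau:          # sufficient decrease test
            U, eps = U_new, zeta * eps                   # accept and relax the penalty
        else:
            eps = sigma * eps                            # reject and increase the penalty
    print("final loss:", loss(U))

The penalty update (decrease on acceptance, increase on rejection) mirrors the adaptive step control common to SQH- and MSA-type schemes; with the penalty removed, the loop degenerates into a plain forward-backward sweep iteration.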

MSC:

49J15 Existence theories for optimal control problems involving ordinary differential equations
49K15 Optimality conditions for problems involving ordinary differential equations
49M05 Numerical methods based on necessary conditions
65K10 Numerical optimization and variational techniques
92B20 Neural networks for/in biological studies, artificial life and related topics
68T05 Learning and adaptive systems in artificial intelligence

Software:

torchdiffeq; L-BFGS-B

References:

[1] LeCun, Y., A theoretical framework for back-propagation, (Proceedings of the 1988 Connectionist Models Summer School (1988), Morgan Kaufmann: Morgan Kaufmann CMU, Pittsburgh, PA), 21-28
[2] Haber, E.; Ruthotto, L., Stable architectures for deep neural networks, Inverse Problems, 34, Article 014004 pp. (2018) · Zbl 1426.68236
[3] Li, Q.; Chen, L.; Tai, C.; E, W., Maximum principle based algorithms for deep learning, J. Mach. Learn. Res., 18, 1-29 (2018) · Zbl 1467.68156
[4] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conf. Comput. Vis. Pattern Recognit., CVPR, 2016, pp. 770-778.
[5] Breitenbach, T.; Borzì, A., On the SQH scheme to solve nonsmooth PDE optimal control problems, Numer. Funct. Anal. Optim., 40, 1489-1531 (2019) · Zbl 1418.35112
[6] Breitenbach, T.; Borzì, A., A sequential quadratic Hamiltonian method for solving parabolic optimal control problems with discontinuous cost functionals, J. Dyn. Control Syst., 25, 403-435 (2019) · Zbl 1417.49021
[7] Breitenbach, T.; Borzì, A., A sequential quadratic Hamiltonian scheme for solving non-smooth quantum control problems with sparsity, J. Comput. Appl. Math., 369, Article 112583 pp. (2020) · Zbl 1480.81063
[8] Boltyanskiĭ, V. G.; Gamkrelidze, R. V.; Pontryagin, L. S., On the theory of optimal processes, Dokl. Akad. Nauk SSSR (N.S.), 110, 7-10 (1956), (in Russian) · Zbl 0071.18203
[9] Pontryagin, L. S.; Boltyanskiĭ, V. G.; Gamkrelidze, R. V.; Mishchenko, E. F., The Mathematical Theory of Optimal Processes (1962), John Wiley & Sons: John Wiley & Sons New York-London · Zbl 0102.32001
[10] Dmitruk, A. V.; Osmolovskii, N. P., On the proof of Pontryagin’s maximum principle by means of needle variations, J. Math. Sci., 218, 5, 581-598 (2016) · Zbl 1352.49021
[11] Rozonoèr, L. I., Pontryagin maximum principle in the theory of optimum systems, Avtomat. i Telemeh., 20, 1288-1302 (1959), English transl. in Automat. Remote Control · Zbl 0124.05801
[12] H.J. Kelley, R.E. Kopp, H.G. Moyer, Successive approximation techniques for trajectory optimization, in: Proc. IAS Symp. Vehicle Syst. Optim., New York, 1961, pp. 10-25.
[13] Krylov, I. A.; Chernous’ko, F. L., On a method of successive approximations for the solution of problems of optimal control, Zh. Vychisl. Mat. Mat. Fiz., 2, 6, 1132-1139 (1962), in Russian; English transl. in USSR Comput. Math. Math. Phys. · Zbl 0154.10402
[14] Krylov, I. A.; Chernous’ko, F. L., An algorithm for the method of successive approximations in optimal control problems, USSR Comput. Math. Math. Phys., 12, 1, 15-38 (1972) · Zbl 0271.65043
[15] Chernous’ko, F. L.; Lyubushin, A. A., Method of successive approximations for solution of optimal control problems, Optimal Control Appl. Methods, 3, 2, 101-114 (1982) · Zbl 0485.49003
[16] Li, Q.; Hao, S., An optimal control approach to deep learning and applications to discrete-weight neural networks, (Proc. Mach. Learn. Res., Vol. 80 (2018)), 2991-3000
[17] Sakawa, Y.; Shindo, Y., On global convergence of an algorithm for optimal control, IEEE Trans. Automat. Control, 25, 6, 1149-1153 (1980) · Zbl 0489.49017
[18] Shindo, Y.; Sakawa, Y., Local convergence of an algorithm for solving optimal control problems, J. Optim. Theory Appl., 46, 3, 265-293 (1985) · Zbl 0548.49015
[19] Halkin, H., A maximum principle of the Pontryagin type for systems described by nonlinear difference equations, SIAM J. Control, 4, 90-111 (1966) · Zbl 0152.09301
[20] Lin, X.; Frank, J., Symplectic Runge-Kutta discretization of a regularized forward-backward sweep iteration for optimal control problems, J. Comput. Appl. Math. (2021) · Zbl 1447.49043
[21] Hairer, E.; Lubich, C.; Wanner, G., Geometric Numerical Integration - Structure-Preserving Algorithms for Ordinary Differential Equations (2013), Springer: Springer Berlin
[22] Aggarwal, C. C., Neural Networks and Deep Learning - A Textbook (2018), Springer: Springer Heidelberg · Zbl 1402.68001
[23] Chen, R. T. Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D., Neural ordinary differential equations, (Proc. 32nd Int. Conf. Neural Inform. Processing Syst., NIPS’18 (2018), Curran Associates Inc.: Curran Associates Inc. Red Hook, NY, USA), 6572-6583
[24] Borzì, A., Modelling with Ordinary Differential Equations: A Comprehensive Approach, Numerical Analysis and Scientific Computing Series (2020), Chapman & Hall/CRC: Chapman & Hall/CRC Boca Raton · Zbl 1472.34001
[25] B. Chang, L. Meng, E. Haber, L. Ruthotto, D. Begert, E. Holtham, Reversible architectures for arbitrarily deep residual neural networks, in: AAAI-18, Vol. 32, 2018, pp. 2811-2818.
[26] Lu, Y.; Zhong, A.; Li, Q.; Dong, B., Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations, (Proc. 35th Intern. Conf. Mach. Learn., Proc. Mach. Learn. Res., vol. 80 (2018), PMLR: PMLR Stockholm), 3276-3285
[27] Larsson, G.; Maire, M.; Shakhnarovich, G., FractalNet: Ultra-deep neural networks without residuals (2016), CoRR abs/1605.07648, arXiv:1605.07648
[28] Huang, G.; Liu, Z.; Weinberger, K. Q., Densely connected convolutional networks (2016), CoRR abs/1608.06993, arXiv:1608.06993
[29] Zhu, M.; Fu, C., Convolutional neural networks combined with Runge-Kutta methods (2018), arXiv:1802.08831
[30] Emmrich, E., Discrete versions of Gronwall’s lemma and their application to the numerical analysis of parabolic problems, Tech. Rep. 637 (1999), TU Berlin
[31] Nocedal, J.; Wright, S., Numerical Optimization (2006), Springer-Verlag New York · Zbl 1104.65059
[32] Ulbrich, M., Semismooth Newton Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces (2011), SIAM · Zbl 1235.49001
[33] Hager, W., Runge-Kutta methods in optimal control and the transformed adjoint system, Numer. Math., 87, 247-282 (2000) · Zbl 0991.49020
[34] Byrd, R.; Lu, P.; Nocedal, J.; Zhu, C., A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16, 1190-1208 (1995) · Zbl 0836.65080
[35] Frank, M.; Wolfe, P., An algorithm for quadratic programming, Naval Res. Logist. Quart., 3, 1-2, 95-110 (1956)
[36] Wu, K.; Xiu, D., An explicit neural network construction for piecewise constant function approximation (2018), CoRR, arXiv:1808.07390
[37] Harrison, D.; Rubinfeld, D. L., Hedonic housing prices and the demand for clean air, J. Environ. Econ. Manag., 5, 1, 81-102 (1978) · Zbl 0375.90023