
Generalized momentum-based methods: a Hamiltonian perspective. (English) Zbl 1462.90087

Summary: We take a Hamiltonian-based perspective to generalize Nesterov’s accelerated gradient descent and Polyak’s heavy ball method to a broad class of momentum methods in the setting of (possibly) constrained minimization in Euclidean and non-Euclidean normed vector spaces. Our perspective leads to a generic and unifying nonasymptotic analysis of convergence of these methods in both the function value (in the setting of convex optimization) and in the norm of the gradient (in the setting of unconstrained, possibly nonconvex, optimization). Our approach relies upon a time-varying Hamiltonian that produces generalized momentum methods as its equations of motion. The convergence analysis for these methods is intuitive and is based on the conserved quantities of the time-dependent Hamiltonian.
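For readers less familiar with the two classical methods that the paper generalizes, the following Python sketch shows standard discrete-time forms of Polyak's heavy ball method and Nesterov's accelerated gradient for smooth unconstrained minimization. It illustrates only the textbook updates, not the paper's Hamiltonian-based family or its constrained and non-Euclidean variants; the function names, step sizes, and the k/(k+3) momentum schedule are conventional choices assumed here purely for illustration.

```python
import numpy as np

def heavy_ball(grad, x0, step=0.01, momentum=0.9, iters=500):
    # Polyak's heavy ball: x_{k+1} = x_k - step * grad(x_k) + momentum * (x_k - x_{k-1})
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

def nesterov(grad, x0, step=0.01, iters=500):
    # Nesterov's accelerated gradient: take the gradient step at an
    # extrapolated point y_k, then update the extrapolation.
    x = np.array(x0, dtype=float)
    y = x.copy()
    for k in range(iters):
        x_next = y - step * grad(y)
        y = x_next + (k / (k + 3.0)) * (x_next - x)  # a standard momentum schedule
        x = x_next
    return x

# Toy usage: minimize f(x) = 0.5 * ||A x - b||^2 (smooth and convex).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)  # smoothness constant of the gradient
print(heavy_ball(grad, np.zeros(2), step=1.0 / L))
print(nesterov(grad, np.zeros(2), step=1.0 / L))
```

With the step size set to the reciprocal of the smoothness constant, both iterations converge to the minimizer of this quadratic; the paper's contribution is a single Hamiltonian framework from which such momentum updates, and nonasymptotic rates for them, can be derived.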

MSC:

90C25 Convex programming
49N15 Duality theory (optimization)
65K05 Numerical mathematical programming methods
