
A Lyapunov analysis of accelerated methods in optimization. (English) Zbl 07370630

Summary: Accelerated optimization methods, such as Nesterov’s accelerated gradient method, play a significant role in optimization. Several accelerated methods are provably optimal under standard oracle models. Such optimality results are obtained using a technique known as “estimate sequences,” which yields upper bounds on convergence properties. The technique of estimate sequences has long been considered difficult to understand and deploy, leading many researchers to generate alternative, more intuitive methods and analyses. We show there is an equivalence between the technique of estimate sequences and a family of Lyapunov functions in both continuous and discrete time. This connection allows us to develop a unified analysis of many existing accelerated algorithms, introduce new algorithms, and strengthen the connection between accelerated algorithms and continuous-time dynamical systems.
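As a minimal sketch of the style of Lyapunov argument alluded to above (a standard continuous-time analysis of the ODE model of Nesterov's method, not necessarily the exact functional used in the paper): for the dynamics

\[
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0, \qquad X(0) = x_0,\ \dot{X}(0) = 0,
\]

with $f$ convex and minimizer $x^*$, the energy functional

\[
\mathcal{E}(t) = t^2\bigl(f(X(t)) - f(x^*)\bigr) + 2\,\bigl\| X(t) + \tfrac{t}{2}\dot{X}(t) - x^* \bigr\|^2
\]

satisfies, after substituting the ODE,

\[
\dot{\mathcal{E}}(t) = 2t\bigl(f(X(t)) - f(x^*)\bigr) - 2t\,\bigl\langle \nabla f(X(t)),\, X(t) - x^* \bigr\rangle \le 0
\]

by convexity, so $\mathcal{E}$ is nonincreasing and $f(X(t)) - f(x^*) \le \mathcal{E}(0)/t^2 = 2\|x_0 - x^*\|^2 / t^2$, the continuous-time analogue of the $O(1/k^2)$ rate.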

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

PESTO
