
Analysis of optimization algorithms via sum-of-squares. (English) Zbl 1475.90055

Summary: We introduce a new framework for unifying and systematizing the performance analysis of first-order black-box optimization algorithms for unconstrained convex minimization. The low per-iteration cost of first-order algorithms makes them particularly relevant for applications in machine learning and large-scale data analysis. Relying on sum-of-squares (SOS) optimization, we introduce a hierarchy of semidefinite programs whose higher levels yield increasingly tight convergence bounds. Illustrating the power of the SOS hierarchy, we show that the (dual of the) first level corresponds to the performance estimation problem (PEP) introduced by Y. Drori and M. Teboulle [Math. Program. 145, No. 1–2 (A), 451–482 (2014; Zbl 1300.90068)], a powerful framework for determining convergence rates of first-order optimization algorithms. Consequently, many results obtained within the PEP framework can be reinterpreted as degree-1 SOS proofs, and the SOS framework thus offers a promising route to certifying improved convergence rates via higher-degree SOS certificates. To obtain analytical rate bounds, we work with the first level of the SOS hierarchy and derive new results for noisy gradient descent with inexact line-search methods (Armijo, Wolfe, and Goldstein).
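To make the first level of the hierarchy concrete, the following is a minimal sketch of the performance estimation problem in its standard Gram-matrix SDP form (whose dual yields degree-1 SOS certificates), for N steps of gradient descent with step size 1/L on L-smooth convex functions. It is written in Python with CVXPY as an illustration under stated assumptions, not the authors' code (the paper reports computations with CVX/SDPT3 and Mathematica, and all names below are hypothetical). It encodes the smooth convex interpolation conditions of Taylor, Hendrickx, and Glineur [38], and its optimal value recovers the tight Drori-Teboulle bound L*R^2/(2(2N+1)) of [10].

import numpy as np
import cvxpy as cp  # assumed available; the paper itself uses CVX/SDPT3 in Matlab

# Worst-case f(x_N) - f(x*) of gradient descent x_{k+1} = x_k - (1/L) grad f(x_k)
# over L-smooth convex f with ||x_0 - x*|| <= R.
L, R, N = 1.0, 1.0, 1
d = N + 2  # Gram basis: x_0 - x*, g_0, ..., g_N

def xvec(i):
    # Coordinates of x_i - x* in the Gram basis (i = None encodes x*).
    v = np.zeros(d)
    if i is None:
        return v
    v[0] = 1.0
    for k in range(i):  # unroll x_i = x_0 - (1/L) * sum_{k<i} g_k
        v[1 + k] -= 1.0 / L
    return v

def gvec(i):
    # Coordinates of grad f(x_i); grad f(x*) = 0.
    v = np.zeros(d)
    if i is not None:
        v[1 + i] = 1.0
    return v

G = cp.Variable((d, d), PSD=True)  # Gram matrix of the basis vectors
F = cp.Variable(N + 1)             # F[i] = f(x_i) - f(x*)

def fval(i):
    return 0.0 if i is None else F[i]

pts = list(range(N + 1)) + [None]
constraints = [xvec(0) @ G @ xvec(0) <= R ** 2]  # initial distance to x*
# Smooth convex interpolation conditions of [38]; all are linear in (G, F):
for i in pts:
    for j in pts:
        if i == j:
            continue
        dg = gvec(i) - gvec(j)
        constraints.append(
            fval(i) >= fval(j)
            + gvec(j) @ G @ (xvec(i) - xvec(j))
            + dg @ G @ dg / (2.0 * L)
        )

prob = cp.Problem(cp.Maximize(F[N]), constraints)
prob.solve()
print(prob.value)  # approx. L*R**2/(2*(2*N + 1)) = 1/6 for these parameters

Dual multipliers of this SDP correspond to degree-1 SOS combinations of the interpolation inequalities; the higher levels of the hierarchy described in the summary enlarge the space of admissible certificates and can only tighten the resulting bound.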

MSC:

90C22 Semidefinite programming

Citations:

Zbl 1300.90068

Software:

CVX; Mathematica; SDPT3; PESTO

References:

[1] Ahmadi, A.A.: Sum of squares (SOS) techniques: an introduction. http://www.princeton.edu/~amirali/Public/Teaching/ORF523/S16/ORF523_S16_Lec15.pdf. Accessed 2020
[2] Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Nashua (1999) · Zbl 1015.90077
[3] Blekherman, G., Parrilo, P.A., Thomas, R.R.: Semidefinite Optimization and Convex Algebraic Geometry, vol. 13. MOS-SIAM Series on Optimization (2012) · Zbl 1260.90006
[4] Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2009) · Zbl 1058.90049
[5] Cohen, AI, Rate of convergence of several conjugate gradient algorithms, SIAM J. Numer. Anal., 9, 2, 248-259 (1972) · Zbl 0279.65051 · doi:10.1137/0709024
[6] Dai, Y.-H.: Nonlinear conjugate gradient methods. Wiley Encyclopedia of Operations Research and Management Science (2011)
[7] de Klerk, E.; Glineur, F.; Taylor, AB, On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions, Optim. Lett., 11, 7, 1185-1199 (2017) · Zbl 1381.90067 · doi:10.1007/s11590-016-1087-4
[8] de Klerk, E., Glineur, F., Taylor, A.B.: Worst-case convergence analysis of gradient and Newton methods through semidefinite programming performance estimation. Technical Report, https://arxiv.org/abs/1709.05191 (2017). Accessed 2017 · Zbl 1448.90070
[9] Drori, Y.; Taylor, AB, Efficient first-order methods for convex minimization: a constructive approach, Math. Program., 184, 183-220 (2019) · Zbl 1451.90118 · doi:10.1007/s10107-019-01410-2
[10] Drori, Y.; Teboulle, M., Performance of first-order methods for smooth convex minimization: a novel approach, Math. Program., 145, 1, 451-482 (2014) · Zbl 1300.90068 · doi:10.1007/s10107-013-0653-0
[11] Drori, Y.; Teboulle, M., An optimal variant of Kelley’s cutting-plane method, Math. Program., 160, 1, 321-351 (2016) · Zbl 1349.90880 · doi:10.1007/s10107-016-0985-7
[12] Fazlyab, M., Morari, M., Preciado, V.M.: Design of first-order optimization algorithms via sum-of-squares programming. In: 57th IEEE Conference on Decision and Control (CDC), pp. 4445-4452 (2018)
[13] Fazlyab, M.; Ribeiro, A.; Morari, M.; Preciado, VM, Analysis of optimization algorithms via integral quadratic constraints: nonstrongly convex problems, SIAM J. Optim., 28, 3, 2654-2689 (2018) · Zbl 1406.90089 · doi:10.1137/17M1136845
[14] Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pp. 95-110. Springer, Cham (2008) · Zbl 1205.90223
[15] Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014). Accessed 2020
[16] Gu, G., Yang, J.: Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems. Technical Report, https://arxiv.org/abs/1904.05495 (2019)
[17] Hu, B., Seiler, P., Lessard, L.: Analysis of approximate stochastic gradient using quadratic constraints and sequential semidefinite programs. Technical Report, https://arxiv.org/pdf/1711.00987.pdf (2017). Accessed 2020 · Zbl 1465.90052
[18] Kim, D.: Accelerated proximal point method and forward method for monotone inclusions. Technical Report, https://arxiv.org/abs/1905.05149 (2019)
[19] Kim, D.; Fessler, JA, Optimized first-order methods for smooth convex minimization, Math. Program., 159, 1, 81-107 (2016) · Zbl 1345.90113 · doi:10.1007/s10107-015-0949-3
[20] Kim, D., Fessler, J.A.: Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. Technical Report, https://arxiv.org/abs/1803.06600 (2018) · Zbl 1468.90085
[21] Lasserre, J., A sum of squares approximation of nonnegative polynomials, SIAM Rev., 49, 4, 651-669 (2007) · Zbl 1129.12004 · doi:10.1137/070693709
[22] Laurent, M.: Sums of squares, moment matrices and optimization over polynomials. In: Emerging Applications of Algebraic Geometry, pp. 157-270. Springer, New York (2009) · Zbl 1163.13021
[23] Lessard, L.; Recht, B.; Packard, A., Analysis and design of optimization algorithms via integral quadratic constraints, SIAM J. Optim., 26, 1, 57-95 (2016) · Zbl 1329.90103 · doi:10.1137/15M1009597
[24] Lieder, F.: On the convergence rate of the Halpern-iteration. Technical Report, http://www.optimization-online.org/DB_FILE/2017/11/6336.pdf (2019). Accessed 2020 · Zbl 1466.90067
[25] Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. Springer International Publishing, Cham (2016) · Zbl 1319.90001 · doi:10.1007/978-3-319-18842-3
[26] Nemirovski, A.: Optimization II: Numerical methods for nonlinear continuous optimization. https://www2.isye.gatech.edu/~nemirovs/Lect_OptII.pdf. Accessed 2020
[27] Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.I.: A general analysis of the convergence of ADMM. In: Proceedings of the 32nd International Conference on Machine Learning, vol. 37 (2015)
[28] Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006) · Zbl 1104.65059
[29] Parrilo, PA, Semidefinite programming relaxations for semialgebraic problems, Math. Program., 96, 2, 293-320 (2003) · Zbl 1043.14018 · doi:10.1007/s10107-003-0387-5
[30] Parrilo, P.A.: Polynomial Optimization, Sums of Squares, and Applications. Society for Industrial and Applied Mathematics, Philadelphia (2013) · Zbl 1360.90194
[31] Putinar, M., Positive polynomials on compact semi-algebraic sets, Indiana Univ. Math. J., 42, 3, 969-984 (1993) · Zbl 0796.12002 · doi:10.1512/iumj.1993.42.42045
[32] Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. Technical Report, https://arxiv.org/abs/1812.00146 (2018) · Zbl 1511.90322
[33] Tan, S.S.Y.: Performance analysis of optimization algorithms using semidefinite programming. Master’s thesis, Department of Electrical and Computer Engineering, National University of Singapore, https://scholarbank.nus.edu.sg/handle/10635/170801 (2020). Accessed 2020
[34] Tan, S.S.Y., Varvitsiotis, A., Tan, V.Y.F.: A unified framework for the convergence analysis of optimization algorithms via sums-of-squares. In: The Signal Processing with Adaptive Sparse Structured Representations (SPARS) Workshop (2019)
[35] Taylor, A., Van Scoy, B., Lessard, L.: Lyapunov functions for first-order methods: tight automated convergence guarantees. In: 35th International Conference on Machine Learning (2018)
[36] Taylor, AB; Bach, F., Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions, Proc. Mach. Learn. Res., 99, 1-58 (2019)
[37] Taylor, AB; Hendrickx, J.; Glineur, F., Exact worst-case performance of first-order methods for composite convex optimization, SIAM J. Optim., 27, 3, 1283-1313 (2017) · Zbl 1371.90108 · doi:10.1137/16M108104X
[38] Taylor, AB; Hendrickx, JM; Glineur, F., Smooth strongly convex interpolation and exact worst-case performance of first-order methods, Math. Program., 161, 1, 307-345 (2017) · Zbl 1359.90098 · doi:10.1007/s10107-016-1009-3
[39] Taylor, AB; Hendrickx, JM; Glineur, F., Exact worst-case convergence rates of the proximal gradient method for composite convex minimization, J. Optim. Theory Appl., 178, 2, 455-476 (2018) · Zbl 1394.90464 · doi:10.1007/s10957-018-1298-1
[40] Toh, KC; Todd, MJ; Tütüncü, RH, SDPT3—a Matlab software package for semidefinite programming, Version 1.3, Optim. Methods Softw., 11, 1-4, 545-581 (1999) · Zbl 0997.90060 · doi:10.1080/10556789908805762
[41] Tütüncü, RH; Toh, KC; Todd, MJ, Solving semidefinite-quadratic-linear programs using SDPT3, Math. Program., 95, 2, 189-217 (2003) · Zbl 1030.90082 · doi:10.1007/s10107-002-0347-5
[42] Wolfram Research, Inc.: Mathematica, Version 12.0. Champaign, IL (2019)