
Sharpness, restart, and acceleration. (English) Zbl 1435.90109

Summary: The Łojasiewicz inequality shows that sharpness bounds on the minimum of convex optimization problems hold almost generically. Sharpness directly controls the performance of restart schemes, as observed by A. S. Nemirovskij and Yu. E. Nesterov [U.S.S.R. Comput. Math. Math. Phys. 25, No. 2, 21–30 (1985; Zbl 0591.90072); translation from Zh. Vychisl. Mat. Mat. Fiz. 25, No. 3, 356–369 (1985)]. The constants quantifying these sharpness bounds are of course unobservable, but we show that optimal restart strategies are robust, and searching for the best scheme only increases the complexity by a logarithmic factor compared to the optimal bound. Overall then, restart schemes generically accelerate accelerated methods.
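To illustrate the idea of the abstract, here is a minimal sketch (not the paper's exact scheme) of restarting Nesterov's accelerated gradient method and searching a logarithmic grid of restart lengths, since the sharpness constants that would dictate the optimal restart frequency are unobservable. The quadratic test problem, step size 1/L, iteration budget, and grid values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: restarted accelerated gradient with a log-grid search over
# restart lengths. All problem data and parameters below are illustrative.
import numpy as np

def accelerated_gradient(grad, x0, step, iters):
    """Plain Nesterov/FISTA-style acceleration for a smooth convex function."""
    x, y = x0.copy(), x0.copy()
    t = 1.0
    for _ in range(iters):
        x_next = y - step * grad(y)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

def restarted(grad, x0, step, inner_iters, restarts):
    """Restart the accelerated method every `inner_iters` iterations."""
    x = x0.copy()
    for _ in range(restarts):
        x = accelerated_gradient(grad, x, step, inner_iters)
    return x

if __name__ == "__main__":
    # Illustrative sharp problem: f(x) = 0.5 * ||A x - b||^2 (quadratic growth).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    b = rng.standard_normal(50)
    f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
    grad = lambda x: A.T @ (A @ x - b)
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x0 = np.zeros(20)
    budget = 400                            # total gradient evaluations

    # The sharpness constants are unknown, so try a logarithmic grid of restart
    # lengths and keep the best run; the search only costs a logarithmic factor.
    for inner in [5, 10, 20, 50, 100, 200, 400]:
        x = restarted(grad, x0, 1.0 / L, inner, budget // inner)
        print(f"restart every {inner:4d} iters: f = {f(x):.3e}")
```

Under this kind of grid, one of the tried restart lengths is within a constant factor of the (unknown) optimal one, which is the robustness phenomenon the abstract refers to.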

MSC:

90C25 Convex programming

Citations:

Zbl 0591.90072

Software:

NESUN; UCI-ml

References:

[1] A. Asuncion and D. Newman, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/index.php, 2007.
[2] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Math. Oper. Res., 35 (2010), pp. 438-457. · Zbl 1214.65036
[3] A. Auslender and J.-P. Crouzeix, Global regularity theorems, Math. Oper. Res., 13 (1988), pp. 243-253. · Zbl 0655.90059
[4] H. H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications, Math. Oper. Res., 42 (2016), pp. 330-348. · Zbl 1364.90251
[5] A. Beck and M. Teboulle, Smoothing and first order methods: A unified framework, SIAM J. Optim., 22 (2012), pp. 557-580, https://doi.org/10.1137/100818327. · Zbl 1251.90304
[6] E. Bierstone and P. D. Milman, Semianalytic and subanalytic sets, Inst. Hautes Études Sci. Publ. Math., 67 (1988), pp. 5-42. · Zbl 0674.32002
[7] J. Bolte, A. Daniilidis, and A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Optim., 17 (2007), pp. 1205-1223, https://doi.org/10.1137/050644641. · Zbl 1129.26012
[8] J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter, From error bounds to the complexity of first-order descent methods for convex functions, Math. Program., 165 (2017), pp. 471-507. · Zbl 1373.90076
[9] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., 146 (2014), pp. 459-494. · Zbl 1297.90125
[10] J. Burke and S. Deng, Weak sharp minima revisited. I. Basic theory, Control Cybernet., 31 (2002), pp. 439-469. · Zbl 1105.90356
[11] J. V. Burke and M. C. Ferris, Weak sharp minima in mathematical programming, SIAM J. Control Optim., 31 (1993), pp. 1340-1359, https://doi.org/10.1137/0331063. · Zbl 0791.90040
[12] D. Drusvyatskiy and A. S. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, Math. Oper. Res., 43 (2018), pp. 919-948. · Zbl 1440.90046
[13] O. Fercoq and Z. Qu, Restarting Accelerated Gradient Methods with a Rough Strong Convexity Estimate, preprint, https://arxiv.org/abs/1609.07358, 2016. · Zbl 1432.90109
[14] O. Fercoq and Z. Qu, Adaptive restart of accelerated gradient methods under local quadratic growth condition, IMA J. Numer. Anal., 39 (2019), pp. 2069-2095. · Zbl 07130820
[15] P. Frankel, G. Garrigos, and J. Peypouquet, Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates, J. Optim. Theory Appl., 165 (2015), pp. 874-900. · Zbl 1316.49039
[16] R. M. Freund and H. Lu, New computational guarantees for solving convex optimization problems with first order methods, via a function growth condition measure, Math. Program., 170 (2018), pp. 445-477. · Zbl 1403.90549
[17] A. Gilpin, J. Pena, and T. Sandholm, First-order algorithm with \(\mathcal{O}(\log 1/\epsilon)\) convergence for \(\epsilon \)-equilibrium in two-person zero-sum games, Math. Program., 133 (2012), pp. 279-298. · Zbl 1243.91004
[18] P. Giselsson and S. Boyd, Monotonicity and restart in fast gradient methods, in Proceedings of the 53rd IEEE Conference on Decision and Control, IEEE, Washington, DC, 2014, pp. 5058-5063.
[19] A. J. Hoffman, On approximate solutions of systems of linear inequalities, J. Research Nat. Bur. Standards, 49 (1952), pp. 263-265.
[20] A. Iouditski and Y. Nesterov, Primal-Dual Subgradient Methods for Minimizing Uniformly Convex Functions, preprint, https://arxiv.org/abs/1401.1792, 2014.
[21] H. Karimi, J. Nutini, and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, in Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, New York, 2016, pp. 795-811.
[22] T. Kerdreux, A. d’Aspremont, and S. Pokutta, Restarting Frank-Wolfe, in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Okinawa, Japan, 2019, pp. 1275-1283.
[23] Q. Lin and L. Xiao, An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization, in Proceedings of the 31st International Conference on Machine Learning, Vol. 32, Beijing, China, 2014, pp. 73-81.
[24] M. Liu and T. Yang, Adaptive accelerated gradient converging method under Hölderian error bound condition, in Advances in Neural Information Processing Systems 30, Long Beach, CA, 2017, pp. 3104-3114.
[25] S. Łojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, in Les Équations aux Dérivées Partielles (Paris, 1962), Éditions du Centre National de la Recherche Scientifique, Paris, France, 1963, pp. 87-89. · Zbl 0234.57007
[26] S. Łojasiewicz, Sur la géométrie semi- et sous-analytique, Ann. Inst. Fourier (Grenoble), 43 (1993), pp. 1575-1595. · Zbl 0803.32002
[27] H. Lu, R. M. Freund, and Y. Nesterov, Relatively smooth convex optimization by first-order methods, and applications, SIAM J. Optim., 28 (2018), pp. 333-354, https://doi.org/10.1137/16M1099546. · Zbl 1392.90090
[28] O. L. Mangasarian, A condition number for differentiable convex inequalities, Math. Oper. Res., 10 (1985), pp. 175-179. · Zbl 0565.90059
[29] A. Nemirovskii and Y. Nesterov, Optimal methods of smooth convex minimization, USSR Comput. Math. Math. Phys., 25 (1985), pp. 21-30. · Zbl 0591.90072
[30] Y. Nesterov, A method of solving a convex programming problem with convergence rate \({O}(1/k^2)\), Soviet Math. Dokl., 27 (1983), pp. 372-376. · Zbl 0535.90071
[31] Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program., 103 (2005), pp. 127-152. · Zbl 1079.90102
[32] Y. Nesterov, Gradient methods for minimizing composite functions, Math. Program., 140 (2013), pp. 125-161. · Zbl 1287.90067
[33] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Appl. Optim. 87, Springer, New York, 2004. · Zbl 1086.90045
[34] Y. Nesterov, Universal gradient methods for convex optimization problems, Math. Program., 152 (2015), pp. 381-404. · Zbl 1327.90216
[35] B. O’Donoghue and E. Candes, Adaptive restart for accelerated gradient schemes, Found. Comput. Math., 15 (2015), pp. 715-732. · Zbl 1320.90061
[36] K. Pillutla, V. Roulet, S. M. Kakade, and Z. Harchaoui, A smoother way to train structured prediction models, in Advances in Neural Information Processing Systems 32, Montreal, Canada, 2018, pp. 4766-4778.
[37] B. T. Polyak, Gradient methods for minimizing functionals, Ž. Vyčisl. Mat. i Mat. Fiz., 3 (1963), pp. 643-653 (in Russian). · Zbl 0196.47701
[38] J. Renegar, Efficient First-Order Methods for Linear Programming and Semidefinite Programming, preprint, https://arxiv.org/abs/1409.5832, 2014.
[39] J. Renegar and B. Grimmer, A Simple Nearly-Optimal Restart Scheme for Speeding-Up First Order Methods, preprint, https://arxiv.org/abs/1803.00151, 2018. · Zbl 1516.90054
[40] S. M. Robinson, An application of error bounds for convex programming in a linear space, SIAM J. Control, 13 (1975), pp. 271-273, https://doi.org/10.1137/0313015. · Zbl 0297.90072
[41] V. Roulet, N. Boumal, and A. d’Aspremont, Computational complexity versus statistical performance on sparse recovery problems, Inf. Inference, to appear, https://doi.org/10.1093/imaiai/iay020. · Zbl 1470.94056
[42] W. Su, S. Boyd, and E. Candes, A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights, in Advances in Neural Information Processing Systems 27, Montreal, Canada, 2014, pp. 2510-2518.
[43] P. Tseng, On Accelerated Proximal Gradient Methods for Convex-Concave Optimization, Technical report, University of Washington, Seattle, WA, 2008.
[44] Z. Zhou, Q. Zhang, and A. M.-C. So, \( \ell_{1,p}\)-norm regularization: Error bounds and convergence rate analysis of first-order methods, in Proceedings of the 32nd International Conference on Machine Learning, Vol. 37, Lille, France, 2015, pp. 1501-1510.