Adaptive restart for accelerated gradient schemes. (English) Zbl 1320.90061

Summary: In this paper we introduce a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes exhibit two modes of behavior depending on how much momentum is applied at each iteration. In what we refer to as the ‘high momentum’ regime, the iterates generated by an accelerated gradient scheme exhibit periodic behavior, where the period is proportional to the square root of the local condition number of the objective function. Separately, it is known that the optimal restart interval is proportional to this same quantity. This suggests a restart technique whereby we reset the momentum whenever we observe periodic behavior. We provide a heuristic analysis that suggests that in many cases adaptively restarting allows us to recover the optimal rate of convergence with no prior knowledge of function parameters.
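
The restart idea can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the summary. The accelerated scheme uses the standard FISTA-style momentum sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2. The "periodic behavior" is detected by the simple test f(x_{k+1}) > f(x_k), chosen here for illustration. The test problem is a hypothetical two-dimensional quadratic with condition number 1000. The names `make_quadratic` and `accelerated_gradient` are likewise illustrative, not from the paper.

```python
import math


def make_quadratic(diag):
    """Hypothetical test problem: f(x) = 0.5 * sum(d_i * x_i^2), minimized at x = 0."""
    def f(x):
        return 0.5 * sum(d * xi * xi for d, xi in zip(diag, x))

    def grad(x):
        return [d * xi for d, xi in zip(diag, x)]

    return f, grad


def accelerated_gradient(f, grad, x0, step, iters, restart=False):
    """Accelerated gradient with FISTA-style momentum (an assumed variant).

    If restart is True, momentum is reset whenever the objective increases,
    i.e. f(x_{k+1}) > f(x_k); this restart test is an illustrative assumption.
    Returns the list of objective values and the number of restarts.
    """
    x = list(x0)
    y = list(x0)
    t = 1.0
    history = [f(x)]
    restarts = 0
    for _ in range(iters):
        # Gradient step taken from the extrapolated point y.
        g = grad(y)
        x_new = [yi - step * gi for yi, gi in zip(y, g)]
        if restart and f(x_new) > f(x):
            # Objective went up: discard the momentum and restart from x_new.
            t = 1.0
            y = list(x_new)
            restarts += 1
        else:
            # Usual momentum update: beta grows toward 1 as t grows.
            t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
            beta = (t - 1.0) / t_new
            y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
            t = t_new
        x = x_new
        history.append(f(x))
    return history, restarts


if __name__ == "__main__":
    # L = 1 and mu = 1e-3, so the step size 1/L is 1.0.
    f, grad = make_quadratic([1.0, 1e-3])
    plain, _ = accelerated_gradient(f, grad, [1.0, 1.0], step=1.0, iters=500)
    restarted, n = accelerated_gradient(f, grad, [1.0, 1.0], step=1.0, iters=500, restart=True)
    print("no restart:       f =", plain[-1])
    print("adaptive restart: f =", restarted[-1], "after", n, "restarts")
```

On this quadratic the unrestarted run eventually enters the high-momentum regime and its objective rises and falls, while the restarted run discards the momentum each time the objective increases. No estimate of the strong convexity parameter is passed to either run. Function-value monitoring is only one possible detector of the oscillation; other tests, for example one based on the gradient, could be substituted in the same place.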


90C25 Convex programming
90C06 Large-scale problems in mathematical programming



