
A secant-based Nesterov method for convex functions. (English) Zbl 1373.90100

Summary: A simple secant-based fast gradient method is developed for problems whose objective function is convex and well-defined. The proposed algorithm extends the classical Nesterov gradient method by updating the estimate-sequence parameter with secant information whenever possible. This is achieved by imposing a secant condition on the choice of search point. Furthermore, the proposed algorithm embodies an “update rule with reset” that parallels the restart rule recently suggested in [B. O’Donoghue and E. Candès, Found. Comput. Math. 15, No. 3, 715–732 (2015; Zbl 1320.90061)]. The proposed algorithm applies to a large class of problems, including the logistic and least-squares losses commonly found in the machine learning literature. Numerical results demonstrating the efficiency of the proposed algorithm are analyzed with the aid of performance profiles.
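Since the summary is the only algorithmic description available here, the following is a minimal illustrative sketch of the ingredients it names: a Nesterov-type accelerated gradient loop, a secant-based update of the curvature estimate, and a momentum reset. All names (secant_nesterov, grad, x0, L0) are hypothetical. The secant update below is a Barzilai-Borwein-style curvature estimate [2] used as a stand-in for the paper's estimate-sequence update, which the summary does not fully specify, and the reset follows the gradient-restart test of O'Donoghue and Candès [28]; this sketch is not claimed to reproduce the authors' exact method.

```python
import numpy as np

def secant_nesterov(grad, x0, L0=1.0, max_iter=500, tol=1e-8):
    """Illustrative sketch only: Nesterov acceleration with a
    secant-based curvature estimate and a gradient-based restart.

    NOT the paper's exact algorithm. The secant update is a
    Barzilai-Borwein-style stand-in for the estimate-sequence update;
    the restart test follows O'Donoghue & Candes (2015).
    """
    x = y = np.asarray(x0, dtype=float)
    L = L0                       # local curvature (Lipschitz) estimate
    theta = 1.0                  # momentum parameter
    g_prev, y_prev = None, None
    for _ in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) < tol:
            break
        # Secant condition: whenever consecutive gradients are available,
        # refresh the curvature estimate L from the secant pair (s, z).
        if g_prev is not None:
            s, z = y - y_prev, g - g_prev
            sz = float(s @ z)
            if sz > 0:           # accept only curvature-consistent pairs
                L = max(sz / float(s @ s), 1e-12)
        x_next = y - g / L       # gradient step with the secant step size
        theta_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta**2))
        beta = (theta - 1.0) / theta_next
        y_next = x_next + beta * (x_next - x)
        # "Update rule with reset": drop the momentum whenever it points
        # against the descent direction (gradient restart test).
        if float(g @ (x_next - x)) > 0:
            theta_next, y_next = 1.0, x_next
        g_prev, y_prev = g, y
        x, y, theta = x_next, y_next, theta_next
    return x
```

For instance, on a least-squares loss \(f(x) = \frac{1}{2}\|Ax-b\|^2\) (one of the problem classes the summary mentions), one would call secant_nesterov(lambda x: A.T @ (A @ x - b), np.zeros(A.shape[1])).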

MSC:

90C25 Convex programming

Citations:

Zbl 1320.90061

References:

[1] Amini, K., Ahookhosh, M., Nosratipour, H.: An inexact line search approach using modified nonmonotone strategy for unconstrained optimization. Numer. Algorithms 66(1), 49-78 (2013) · Zbl 1293.65087
[2] Barzilai, J., Borwein, J.: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141-148 (1988) · Zbl 0638.65055
[3] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183-202 (2009) · Zbl 1175.94009
[4] Becker, S.R., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165-218 (2011) · Zbl 1257.90042
[5] Bhaya, A., Kaszkurewicz, E.: Steepest descent with momentum for quadratic functions is a version of the conjugate gradient method. Neural Netw. 17(1), 65-71 (2004) · Zbl 1082.68093
[6] Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10(4), 1196-1211 (2000) · Zbl 1047.90077
[7] Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2009) · Zbl 1058.90049
[8] Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Mach. Learn. 48, 253-285 (2002) · Zbl 0998.68123
[9] Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201-213 (2002) · Zbl 1049.90004
[10] Fletcher, R.: On the Barzilai-Borwein method. In: Qi, L., Teo, K., Yang, X. (eds.) Optimization and Control with Applications, Applied Optimization, vol. 96, pp. 235-256. Springer, US (2005) · Zbl 1118.90318
[11] Goldstein, T., O’Donoghue, B., Setzer, S.: Fast alternating direction optimization methods. Technical Report, UCLA, May 2012 (revised January 2014) · Zbl 1314.49019
[12] Gonzaga, C.C., Karas, E.W.: Fine tuning Nesterov’s steepest descent algorithm for differentiable convex programming. Math. Program. Ser. A 138, 141-166 (2013) · Zbl 1297.90118
[13] Gu, M., Lim, L.H., Wu, C.: ParNes: a rapidly convergent algorithm for accurate recovery of sparse and approximately sparse signals. Numer. Algorithms 64(2), 321-347 (2013) · Zbl 1284.65055
[14] He, R., Tan, T., Wang, L.: Robust recovery of corrupted low-rank matrix by implicit regularizers. IEEE Trans. Pattern Anal. Mach. Intell. 36(4), 770-783 (2014)
[15] Hu, S.L., Huang, Z.H., Lu, N.: A nonmonotone line search algorithm for unconstrained optimization. J. Sci. Comput. 42, 38-53 (2010) · Zbl 1203.90146
[16] Kozma, A., Conte, C., Diehl, M.: Benchmarking large-scale distributed convex quadratic programming algorithms. Optim. Methods Softw. 30(1), 191-214 (2015) · Zbl 1321.49046
[17] Kozma, A., Frasch, J.V., Diehl, M.: A distributed method for convex quadratic programming problems arising in optimal control of distributed systems. In: Proceedings of the 52nd IEEE Conference on Decision and Control, Florence, Italy (December 2013)
[18] Lan, G., Monteiro, R.D.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138(1-2), 115-139 (2013) · Zbl 1282.90129
[19] Lin, Q., Xiao, L.: An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization. In: Proceedings of The 31st International Conference on Machine Learning, Beijing, China (2014) · Zbl 1341.90102
[20] Maros, I., Mészáros, C.: A repository of convex quadratic programming problems. Optim. Methods Softw. 11(1-4), 671-681 (1999) · Zbl 0973.90520
[21] Meng, X., Chen, H.: Accelerating Nesterov’s method for strongly convex functions with Lipschitz gradient (2011). arXiv:1109.6058v1
[22] Nemirovski, A.S.: Efficient methods in convex programming, Lecture Notes, Technion-Israel Institute of Technology (1994) · Zbl 0820.68058
[23] Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372-376 (1983) · Zbl 0535.90071
[24] Nesterov, Y.: Introductory Lectures on Convex Programming: A Basic Course. Kluwer Academic Publishers, Dordrecht (2004) · Zbl 1086.90045
[25] Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE discussion paper (2007)
[26] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets (2013). arXiv:1202.6258v4
[27] Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006) · Zbl 1104.65059
[28] O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715-732 (2015) · Zbl 1320.90061
[29] Patrinos, P., Bemporad, A.: An accelerated dual gradient-projection algorithm for linear model predictive control. In: Proceedings of the 51st IEEE Conference on Decision and Control. Maui, US (December 2012) · Zbl 1360.93400
[30] Pedregosa, F.: Numerical optimizers for logistic regression (2013). http://fa.bianp.net/blog/2013/numerical-optimizers-for-logistic-regression/#fn:2
[31] Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1-17 (1964) · Zbl 0147.35301
[32] Richter, S., Jones, C.N., Morari, M.: Real-time input-constrained MPC using fast gradient methods. In: Proceedings of the 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, Shanghai, China (December 2009)
[33] Shah, B., Buehler, R., Kempthorne, O.: Some algorithms for minimizing a function of several variables. J. Soc. Ind. Appl. Math. 12(1), 74-92 (1964) · Zbl 0127.33902
[34] Telgarsky, M.: Steepest descent analysis for unregularized linear prediction with strictly convex penalties. In: Proceedings of the 4th International Workshop on Optimization for Machine Learning (OPT), held as a part of the NIPS workshops series (December 2011)
[35] Torii, M., Hagan, M.T.: Stability of steepest descent with momentum for quadratic functions. IEEE Trans. Neural Netw. 13(3), 752-756 (2002)
[36] Vogl, T.P., Mangis, J., Rigler, A., Zink, W., Alkon, D.: Accelerating the convergence of the back-propagation method. Biol. Cybern. 59(4-5), 257-263 (1988)
[37] Worthington, P.L., Hancock, E.R.: Surface topography using shape-from-shading. Pattern Recognit. 34(4), 823-840 (2001) · Zbl 0970.68764