zbMATH — the first resource for mathematics

A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo-Tseng error bound property. (English) Zbl 1412.49061
Summary: We propose a new family of inexact Sequential Quadratic Approximation (SQA) methods, which we call the Inexact Regularized Proximal Newton (IRPN) method, for minimizing the sum of two closed proper convex functions, one of which is smooth and the other is possibly non-smooth. Our proposed method features strong convergence guarantees even when applied to problems with degenerate solutions while allowing the inner minimization to be solved inexactly. Specifically, we prove that when the problem possesses the so-called Luo-Tseng Error Bound (EB) property, IRPN converges globally to an optimal solution, and the local convergence rate of the sequence of iterates generated by IRPN is linear, superlinear, or even quadratic, depending on the choice of parameters of the algorithm. Prior to this work, such EB property has been extensively used to establish the linear convergence of various first-order methods. However, to the best of our knowledge, this work is the first to use the Luo-Tseng EB property to establish the superlinear convergence of SQA-type methods for non-smooth convex minimization. As a consequence of our result, IRPN is capable of solving regularized regression or classification problems under the high-dimensional setting with provable convergence guarantees. We compare our proposed IRPN with several empirically efficient algorithms by applying them to the \(\ell _1\)-regularized logistic regression problem. Experiment results show the competitiveness of our proposed method.

49M15 Newton-type methods
65K10 Numerical optimization and variational techniques
90C55 Methods of successive quadratic programming type
Full Text: DOI arXiv
[1] Beck, A.; Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2, 183-202, (2009) · Zbl 1175.94009
[2] Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. In Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K. Q. (eds), Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference, pp. 2618-2626 (2012)
[3] Bhatia, R.: Matrix Analysis, Volume 169 of Graduate. Springer, New York (1997)
[4] Byrd, RH; Nocedal, J.; Oztoprak, F., An inexact successive quadratic approximation method for L-1 regularized optimization, Math. Program. Ser. B, 157, 375-396, (2016) · Zbl 1342.49037
[5] Combettes, PL; Wajs, VR, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., 4, 1168-1200, (2005) · Zbl 1179.94031
[6] Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings. Springer Monographs in Mathematics. Springer, New York (2009) · Zbl 1178.26001
[7] Facchinei, F.; Fischer, A.; Herrich, M., A family of Newton methods for nonsmooth constrained systems with nonisolated solutions, Math. Methods Oper. Res., 77, 433-443, (2013) · Zbl 1269.49046
[8] Facchinei, F.; Fischer, A.; Herrich, M., An LP-Newton method: nonsmooth equations, KKT systems, and nonisolated solutions, Math. Program. Ser. A, 146, 1-36, (2014) · Zbl 1317.90276
[9] Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 1. Springer, New York (2003) · Zbl 1062.90001
[10] Fan, R-E; Chang, K-W; Hsieh, C-J; Wang, X-R; Lin, C-J, LIBLINEAR: a library for large linear classification, J. Mach. Learn. Res., 9, 1871-1874, (2008) · Zbl 1225.68175
[11] Fischer, A., Local behavior of an iterative framework for generalized equations with nonisolated solutions, Math. Program. Ser. A, 94, 91-124, (2002) · Zbl 1023.90067
[12] Fischer, A.; Herrich, M.; Izmailov, AF; Solodov, MV, A globally convergent LP-Newton method, SIAM J. Optim., 26, 2012-2033, (2016) · Zbl 1348.90583
[13] Friedman, J.; Hastie, T.; Höfling, H.; Tibshirani, R., Pathwise coordinate optimization, Ann. Appl. Stat., 1, 302-332, (2007) · Zbl 1378.90064
[14] Hou, K., Zhou, Z., So, A.M.-C., Luo, Z.-Q.: On the linear convergence of the proximal gradient method for trace norm regularization. In Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. Q., (eds), Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference, pp. 710-718 (2013)
[15] Hsieh, C.-J., Dhillon, I. S., Ravikumar, P. K., Sustik, M. A.: Sparse inverse covariance matrix estimation using quadratic approximation. In: Shawe-Taylor, J., Zemel, R. S., Bartlett, P., Pereira, F.C.N., Weinberger, K.Q. (eds), Advances in Neural Information Processing Systems 24: Proceedings of the 2011 Conference, pp. 2330-2338 (2011)
[16] Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient Methods under the Polyak-Łojasiewicz Condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds) Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Part I, Vol. 9851 of Lecture Notes in Artificial Intelligence, pp. 795-811. Springer International Publishing AG, Cham, Switzerland (2016)
[17] Lee, JD; Sun, Y.; Saunders, MA, Proximal Newton-type methods for minimizing composite functions, SIAM J. Optim., 24, 1420-1443, (2014) · Zbl 1306.65213
[18] Li, D-H; Fukushima, M.; Qi, L.; Yamashita, N., Regularized Newton methods for convex minimization problems with singular solutions, Comput. Optim. Appl., 28, 131-147, (2004) · Zbl 1056.90111
[19] Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. (2017). https://doi.org/10.1007/s10208-017-9366-8
[20] LIBSVM Data: Classification, Regression, and Multi-label. https://www.csie.ntu.edu.tw/ cjlin/libsvmtools/datasets/
[21] Liu, H., So, A. M.-C., Wu, W.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Preprint (2017)
[22] Luo, Z-Q; Tseng, P., Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem, SIAM J. Optim., 2, 43-54, (1992) · Zbl 0777.49010
[23] Luo, Z-Q; Tseng, P., On the linear convergence of descent methods for convex essentially smooth minimization, SIAM J. Control Optim., 30, 408-425, (1992) · Zbl 0756.90084
[24] Luo, Z-Q; Tseng, P., Error bounds and convergence analysis of feasible descent methods: a general approach, Ann. Oper. Res., 46, 157-178, (1993) · Zbl 0793.90076
[25] Moré, J.J.: The Levenberg-Marquardt algorithm: implementation and theory. In: Watson, G.A. (ed.) Numerical Analysis, Volume 630 of Lecture Notes in Mathematics, pp. 105-116. Springer, Berlin (1978)
[26] Nesterov, Yu.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004) · Zbl 1086.90045
[27] Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, second edn. Springer, New York (2006)
[28] O’Donoghue, B.; Candès, E., Adaptive restart for accelerated gradient schemes, Found. Comput. Math., 15, 715-732, (2015) · Zbl 1320.90061
[29] Olsen, P.A., Oztoprak, F., Nocedal, J., Rennie, S.: Newton-like methods for sparse inverse covariance estimation. In: Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds), Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference, pp. 755-763 (2012)
[30] Pang, J-S, A posteriori error bounds for the linearly-constrained variational inequality problem, Math. Oper. Res., 12, 474-484, (1987)
[31] Pang, J-S, Error bounds in mathematical programming, Math. Program., 79, 299-332, (1997) · Zbl 0887.90165
[32] Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends\(\textregistered \) in Optimization 1(3), 127-239 (2014)
[33] Qi, H.; Sun, D., A quadratically convergent newton method for computing the nearest correlation matrix, SIAM J. Matrix Anal. Appl., 28, 360-385, (2006) · Zbl 1120.65049
[34] Sardy, S.; Antoniadis, A.; Tseng, P., Automatic smoothing with wavelets for a wide class of distributions, J. Comput. Gr. Stat., 13, 399-421, (2004)
[35] Scheinberg, K.; Tang, X., Practical inexact proximal quasi-newton method with global complexity analysis, Math. Program. Ser. A, 160, 495-529, (2016) · Zbl 1366.90166
[36] Schmidt, M., van den Berg, E., Friedlander, M.P., Murphy, K.: Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS 2009), pp. 456-463 (2009)
[37] Tseng, P.: Error bounds and superlinear convergence analysis of some Newton-type methods in optimization. In: Nonlinear Optimization and Related Topics, vol. 36 of Applied Optimization, pp. 445-462. Springer, Dordrecht (2000) · Zbl 0965.65091
[38] Tseng, P., Approximation accuracy, gradient methods, and error bound for structured convex optimization, Math. Program. Ser. B, 125, 263-295, (2010) · Zbl 1207.65084
[39] Tseng, P.; Yun, S., A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. Ser. B, 117, 387-423, (2009) · Zbl 1166.90016
[40] Wen, B.; Chen, X.; Pong, TK, Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems, SIAM J. Optim., 27, 124-145, (2017) · Zbl 1359.90138
[41] Wright, SJ; Nowak, RD; Figueiredo, MAT, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process., 57, 2479-2493, (2009) · Zbl 1391.94442
[42] Yamashita, N.; Fukushima, M.; Alefeld, G. (ed.); Chen, X. (ed.), On the rate of convergence of the Levenberg-Marquardt method, 239-249, (2001), Wien · Zbl 1001.65047
[43] Yen, I. E.-H., Hsieh, C.-J., Ravikumar, P. K., Dhillon, I. S.: Constant nullspace strong convexity and fast convergence of proximal methods under high-dimensional settings. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds), Advances in Neural Information Processing Systems 27: Proceedings of the 2014 Conference, pp. 1008-1016 (2014)
[44] Yuan, G-X; Chang, K-W; Hsieh, C-J; Lin, C-J, A comparison of optimization methods and software for large-scale L1-regularized linear classification, J. Mach. Learn. Res., 11, 3183-3234, (2010) · Zbl 1242.62065
[45] Yuan, G-X; Ho, C-H; Lin, C-J, An improved GLMNET for L1-regularized logistic regression, J. Mach. Learn. Res., 13, 1999-2030, (2012) · Zbl 1432.68404
[46] Yun, S.; Toh, K-C, A coordinate gradient descent method for \(\ell _1\)-regularized convex minimization, Comput. Optim. Appl., 48, 273-307, (2011) · Zbl 1220.90092
[47] Zhang, H.; Jiang, J.; Luo, Z-Q, On the linear convergence of a proximal gradient method for a class of nonsmooth convex minimization problems, J. Oper. Res. Soc. China, 1, 163-186, (2013) · Zbl 1334.90127
[48] Zhong, K., Yen, I.E.-H., Dhillon, I.S., Ravikumar, P.: Proximal quasi-Newton for computationally intensive \(\ell _1\)-regularized \(M\)-estimators. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds) Advances in Neural Information Processing Systems 27: Proceedings of the 2014 Conference, pp. 2375-2383 (2014)
[49] Zhou, Z.; So, AM-C, A unified approach to error bounds for structured convex optimization problems, Math. Program. Ser. A, 165, 689-728, (2017) · Zbl 1380.65109
[50] Zhou, Z., Zhang, Q., So, A.M.-C.: \(\ell _{1,p}\)-norm regularization: error bounds and convergence rate analysis of first-order methods. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 1501-1510 (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.