zbMATH — the first resource for mathematics

Newton-type methods for non-convex optimization under inexact Hessian information. (English) Zbl 1451.90134
Summary: We consider variants of trust-region and adaptive cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under certain condition on the inexact Hessian, and using approximate solution of the corresponding sub-problems, we provide iteration complexity to achieve \(\varepsilon\)-approximate second-order optimality which have been shown to be tight. Our Hessian approximation condition offers a range of advantages as compared with the prior works and allows for direct construction of the approximate Hessian with a priori guarantees through various techniques, including randomized sampling methods. In this light, we consider the canonical problem of finite-sum minimization, provide appropriate uniform and non-uniform sub-sampling strategies to construct such Hessian approximations, and obtain optimal iteration complexity for the corresponding sub-sampled trust-region and adaptive cubic regularization methods.

90C26 Nonconvex programming, global optimization
90C53 Methods of quasi-Newton type
65K05 Numerical mathematical programming methods
90C06 Large-scale problems in mathematical programming
Full Text: DOI
[1] Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E., Ma, T.: Finding approximate local minima faster than gradient descent (2016). ArXiv preprint arXiv:1611.01146 · Zbl 1369.68290
[2] Bandeira, AS; Scheinberg, K.; Vicente, LN, Convergence of trust-region methods based on probabilistic models, SIAM J. Optim., 24, 3, 1238-1264 (2014) · Zbl 1311.90186
[3] Beaton, AE; Tukey, JW, The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data, Technometrics, 16, 2, 147-185 (1974) · Zbl 0282.62057
[4] Bianconcini, T.; Liuzzi, G.; Morini, B.; Sciandrone, M., On the use of iterative methods in cubic regularization for unconstrained optimization, Comput. Optim. Appl., 60, 1, 35-57 (2015) · Zbl 1308.90166
[5] Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence rate analysis of a stochastic trust region method for nonconvex optimization (2018). ArXiv preprint arXiv:1609.07428v3
[6] Bollapragada, R., Byrd, R., Nocedal, J.: Exact and inexact subsampled Newton methods for optimization (2016). ArXiv preprint arXiv:1609.08502
[7] Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning (2016). ArXiv preprint arXiv:1606.04838 · Zbl 1397.65085
[8] Carmon, Y., Duchi, J.C.: Gradient descent efficiently finds the cubic-regularized non-convex Newton step (2016). ArXiv preprint arXiv:1612.00547 · Zbl 07105232
[9] Cartis, C.; Gould, NI; Toint, PL, On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems, SIAM J. Optim., 20, 6, 2833-2852 (2010) · Zbl 1211.90225
[10] Cartis, C.; Gould, NI; Toint, PL, Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results, Math. Program., 127, 2, 245-295 (2011) · Zbl 1229.90192
[11] Cartis, C.; Gould, NI; Toint, PL, Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function-and derivative-evaluation complexity, Math. Program., 130, 2, 295-319 (2011) · Zbl 1229.90193
[12] Cartis, C., Gould, N.I., Toint, P.L.: Optimal Newton-type methods for nonconvex smooth optimization problems. Tech. rep., ERGO technical report 11-009, School of Mathematics, University of Edinburgh (2011)
[13] Cartis, C.; Gould, NI; Toint, PL, Complexity bounds for second-order optimality in unconstrained optimization, J. Complex., 28, 1, 93-108 (2012) · Zbl 1245.65063
[14] Cartis, C.; Gould, NI; Toint, PL, Evaluation complexity of adaptive cubic regularization methods for convex unconstrained optimization, Optim. Methods Softw., 27, 2, 197-219 (2012) · Zbl 1252.90061
[15] Cartis, C.; Scheinberg, K., Global convergence rate analysis of unconstrained optimization methods based on probabilistic models, Math. Program., 169, 2, 337-375 (2018) · Zbl 1407.90307
[16] Chen, R., Menickelly, M., Scheinberg, K.: Stochastic optimization using a trust-region method and random models (2015). ArXiv preprint arXiv:1504.04231 · Zbl 1401.90136
[17] Conn, AR; Gould, NI; Toint, PL, Trust Region Methods (2000), Philadelphia: SIAM, Philadelphia
[18] Conn, AR; Scheinberg, K.; Vicente, LN, Global convergence of general derivative-free trust-region algorithms to first-and second-order critical points, SIAM J. Optim., 20, 1, 387-415 (2009) · Zbl 1187.65062
[19] Curtis, F.E., Robinson, D.P., Samadi, M.: An inexact regularized Newton framework with a worst-case iteration complexity of \({\cal{O}}(\varepsilon^{-3/2})\) for nonconvex optimization (2017). ArXiv preprint arXiv:1708.00475
[20] Curtis, FE; Robinson, DP; Samadi, M., A trust region algorithm with a worst-case iteration complexity of \(\cal{O}(\varepsilon^{-3/2})\) for nonconvex optimization, Math. Program., 162, 1-2, 1-32 (2017)
[21] Dennis, JE; Moré, JJ, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comput., 28, 126, 549-560 (1974) · Zbl 0282.65042
[22] Drineas, P.; Mahoney, MW, RandNLA: randomized numerical linear algebra, Commun. ACM, 59, 6, 80-90 (2016)
[23] Erway, JB; Gill, PE; Griffin, JD, Iterative methods for finding a trust-region step, SIAM J. Optim., 20, 2, 1110-1131 (2009) · Zbl 1189.49049
[24] Friedman, J.; Hastie, T.; Tibshirani, R., The Elements of Statistical Learning. Springer Series in Statistics (2001), Berlin: Springer, Berlin
[25] Garber, D., Hazan, E., Jin, C., Musco, C., Netrapalli, P., Sidford, A., et al.: Faster eigenvector computation via shift-and-invert preconditioning. In: International Conference on Machine Learning, pp. 2626-2634 (2016)
[26] Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points-online stochastic gradient for tensor decomposition. In: COLT, pp. 797-842 (2015)
[27] Ghadimi, S., Liu, H., Zhang, T.: Second-order methods with cubic regularization under inexact information (2017). ArXiv preprint arXiv:1710.05782
[28] Gould, NI; Lucidi, S.; Roma, M.; Toint, PL, Solving the trust-region subproblem using the Lanczos method, SIAM J. Optim., 9, 2, 504-525 (1999) · Zbl 1047.90510
[29] Gould, NI; Robinson, DP; Thorne, HS, On solving trust-region and other regularised subproblems in optimization, Math. Program. Comput., 2, 1, 21-57 (2010) · Zbl 1193.65098
[30] Grapiglia, GN; Yuan, J.; Yuan, Y., On the worst-case complexity of nonlinear stepsize control algorithms for convex unconstrained optimization, Optim. Methods Softw., 31, 3, 591-604 (2016) · Zbl 1350.90032
[31] Gratton, S.; Mouffe, M.; Toint, PL; Weber-Mendonça, M., A recursive-trust-region method for bound-constrained nonlinear optimization, IMA J. Numer. Anal., 28, 4, 827-861 (2008) · Zbl 1156.65060
[32] Gratton, S.; Royer, CW; Vicente, LN; Zhang, Z., Complexity and global rates of trust-region methods based on probabilistic models, IMA J. Numer. Anal., 38, 3, 1579-1597 (2018) · Zbl 06983857
[33] Gratton, S.; Sartenaer, A.; Toint, PL, Recursive trust-region methods for multiscale nonlinear optimization, SIAM J. Optim., 19, 1, 414-444 (2008) · Zbl 1163.90024
[34] Griewank, A.: The modification of Newton’s method for unconstrained optimization by bounding cubic terms. Technical Report NA/12. Department of Applied Mathematics and Theoretical Physics, University of Cambridge (1981)
[35] Gross, D., Nesme, V.: Note on sampling without replacing from a finite collection of matrices (2010). ArXiv preprint arXiv:1001.2738
[36] Hazan, E.; Koren, T., A linear-time algorithm for trust region problems, Math. Program., 158, 1-2, 363-381 (2016) · Zbl 1346.90654
[37] Kohler, J.M., Lucchi, A.: Sub-sampled cubic regularization for non-convex optimization (2017). ArXiv preprint arXiv:1705.05933
[38] Kuczyński, J.; Woźniakowski, H., Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start, SIAM J. Matrix Anal. Appl., 13, 4, 1094-1122 (1992) · Zbl 0759.65016
[39] Larson, J.; Billups, SC, Stochastic derivative-free optimization using a trust region framework, Comput. Optim. Appl., 64, 3, 619-645 (2016) · Zbl 1381.90098
[40] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246-1257 (2016)
[41] Lenders, F., Kirches, C., Potschka, A.: trlib: a vector-free implementation of the GLTR method for iterative solution of the trust region problem (2016). ArXiv preprint arXiv:1611.04718 · Zbl 1390.35364
[42] Mahoney, MW, Randomized algorithms for matrices and data, Found. Trends® Mach. Learn., 3, 2, 123-224 (2011) · Zbl 1232.68173
[43] Moré, JJ; Sorensen, DC, Computing a trust region step, SIAM J. Sci. Stat. Comput., 4, 3, 553-572 (1983) · Zbl 0551.65042
[44] Nesterov, Y., Accelerating the cubic regularization of Newton’s method on convex problems, Math. Program., 112, 1, 159-181 (2008) · Zbl 1167.90013
[45] Nesterov, Y.; Polyak, BT, Cubic regularization of Newton method and its global performance, Math. Program., 108, 1, 177-205 (2006) · Zbl 1142.90500
[46] Nocedal, J.; Wright, S., Numerical Optimization (2006), Berlin: Springer, Berlin · Zbl 1104.65059
[47] Roosta-Khorasani, F.; van den Doel, K.; Ascher, U., Data completion and stochastic algorithms for PDE inversion problems with many measurements, Electron. Trans. Numer. Anal., 42, 177-196 (2014) · Zbl 1312.65176
[48] Roosta-Khorasani, F.; van den Doel, K.; Ascher, U., Stochastic algorithms for inverse problems involving PDEs and many measurements, SIAM J. Sci. Comput., 36, 5, S3-S22 (2014) · Zbl 1307.65158
[49] Roosta-Khorasani, F.; Mahoney, MW, Sub-sampled Newton methods, Math. Program., 174, 1, 293-326 (2019) · Zbl 1412.49059
[50] Roosta-Khorasani, F.; Székely, GJ; Ascher, U., Assessing stochastic algorithms for large scale nonlinear least squares problems using extremal probabilities of linear combinations of gamma random variables, SIAM/ASA J. Uncertain. Quantif., 3, 1, 61-90 (2015) · Zbl 1327.65013
[51] Shalev-Shwartz, S.; Ben-David, S., Understanding Machine Learning: From Theory to Algorithms (2014), Cambridge: Cambridge University Press, Cambridge · Zbl 1305.68005
[52] Shashaani, S., Hashemi, F., Pasupathy, R.: ASTRO-DF: a class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization (2016). ArXiv preprint arXiv:1610.06506 · Zbl 1403.90541
[53] Sorensen, D., Minimization of a large-scale quadratic functionsubject to a spherical constraint, SIAM J. Optim., 7, 1, 141-161 (1997) · Zbl 0878.65044
[54] Sorensen, DC, Newton’s method with a model trust region modification, SIAM J. Numer. Anal., 19, 2, 409-426 (1982) · Zbl 0483.65039
[55] Sra, S.; Nowozin, S.; Wright, SJ, Optimization for Machine Learning (2012), Cambridge: MIT Press, Cambridge
[56] Steihaug, T., The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20, 3, 626-637 (1983) · Zbl 0518.65042
[57] Tripuraneni, N., Stern, M., Jin, C., Regier, J., Jordan, M.I.: Stochastic cubic regularization for fast nonconvex optimization (2017). ArXiv preprint arXiv:1711.02838
[58] Tropp, JA, User-friendly tail bounds for sums of random matrices, Found. Comput. Math., 12, 4, 389-434 (2012) · Zbl 1259.60008
[59] Tropp, J.A.: An introduction to matrix concentration inequalities (2015). ArXiv preprint arXiv:1501.01571 · Zbl 1391.15071
[60] Woodruff, D.P.: Sketching as a tool for numerical linear algebra (2014). ArXiv preprint arXiv:1411.4357 · Zbl 1316.65046
[61] Xu, P., Roosta-Khorasani, F., Mahoney, M.W.: Second-order optimization for non-convex machine learning: an empirical study (2017). ArXiv preprint arXiv:1708.07827
[62] Xu, P., Yang, J., Roosta-Khorasani, F., Ré, C., Mahoney, M.W.: Sub-sampled Newton methods with non-uniform sampling. In: Advances in Neural Information Processing Systems, pp. 3000-3008 (2016)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.