# zbMATH — the first resource for mathematics

Lower bounds for finding stationary points I. (English) Zbl 1451.90128
Summary: We prove lower bounds on the complexity of finding $$\epsilon$$-stationary points (points $$x$$ such that $$\Vert \nabla f(x)\Vert \le \epsilon$$) of smooth, high-dimensional, and potentially non-convex functions $$f$$. We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of $$f$$ at a query point $$x$$. We show that for any (potentially randomized) algorithm $$\mathsf{A}$$, there exists a function $$f$$ with Lipschitz $$p$$th-order derivatives such that $$\mathsf{A}$$ requires at least $$\epsilon^{-(p+1)/p}$$ queries to find an $$\epsilon$$-stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized $$p$$th-order regularization are worst-case optimal within their natural function classes.
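For intuition on the matching upper bound: in the first-order case $$p=1$$, the lower bound is $$\epsilon^{-2}$$ queries, and gradient descent with step size $$1/L$$ on a function with $$L$$-Lipschitz gradient attains this rate. A minimal sketch (not the paper's construction; the test function and constants below are illustrative assumptions):

```python
import numpy as np

def gradient_descent_until_stationary(grad, x0, L, eps, max_iters=100_000):
    """Run gradient descent with step size 1/L until an eps-stationary
    point is reached; return the point and the number of gradient queries."""
    x = np.asarray(x0, dtype=float)
    for queries in range(1, max_iters + 1):
        g = grad(x)
        if np.linalg.norm(g) <= eps:  # eps-stationarity test from the summary
            return x, queries
        x = x - g / L  # standard 1/L step for an L-smooth function
    return x, max_iters

# Illustrative nonconvex f(x) = x^2 + 3 sin^2(x), so f'(x) = 2x + 3 sin(2x)
# and |f''(x)| = |2 + 6 cos(2x)| <= 8, i.e. L = 8 is a valid smoothness bound.
grad = lambda x: 2 * x + 3 * np.sin(2 * x)
x, n = gradient_descent_until_stationary(grad, x0=np.array([2.0]), L=8.0, eps=1e-4)
```

For $$L$$-smooth functions this scheme needs $$O(L \Delta \epsilon^{-2})$$ gradient queries (with $$\Delta$$ the initial suboptimality), which the paper's $$p=1$$ lower bound shows is unimprovable up to constants.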

##### MSC:
90C26 Nonconvex programming, global optimization
90C06 Large-scale problems in mathematical programming
90C60 Abstract computational complexity for mathematical programming problems
68Q25 Analysis of algorithms and problem complexity
##### Software:
L-BFGS; SDPLR; Wirtinger Flow
##### References:
[1] Agarwal, A.; Bartlett, PL; Ravikumar, P.; Wainwright, MJ, Information-theoretic lower bounds on the oracle complexity of convex optimization, IEEE Trans. Inf. Theory, 58, 5, 3235-3249 (2012) · Zbl 1365.94132
[2] Agarwal, N.; Allen-Zhu, Z.; Bullins, B.; Hazan, E.; Ma, T.: Finding approximate local minima faster than gradient descent. In: Proceedings of the Forty-Ninth Annual ACM Symposium on the Theory of Computing (2017) · Zbl 1369.68290
[3] Arjevani, Y.; Shalev-Shwartz, S.; Shamir, O., On lower and upper bounds in smooth and strongly convex optimization, J. Mach. Learn. Res., 17, 126, 1-51 (2016) · Zbl 1391.90467
[4] Arjevani, Y.; Shamir, O.; Shiff, R.: Oracle complexity of second-order methods for smooth convex optimization (2017). arXiv:1705.07260 [math.OC] · Zbl 1430.90460
[5] Ball, K.; Levy, S., An elementary introduction to modern convex geometry, Flavors of Geometry, 1-58 (1997), MSRI Publications · Zbl 0901.52002
[6] Berend, D.; Tassa, T., Improved bounds on Bell numbers and on moments of sums of random variables, Prob. Math. Stat., 30, 2, 185-205 (2010) · Zbl 1230.60014
[7] Birgin, EG; Gardenghi, JL; Martínez, JM; Santos, SA; Toint, PL, Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models, Math. Program., 163, 1-2, 359-368 (2017) · Zbl 1365.90236
[8] Boumal, N.; Voroninski, V.; Bandeira, A., The non-convex Burer-Monteiro approach works on smooth semidefinite programs, Adv. Neural Inf. Process. Syst., 30, 2757-2765 (2016)
[9] Boyd, S.; Vandenberghe, L., Convex Optimization (2004), Cambridge University Press, Cambridge · Zbl 1058.90049
[10] Braun, G.; Guzmán, C.; Pokutta, S., Lower bounds on the oracle complexity of nonsmooth convex optimization via information theory, IEEE Trans. Inf. Theory, 63, 7, 4709-4724 (2017) · Zbl 1370.94383
[11] Burer, S.; Monteiro, RD, A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, Math. Program., 95, 2, 329-357 (2003) · Zbl 1030.90077
[12] Candès, EJ; Li, X.; Soltanolkotabi, M., Phase retrieval via Wirtinger flow: theory and algorithms, IEEE Trans. Inf. Theory, 61, 4, 1985-2007 (2015) · Zbl 1359.94069
[13] Carmon, Y.; Duchi, JC; Hinder, O.; Sidford, A.: Convex until proven guilty: dimension-free acceleration of gradient descent on non-convex functions. In: Proceedings of the 34th International Conference on Machine Learning (2017)
[14] Carmon, Y.; Duchi, JC; Hinder, O.; Sidford, A.: Lower bounds for finding stationary points II: first-order methods (2017). arXiv:1711.00841 [math.OC]. URL https://arxiv.org/pdf/1711.00841.pdf · Zbl 1400.90250
[15] Carmon, Y.; Duchi, JC; Hinder, O.; Sidford, A., Accelerated methods for non-convex optimization, SIAM J. Optim., 28, 2, 1751-1772 (2018) · Zbl 1400.90250
[16] Cartis, C.; Gould, NI; Toint, PL, On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization problems, SIAM J. Optim., 20, 6, 2833-2852 (2010) · Zbl 1211.90225
[17] Cartis, C.; Gould, NI; Toint, PL, Complexity bounds for second-order optimality in unconstrained optimization, J. Complex., 28, 1, 93-108 (2012) · Zbl 1245.65063
[18] Cartis, C.; Gould, NIM; Toint, PL: How much patience do you have? A worst-case perspective on smooth nonconvex optimization. Optima 88, 1-10 (2012)
[19] Cartis, C.; Gould, NI; Toint, PL, A note about the complexity of minimizing Nesterov's smooth Chebyshev-Rosenbrock function, Optim. Methods Softw., 28, 451-457 (2013) · Zbl 1273.90199
[20] Cartis, C.; Gould, NIM; Toint, PL: Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization (2017). arXiv:1709.07180 [math.OC] · Zbl 1451.90177
[21] Chowla, S.; Herstein, IN; Moore, WK, On recursions connected with symmetric groups I, Can. J. Math., 3, 328-334 (1951) · Zbl 0043.25904
[22] Conn, AR; Gould, NIM; Toint, PL, Trust Region Methods (2000), SIAM, Philadelphia
[23] Hager, WW; Zhang, H., A survey of nonlinear conjugate gradient methods, Pac. J. Optim., 2, 1, 35-58 (2006) · Zbl 1117.90048
[24] Hinder, O.: Cutting plane methods can be extended into nonconvex optimization. In: Proceedings of the Thirty-First Annual Conference on Computational Learning Theory (2018)
[25] Jarre, F., On Nesterov's smooth Chebyshev-Rosenbrock function, Optim. Methods Softw., 28, 3, 478-500 (2013) · Zbl 1266.49068
[26] Jin, C.; Ge, R.; Netrapalli, P.; Kakade, SM; Jordan, MI: How to escape saddle points efficiently. In: Proceedings of the 34th International Conference on Machine Learning (2017)
[27] Keshavan, RH; Montanari, A.; Oh, S., Matrix completion from noisy entries, J. Mach. Learn. Res., 11, 2057-2078 (2010) · Zbl 1242.62069
[28] LeCun, Y.; Bengio, Y.; Hinton, G., Deep learning, Nature, 521, 7553, 436-444 (2015)
[29] Liu, D.; Nocedal, J., On the limited memory BFGS method for large scale optimization, Math. Program., 45, 1, 503-528 (1989) · Zbl 0696.90048
[30] Loh, P-L; Wainwright, MJ, High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity, Ann. Stat., 40, 3, 1637-1664 (2012) · Zbl 1257.62063
[31] Loh, P-L; Wainwright, MJ, Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima, J. Mach. Learn. Res., 16, 559-616 (2013) · Zbl 1360.62276
[32] Monteiro, RD; Svaiter, BF, An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods, SIAM J. Optim., 23, 2, 1092-1125 (2013) · Zbl 1298.90071
[33] Murty, K.; Kabadi, S., Some NP-complete problems in quadratic and nonlinear programming, Math. Program., 39, 117-129 (1987) · Zbl 0637.90078
[34] Nemirovski, A.: Efficient methods in convex programming. Technion, The Israel Institute of Technology (1994)
[35] Nemirovski, A.; Yudin, D., Problem Complexity and Method Efficiency in Optimization (1983), Wiley
[36] Nesterov, Y., A method of solving a convex programming problem with convergence rate $${O}(1/k^2)$$, Sov. Math. Dokl., 27, 2, 372-376 (1983) · Zbl 0535.90071
[37] Nesterov, Y., Introductory Lectures on Convex Optimization (2004), Kluwer Academic Publishers · Zbl 1086.90045
[38] Nesterov, Y., Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22, 2, 341-362 (2012) · Zbl 1257.90073
[39] Nesterov, Y., How to make the gradients small, Optima, 88, 10-11 (2012)
[40] Nesterov, Y.; Polyak, B., Cubic regularization of Newton method and its global performance, Math. Program. Ser. A, 108, 177-205 (2006) · Zbl 1142.90500
[41] Nocedal, J.; Wright, SJ, Numerical Optimization (2006), Springer
[42] Sun, J.; Qu, Q.; Wright, J., A geometric analysis of phase retrieval, Found. Comput. Math., 18, 5, 1131-1198 (2018) · Zbl 1401.94049
[43] Traub, J.; Wasilkowski, H.; Wozniakowski, H., Information-Based Complexity (1988), Academic Press · Zbl 0654.94004
[44] Vavasis, SA, Black-box complexity of local minimization, SIAM J. Optim., 3, 1, 60-80 (1993) · Zbl 0794.90045
[45] Woodworth, BE; Srebro, N., Tight complexity bounds for optimizing composite objectives, Adv. Neural Inf. Process. Syst., 30, 3639-3647 (2016)
[46] Woodworth, BE; Srebro, N.: Lower bound for randomized first order convex optimization (2017). arXiv:1709.03594 [math.OC]
[47] Zhang, X.; Ling, C.; Qi, L., The best rank-1 approximation of a symmetric tensor and related spherical optimization problems, SIAM J. Matrix Anal. Appl., 33, 3, 806-821 (2012) · Zbl 1269.15026
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.