
Analytical convergence regions of accelerated gradient descent in nonconvex optimization under regularity condition. (English) Zbl 1440.93071

Summary: There is growing interest in using robust control theory to analyze and design optimization and machine learning algorithms. This paper studies a class of nonconvex optimization problems whose cost functions satisfy the so-called regularity condition (RC). Empirical studies show that accelerated gradient descent (AGD) algorithms (e.g., Nesterov's accelerated method and the heavy-ball method) with proper initialization often work well in practice. However, the convergence of such AGD algorithms under RC is largely unknown in the literature. The main contribution of this paper is an analytical characterization of the convergence regions of AGD under RC via robust control tools. Since such optimization problems arise frequently in applications such as phase retrieval, neural network training, and matrix sensing, our result demonstrates the promise of robust control theory in these areas.
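The summary does not state the regularity condition or the AGD updates explicitly; the following standard forms, taken from the Wirtinger flow and momentum-method literature rather than from the paper itself, are sketched here for orientation (the constants and the neighborhood are assumptions, not quoted from the paper). A cost function \(f\) satisfies the regularity condition \(\mathrm{RC}(\mu,\lambda)\) on a neighborhood \(E\) of a minimizer \(x^\ast\) if, up to normalization of the constants,
\[ \langle \nabla f(x),\, x-x^\ast\rangle \;\ge\; \frac{\mu}{2}\,\|\nabla f(x)\|^2 + \frac{\lambda}{2}\,\|x-x^\ast\|^2 \quad \text{for all } x\in E, \]
a condition that already guarantees linear convergence of plain gradient descent initialized in \(E\) with a suitable step size. The two accelerated iterations in question are the heavy-ball method,
\[ x_{k+1} = x_k - \alpha\,\nabla f(x_k) + \beta\,(x_k - x_{k-1}), \]
and Nesterov's accelerated method,
\[ y_k = x_k + \beta\,(x_k - x_{k-1}), \qquad x_{k+1} = y_k - \alpha\,\nabla f(y_k), \]
with step size \(\alpha>0\) and momentum parameter \(\beta\in[0,1)\). Loosely speaking, the convergence regions characterized in the paper are the sets of such algorithm and problem parameters for which these iterations are guaranteed to converge under RC.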

MSC:

93B35 Sensitivity (robustness)
90C26 Nonconvex programming, global optimization

Software:

Wirtinger Flow
