Levin, Eitan; Kileel, Joe; Boumal, Nicolas
Finding stationary points on bounded-rank matrices: a geometric hurdle and a smooth remedy. (English)
Zbl 07681267
Math. Program. 199, No. 1-2 (A), 831-864 (2023).

Summary: We consider the problem of provably finding a stationary point of a smooth function to be minimized on the variety of bounded-rank matrices. This turns out to be unexpectedly delicate. We trace the difficulty back to a geometric obstacle: on a nonsmooth set, there may be sequences of points along which standard measures of stationarity tend to zero, but whose limit points are not stationary. We name such events apocalypses, as they can cause optimization algorithms to converge to non-stationary points. We illustrate this explicitly for an existing optimization algorithm on bounded-rank matrices. To provably find stationary points, we modify a trust-region method on a standard smooth parameterization of the variety. The method relies on the known fact that second-order stationary points on the parameter space map to stationary points on the variety. Our geometric observations and proposed algorithm generalize beyond bounded-rank matrices. We give a geometric characterization of apocalypses on general constraint sets, which implies that Clarke-regular sets do not admit apocalypses. Such sets include smooth manifolds, manifolds with boundary, and convex sets. Our trust-region method supports parameterization by any complete Riemannian manifold.

Cited in 1 Review · Cited in 2 Documents

MSC:
65K10 Numerical optimization and variational techniques
49J53 Set-valued and variational analysis
90C26 Nonconvex programming, global optimization
90C46 Optimality conditions and duality in mathematical programming

Keywords: low-rank matrices; singular algebraic varieties; Riemannian optimization; variational analysis; Clarke regularity; tangent cones; nonsmooth sets; stationarity
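The remedy described in the summary works through a smooth parameterization of the variety; for matrices of rank at most r, the standard lift is (L, R) ↦ LRᵀ with L of size m×r and R of size n×r. The sketch below is not the paper's trust-region method: it runs plain gradient descent on the lifted cost g(L, R) = f(LRᵀ), with an illustrative choice f(X) = ‖X − A‖_F² and an assumed test matrix A, step size, and iteration count.

```python
# A minimal sketch, assuming f(X) = ||X - A||_F^2 as the cost:
# optimize over the rank-<= r variety via the smooth lift
# (L, R) -> L R^T, by gradient descent on g(L, R) = f(L R^T).
import numpy as np

def lifted_gradient_descent(A, r, steps=3000, step=0.01, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = 0.1 * rng.standard_normal((m, r))  # small random initialization
    R = 0.1 * rng.standard_normal((n, r))
    for _ in range(steps):
        G = L @ R.T - A  # Euclidean gradient of f at X = L R^T
        # Chain rule: grad_L g = G R, grad_R g = G^T L (simultaneous update)
        L, R = L - step * (G @ R), R - step * (G.T @ L)
    return L @ R.T

# Hypothetical test problem: best rank-2 approximation of a diagonal
# matrix, which by Eckart-Young keeps the two largest singular values.
A = np.diag([3.0, 2.0, 1.0])
X = lifted_gradient_descent(A, r=2)
print(np.round(X, 3))
```

Note that first-order stationarity of the lifted cost g does not by itself certify stationarity of f on the variety; as the summary states, it is second-order stationary points of the parameterized problem that map to stationary points on the variety, which is why the paper uses a (second-order) trust-region method on the lift rather than plain gradient descent.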