Finding low-rank solutions via nonconvex matrix factorization, efficiently and provably. (English) Zbl 1419.90065

Summary: A rank-\(r\) matrix \(X \in \mathbb{R}^{m \times n}\) can be written as a product \(U V^\top\), where \(U \in \mathbb{R}^{m \times r}\) and \(V \in \mathbb{R}^{n \times r}\). One can exploit this observation in optimization: e.g., consider minimizing a convex function \(f(X)\) over rank-\(r\) matrices, where the set of low-rank matrices is modeled via \(UV^\top\). Though this parameterization reduces the number of variables and is computationally cheaper (of particular interest is the case \(r \ll \min\{m, n\}\)), it comes at a cost: \(f(UV^\top)\) becomes nonconvex as a function of \(U\) and \(V\). We study this parameterization for generic convex objectives \(f\) and focus on first-order gradient descent algorithms. We propose the bifactored gradient descent \((\mathtt{BFGD})\) algorithm, an efficient first-order method that operates directly on the \(U, V\) factors. We show that when \(f\) is (restricted) smooth, \(\mathtt{BFGD}\) converges locally at a sublinear rate; when \(f\) is both (restricted) smooth and (restricted) strongly convex, it converges locally at a linear rate. For several applications, we provide simple and efficient initialization schemes that produce starting points good enough for these convergence guarantees to hold globally. Extensive experimental results support our claim that \(\mathtt{BFGD}\) is an efficient and accurate nonconvex method compared with state-of-the-art approaches.
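The following is a minimal numerical sketch (Python/NumPy) of a factored gradient descent iteration in the spirit of \(\mathtt{BFGD}\), applied to the illustrative objective \(f(X) = \frac{1}{2}\|X - M\|_F^2\). The balancing regularizer \(\frac{\lambda}{4}\|U^\top U - V^\top V\|_F^2\) and the spectral initialization from \(-\nabla f(0)\) follow the general recipe for this class of methods; the step size, the constant \(\lambda\), and the downscaled initialization below are illustrative choices, not the authors' exact constants.

```python
import numpy as np

def bfgd_sketch(grad_f, m, n, r, lam=0.5, iters=500):
    """Factored gradient descent on g(U, V) = f(U V^T) + (lam/4)*||U^T U - V^T V||_F^2."""
    # Spectral initialization from -grad f(0): take the top-r SVD factors,
    # downscaled by 1/2 so the iterations below have visible work to do.
    G0 = -grad_f(np.zeros((m, n)))
    P, s, Qt = np.linalg.svd(G0, full_matrices=False)
    U = 0.5 * P[:, :r] * np.sqrt(s[:r])      # U in R^{m x r}
    V = 0.5 * Qt[:r, :].T * np.sqrt(s[:r])   # V in R^{n x r}
    step = 0.25 / s[0]                       # illustrative step size, O(1/||grad f(0)||_2)
    for _ in range(iters):
        G = grad_f(U @ V.T)                  # ambient gradient at X = U V^T
        D = U.T @ U - V.T @ V                # factor-imbalance matrix
        U, V = (U - step * (G @ V + lam * U @ D),
                V - step * (G.T @ U - lam * V @ D))
    return U, V

# Usage: recover a planted rank-2 matrix under f(X) = 0.5*||X - M||_F^2,
# whose gradient is simply X - M.
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
U, V = bfgd_sketch(lambda X: X - M, 40, 30, r=2)
print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))  # close to 0
```

Note that both factors are updated with the ambient gradient pushed through the other factor (\(\nabla f(X) V\) and \(\nabla f(X)^\top U\)), while the imbalance term keeps \(U^\top U\) and \(V^\top V\) comparable, which is what makes a single common step size workable.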

MSC:

90C06 Large-scale problems in mathematical programming
90C26 Nonconvex programming, global optimization
65K05 Numerical mathematical programming methods
