Bridging convex and nonconvex optimization in robust PCA: noise, outliers and missing data. (English) Zbl 1486.62181

Summary: This paper delivers improved theoretical guarantees for the convex programming approach to low-rank matrix estimation in the presence of (1) random noise, (2) gross sparse outliers and (3) missing data. This problem, often dubbed robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis in the presence of random noise) remains highly suboptimal, which we strengthen in this paper. When the unknown matrix is well conditioned, incoherent and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the \({\ell_{\infty }}\) loss. All of this holds even when nearly a constant fraction of the observations are corrupted by outliers of arbitrary magnitude. The key analytical idea lies in bridging the convex program in use with an auxiliary nonconvex optimization algorithm; hence the title of this paper.

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62H12 Estimation in multivariate analysis
90C25 Convex programming
90C26 Nonconvex programming, global optimization
94A12 Signal theory (characterization, reconstruction, filtering, etc.)
