zbMATH — the first resource for mathematics

Principal component analysis by optimization of symmetric functions has no spurious local optima. (English) Zbl 1430.90468
90C26 Nonconvex programming, global optimization
62H25 Factor analysis and principal components; correspondence analysis
15A23 Factorization of matrices
15A18 Eigenvalues, singular values, and eigenvectors
ElemStatLearn; SDPLR
Full Text: DOI
[1] O. Alter, P. O. Brown, and D. Botstein, Singular value decomposition for genome-wide expression data processing and modeling, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 10101-10106.
[2] O. Alter and G. Golub, Singular value decomposition of genome-scale mRNA lengths distribution reveals asymmetry in RNA gel electrophoresis band broadening, Proc. Natl. Acad. Sci. USA, 103 (2006), pp. 11828-11833, https://doi.org/10.1073/pnas.0604756103.
[3] S. Bhojanapalli, A. Kyrillidis, and S. Sanghavi, Dropping convexity for faster semi-definite optimization, in Proceedings of the Conference on Learning Theory, 2016, pp. 530-582.
[4] S. Bhojanapalli, B. Neyshabur, and N. Srebro, Global optimality of local search for low rank matrix recovery, in Proceedings of Advances in Neural Information Processing Systems, 2016, pp. 3873-3881.
[5] I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications, Springer Ser. Statist., Springer New York, 2013, https://books.google.co.uk/books?id=NYDSBwAAQBAJ. · Zbl 0862.62052
[6] N. Boumal, V. Voroninski, and A. Bandeira, The non-convex Burer-Monteiro approach works on smooth semidefinite programs, in Proceedings of Advances in Neural Information Processing Systems, 2016, pp. 2757-2765.
[7] S. Burer and R. D. Monteiro, A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, Math. Program., 95 (2003), pp. 329-357. · Zbl 1030.90077
[8] A. d’Aspremont, L. E. Ghaoui, M. I. Jordan, and G. R. Lanckriet, A direct formulation for sparse PCA using semidefinite programming, in Proceedings of Advances in Neural Information Processing Systems, 2005, pp. 41-48.
[9] Y. Deshpande and A. Montanari, Information-theoretically optimal sparse PCA, in Proceedings of the IEEE International Symposium on Information Theory, IEEE, 2014, pp. 2197-2201.
[10] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218, https://doi.org/10.1007/BF02288367. · JFM 62.1075.02
[11] A. Edelman, T. A. Arias, and S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl., 20 (1998), pp. 303-353. · Zbl 0928.65050
[12] A. Eftekhari, G. Ongie, L. Balzano, and M. B. Wakin, Streaming principal component analysis from incomplete data, J. Mach. Learn. Res., 20 (2019), pp. 1-62. · Zbl 1441.62159
[13] R. Ge, F. Huang, C. Jin, and Y. Yuan, Escaping from saddle points—online stochastic gradient for tensor decomposition, in Proceedings of the Conference on Learning Theory, 2015, pp. 797-842.
[14] R. Ge, J. D. Lee, and T. Ma, Matrix completion has no spurious local minimum, in Proceedings of Advances in Neural Information Processing Systems, 2016, pp. 2973-2981.
[15] R. Ge, J. D. Lee, and T. Ma, Learning One-Hidden-Layer Neural Networks with Landscape Design, https://arxiv.org/abs/1711.00501, 2017.
[16] R. Ge and T. Ma, On the optimization landscape of tensor decompositions, in Proceedings of Advances in Neural Information Processing Systems, 2017, pp. 3653-3663.
[17] N. Gillis, The why and how of nonnegative matrix factorization, in Regularization, Optimization, Kernels, and Support Vector Machines, J. A. K. Suykens, M. Signoretto, and A. Argyriou, eds., Chapman & Hall/CRC Press, Boca Raton, FL, 2015, pp. 257-292.
[18] G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1996, https://books.google.ch/books?id=mlOa7wPX6OYC. · Zbl 0865.65009
[19] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Ser. Statist., Springer New York, 2013, https://books.google.co.uk/books?id=yPfZBwAAQBAJ. · Zbl 1273.62005
[20] R. A. Hauser, A. Eftekhari, and H. F. Matzinger, PCA by determinant optimization has no spurious local optima, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 1504-1511.
[21] R. Horn and C. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, UK, 1990, https://books.google.co.uk/books?id=PlYQN0ypTwEC. · Zbl 0704.15002
[22] R. Horn and C. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, UK, 1994, https://books.google.co.uk/books?id=ukd0AgAAQBAJ. · Zbl 0801.15001
[23] H. Hotelling, Relations between two sets of variates, Biometrika, (1936), pp. 321-377. · Zbl 0015.40705
[24] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley Ser. Adaptive Learning Syst. Signal Process. Learning Commun. Control, Wiley, 2004, New York, https://books.google.co.uk/books?id=96D0ypDwAkkC.
[25] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, How to escape saddle points efficiently, in Proceedings of the 34th International Conference on Machine Learning, Proc. Mach. Learn. Res. 70, 2017, pp. 1724-1732.
[26] C. Jin, S. M. Kakade, and P. Netrapalli, Provable efficient online matrix completion via non-convex stochastic gradient descent, in Proceedings of Advances in Neural Information Processing Systems, 2016, pp. 4520-4528.
[27] I. M. Johnstone and A. Y. Lu, On consistency and sparsity for principal components analysis in high dimensions, J. Amer. Statist. Assoc., 104 (2009), pp. 682-693. · Zbl 1388.62174
[28] M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, Generalized power method for sparse principal component analysis, J. Mach. Learn. Res., 11 (2010), pp. 517-553. · Zbl 1242.62048
[29] S. Karlin and Y. Rinott, A generalized Cauchy Binet formula and applications to total positivity and majorization, J. Multivariate Anal., 27 (1988), pp. 284-299. · Zbl 0653.62038
[30] A. S. Lewis and H. S. Sendov, Quadratic expansions of spectral functions, Linear Algebra Appl., 340 (2002), pp. 97-121. · Zbl 0993.15008
[31] Q. Li and G. Tang, The Nonconvex Geometry of Low-Rank Matrix Optimizations with General Objective Functions, https://arxiv.org/abs/1611.03060v1, 2016.
[32] L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. (2), 11 (1960), pp. 50-59.
[33] A. Mokhtari, A. Ozdaglar, and A. Jadbabaie, Escaping saddle points in constrained optimization, in Proceedings of Advances in Neural Information Processing Systems, 2018, pp. 3633-3643.
[34] NIST/SEMATECH Engineering Statistics Handbook, https://books.google.co.uk/books?id=v-XXjwEACAAJ, 2002.
[35] K. Pearson, On lines and planes of closest fit to systems of points in space, Philos. Mag., 2 (1901), pp. 559-572, https://doi.org/10.1080/14786440109462720. · JFM 32.0710.04
[36] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, 2002, https://books.google.co.uk/books?id=y8ORL3DWt4sC.
[37] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, UK, 2004, https://books.google.co.uk/books?id=MX0hAwAAQBAJ. · Zbl 0994.68074
[38] M. Soltanolkotabi, A. Javanmard, and J. D. Lee, Theoretical Insights into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, https://arxiv.org/abs/1707.04926, 2017. · Zbl 1428.68255
[39] J. Sun, Q. Qu, and J. Wright, When Are Nonconvex Problems Not Scary?, https://arxiv.org/abs/1510.06096, 2015.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.