zbMATH — the first resource for mathematics

The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. (English) Zbl 07114917
Summary: The singular value matrix decomposition plays a ubiquitous role throughout statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and exploiting the geometric structure of singular values and singular vectors.
This paper provides a novel collection of technical and theoretical tools for studying the geometry of singular subspaces using the two-to-infinity norm. Motivated by preliminary deterministic Procrustes analysis, we consider a general matrix perturbation setting in which we derive a new Procrustean matrix decomposition. Together with flexible machinery developed for the two-to-infinity norm, this allows us to conduct a refined analysis of the induced perturbation geometry with respect to the underlying singular vectors even in the presence of singular value multiplicity. Our analysis yields singular vector entrywise perturbation bounds for a range of popular matrix noise models, each of which has a meaningful associated statistical inference task. In addition, we demonstrate how the two-to-infinity norm is the preferred norm in certain statistical settings. Specific applications discussed in this paper include covariance estimation, singular subspace recovery, and multiple graph inference.
Both our Procrustean matrix decomposition and the technical machinery developed for the two-to-infinity norm may be of independent interest.

62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
PDF BibTeX Cite
Full Text: DOI Euclid arXiv
[1] Abbe, E., Fan, J., Wang, K. and Zhong, Y. (2017). Entrywise eigenvector analysis of random matrices with low expected rank. Preprint. Available at arXiv:1709.09565.
[2] Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. Springer Series in Statistics. Springer, New York. · Zbl 1301.60002
[3] Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math.227 494-521. · Zbl 1226.15023
[4] Benidis, K., Sun, Y., Babu, P. and Palomar, D. P. (2016). Orthogonal sparse eigenvectors: A Procrustes problem. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4683-4686.
[5] Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics169. Springer, New York. · Zbl 0863.15001
[6] Bojanczyk, A. W. and Lutoborski, A. (1999). The Procrustes problem for orthogonal Stiefel matrices. SIAM J. Sci. Comput.21 1291-1304. · Zbl 0962.65037
[7] Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist.41 3074-3110. · Zbl 1288.62099
[8] Cai, T. T. and Zhang, A. (2018). Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. Ann. Statist.46 60-89. · Zbl 1395.62122
[9] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math.9 717-772. · Zbl 1219.90124
[10] Cape, J., Tang, M. and Priebe, C. E. (2018). Signal-plus-noise matrix models: Eigenvector deviations and fluctuations. Biometrika. To appear. Available at arXiv:1802.00381. · Zbl 07051945
[11] Chen, L., Vogelstein, J. T., Lyzinski, V. and Priebe, C. E. (2016). A joint graph inference case study: The C. elegans chemical and electrical connectome. Worm5 1-8.
[12] Chikuse, Y. (2003). Statistics on Special Manifolds. Lecture Notes in Statistics174. Springer, New York. · Zbl 1026.62051
[13] Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal.7 1-46. · Zbl 0198.47201
[14] Dryden, I. L., Koloydenko, A. and Zhou, D. (2009). Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann. Appl. Stat.3 1102-1123. · Zbl 1196.62063
[15] Dryden, I. L. and Mardia, K. V. (2016). Statistical Shape Analysis with Applications in R, 2nd ed. Wiley Series in Probability and Statistics. Wiley, Chichester. · Zbl 1381.62003
[16] Edelman, A., Arias, T. A. and Smith, S. T. (1999). The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl.20 303-353. · Zbl 0928.65050
[17] Eldar, Y. C. and Kutyniok, G. (2012). Compressed Sensing: Theory and Applications. Cambridge Univ. Press, Cambridge.
[18] Eldridge, J., Belkin, M. and Wang, Y. (2018). Unperturbed: Spectral analysis beyond Davis-Kahan. In: Proceedings of Algorithmic Learning Theory. Proceedings of Machine Learning Research 83, PMLR 321-358. · Zbl 1406.60014
[19] Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B. Stat. Methodol.75 603-680. · Zbl 1411.62138
[20] Fan, J., Rigollet, P. and Wang, W. (2015). Estimation of functionals of sparse covariance matrices. Ann. Statist.43 2706-2737. · Zbl 1327.62338
[21] Fan, J., Wang, W. and Zhong, Y. (2017). An \(\ell_{\infty}\) eigenvector perturbation bound and its application to robust covariance estimation. J. Mach. Learn. Res.18 1-42.
[22] Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl.34 23-39. · Zbl 1314.05186
[23] Gower, J. C. and Dijksterhuis, G. B. (2004). Procrustes Problems. Oxford Statistical Science Series30. Oxford Univ. Press, Oxford.
[24] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw.5 109-137.
[25] Horn, R. A. and Johnson, C. R. (2012). Matrix Analysis. Cambridge Univ. Press, Cambridge.
[26] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist.29 295-327. · Zbl 1016.62078
[27] Jolliffe, I. T. (1986). Principal Component Analysis. Springer Series in Statistics. Springer, New York. · Zbl 1011.62064
[28] Koltchinskii, V. and Lounici, K. (2017). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli23 110-133. · Zbl 1366.60057
[29] Koltchinskii, V. and Lounici, K. (2017). New asymptotic results in principal component analysis. Sankhyā A79 254-297. · Zbl 06822893
[30] Le, C. M., Levina, E. and Vershynin, R. (2016). Optimization via low-rank approximation for community detection in networks. Ann. Statist.44 373-400. · Zbl 1331.62312
[31] Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist.43 215-237. · Zbl 1308.62041
[32] Levin, K., Athreya, A., Tang, M., Lyzinski, V. and Priebe, C. E. (2017). A central limit theorem for an omnibus embedding of random dot product graphs. Preprint. Available at arXiv:1705.09355.
[33] Lu, L. and Peng, X. (2013). Spectra of edge-independent random graphs. Electron. J. Combin.20 1-18. · Zbl 1295.05211
[34] Lyzinski, V. (2018). Information recovery in shuffled graphs via graph matching. IEEE Trans. Inform. Theory64 3254-3273. · Zbl 1395.94422
[35] Lyzinski, V., Park, Y., Priebe, C. E. and Trosset, M. (2017). Fast embedding for JOFC using the raw stress criterion. J. Comput. Graph. Statist.26 786-802.
[36] Lyzinski, V., Sussman, D. L., Fishkind, D. E., Pao, H., Chen, L., Vogelstein, J. T., Park, Y. and Priebe, C. E. (2015). Spectral clustering for divide-and-conquer graph matching. Parallel Comput.47 70-87.
[37] Lyzinski, V., Sussman, D. L., Tang, M., Athreya, A. and Priebe, C. E. (2014). Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding. Electron. J. Stat.8 2905-2922. · Zbl 1308.62131
[38] Mao, X., Sarkar, P. and Chakrabarti, D. (2017). Estimating mixed memberships with sharp eigenvector deviations. Preprint. Available at arXiv:1709.00407.
[39] Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist.36 2791-2817. · Zbl 1168.62058
[40] O’Rourke, S., Vu, V. and Wang, K. (2016). Eigenvectors of random matrices: A survey. J. Combin. Theory Ser. A144 361-442. · Zbl 1347.15044
[41] O’Rourke, S., Vu, V. and Wang, K. (2018). Random perturbation of low rank matrices: Improving classical bounds. Linear Algebra Appl.540 26-59. · Zbl 1380.65076
[42] Paul, D. and Aue, A. (2014). Random matrix theory in statistics: A review. J. Statist. Plann. Inference150 1-29. · Zbl 1287.62011
[43] Priebe, C. E., Marchette, D. J., Ma, Z. and Adali, S. (2013). Manifold matching: Joint optimization of fidelity and commensurability. Braz. J. Probab. Stat.27 377-400. · Zbl 1298.62102
[44] Rebrova, E. and Vershynin, R. (2018). Norms of random matrices: Local and global problems. Adv. Math.324 40-83. · Zbl 1380.60016
[45] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist.39 1878-1915. · Zbl 1227.62042
[46] Rudelson, M. and Vershynin, R. (2015). Delocalization of eigenvectors of random matrices with independent entries. Duke Math. J.164 2507-2538. · Zbl 1352.60007
[47] Sarkar, P. and Bickel, P. J. (2015). Role of normalization in spectral clustering for stochastic blockmodels. Ann. Statist.43 962-990. · Zbl 1320.62150
[48] Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Computer Science and Scientific Computing. Academic Press, Boston, MA. · Zbl 0706.65013
[49] Sussman, D. L., Tang, M., Fishkind, D. E. and Priebe, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc.107 1119-1128. · Zbl 1443.62188
[50] Sussman, D. L., Tang, M. and Priebe, C. E. (2014). Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Trans. Pattern Anal. Mach. Intell.36 48-57.
[51] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., Park, Y. and Priebe, C. E. (2017). A semiparametric two-sample hypothesis testing problem for random graphs. J. Comput. Graph. Statist.26 344-354. · Zbl 1450.62040
[52] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V. and Priebe, C. E. (2017). A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli23 1599-1630. · Zbl 1450.62040
[53] Tang, M., Cape, J. and Priebe, C. E. (2017). Asymptotically efficient estimators for stochastic blockmodels: The naive MLE, the rank-constrained MLE, and the spectral. Preprint. Available at arXiv:1710.10936.
[54] Tang, M. and Priebe, C. E. (2018). Limit theorems for eigenvectors of the normalized Laplacian for random graphs. Ann. Statist.46 2360-2415. · Zbl 1408.62120
[55] Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science. Online version 2018-02-09. · Zbl 1430.60005
[56] von Luxburg, U. (2007). A tutorial on spectral clustering. Stat. Comput.17 395-416.
[57] Wedin, P. (1972). Perturbation bounds in connection with singular value decomposition. BIT12 99-111. · Zbl 0239.15015
[58] Weyl, H. (1912). Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Math. Ann.71 441-479. · JFM 43.0436.01
[59] Yao, J., Zheng, S. and Bai, Z. (2015). Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Series in Statistical and Probabilistic Mathematics39. Cambridge Univ. Press, New York. · Zbl 1380.62011
[60] Young, S. J. and Scheinerman, E. R. (2007). Random dot product graph models for social networks. In Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science4863 138-149. Springer, Berlin. · Zbl 1136.05322
[61] Yu, Y., Wang, T. and Samworth, R. J. (2015). A useful variant of the Davis-Kahan theorem for statisticians. Biometrika102 315-323. · Zbl 1452.15010
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.