Minimax sparse principal subspace estimation in high dimensions. (English) Zbl 1288.62103

Summary: We study sparse principal components analysis in high dimensions, where \(p\) (the number of variables) can be much larger than \(n\) (the number of observations), and analyze the problem of estimating the subspace spanned by the principal eigenvectors of the population covariance matrix. We introduce two complementary notions of \(\ell_{q}\) subspace sparsity: row sparsity and column sparsity. We prove nonasymptotic lower and upper bounds on the minimax subspace estimation error for \(0\leq q\leq 1\). The bounds are optimal for row sparse subspaces and nearly optimal for column sparse subspaces, and apply to general classes of covariance matrices, and they show that \(\ell_{q}\) constrained estimates can achieve optimal minimax rates without restrictive spiked covariance conditions. Interestingly, the form of the rates matches known results for sparse regression when the effective noise variance is defined appropriately. Our proof employs a novel variational \(\sin\Theta\) theorem that may be useful in other regularized spectral estimation problems.


62H25 Factor analysis and principal components; correspondence analysis
62H12 Estimation in multivariate analysis
15A18 Eigenvalues, singular values, and eigenvectors
62C20 Minimax procedures in statistical decision theory


Full Text: DOI arXiv Euclid


[1] Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877-2921. · Zbl 1173.62049
[2] Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169 . Springer, New York. · Zbl 0863.15001
[3] Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055-1084. · Zbl 1292.62071
[4] Chen, X., Zou, C. and Cook, R. D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Statist. 38 3696-3723. · Zbl 1204.62107
[5] Chikuse, Y. (2003). Statistics on Special Manifolds. Lecture Notes in Statistics 174 . Springer, New York. · Zbl 1026.62051
[6] d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434-448 (electronic). · Zbl 1128.90050
[7] Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over \(l_{p}\)-balls for \(l_{q}\)-error. Probab. Theory Related Fields 99 277-303. · Zbl 0802.62006
[8] Edelman, A., Arias, T. A. and Smith, S. T. (1999). The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20 303-353. · Zbl 0928.65050
[9] Gilbert, E. N. (1952). A comparison of signalling alphabets. Bell System Technical Journal 31 504-522.
[10] Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24 498-520. · JFM 59.1183.01
[11] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682-693. · Zbl 1388.62174
[12] Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531-547.
[13] Journée, M., Nesterov, Y., Richtárik, P. and Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11 517-553. · Zbl 1242.62048
[14] Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89 . Amer. Math. Soc., Providence, RI. · Zbl 0995.60002
[15] Lounici, K. (2013). Sparse principal component analysis with missing observations. In High Dimensional Probability VI (C. Houdré, D. M. Mason, J. Rosiński and J. A. Wellner, eds.). Progress in Probability 66 327-356. Springer, Basel. · Zbl 1267.62073
[16] Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772-801. · Zbl 1267.62074
[17] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896 . Springer, Berlin. · Zbl 1170.60006
[18] Mendelson, S. (2010). Empirical processes with a bounded \(\psi_{1}\) diameter. Geom. Funct. Anal. 20 988-1027. · Zbl 1204.60042
[19] Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791-2817. · Zbl 1168.62058
[20] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. Statist. Sci. 27 538-557. · Zbl 1331.62350
[21] Pajor, A. (1998). Metric entropy of the Grassmann manifold. In Convex Geometric Analysis ( Berkeley , CA , 1996). Mathematical Sciences Research Institute Publications 34 181-188. Cambridge Univ. Press, Cambridge. · Zbl 0942.46013
[22] Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617-1642. · Zbl 1134.62029
[23] Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2 559-572. · JFM 32.0710.04
[24] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over \(\ell_{q}\)-balls. IEEE Trans. Inform. Theory 57 6976-6994. · Zbl 1365.62276
[25] Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015-1034. · Zbl 1141.62049
[26] Shen, D., Shen, H. and Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. J. Multivariate Anal. 115 317-333. · Zbl 1258.62072
[27] Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory . Academic Press, Boston, MA. · Zbl 0706.65013
[28] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538
[29] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes . Springer, New York. · Zbl 0862.60002
[30] Varšamov, R. R. (1957). The evaluation of signals in codes with correction of errors. Dokl. Akad. Nauk SSSR ( N.S ) 117 739-741. · Zbl 0081.36905
[31] Vu, V. Q. and Lei, J. (2012a). Minimax rates of estimation for sparse PCA in high dimensions. In Proceedings of the 15 th International Conference on Artificial Intelligence and Statistics ( AISTATS ). JMLR Workshop and Conference Proceedings Volume 22.
[32] Vu, V. Q. and Lei, J. (2012b). Squared-norm empirical process in Banach space. Available at . 1312.1005
[33] Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515-534.
[34] Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423-435. Springer, New York. · Zbl 0896.62032
[35] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. · Zbl 1141.62030
[36] Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468-3497. · Zbl 1369.62164
[37] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265-286.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.