High-dimensional analysis of semidefinite relaxations for sparse principal components. (English) Zbl 1173.62049

Summary: Principal components analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large \(p\), small \(n\)” setting, where the problem dimension \(p\) is comparable to or larger than the sample size \(n\). This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most \(k\) nonzero components.
We consider a spiked covariance model where a base matrix is perturbed by adding a \(k\)-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector as follows: (a) a simple diagonal thresholding method, with transitions from success to failure as a function of the rescaled sample size \(\theta _{dia}(n, p, k)=n/[k^{2}\log (p - k)]\); and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size \(\theta _{sdp}(n, p, k)=n/[k\log (p - k)]\) is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter \(\theta _{sdp}(n, p, k)\) is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.


62H25 Factor analysis and principal components; correspondence analysis
90C90 Applications of mathematical programming
62F12 Asymptotic properties of parametric estimators
65C60 Computational problems in statistics (MSC2010)
60F10 Large deviations


Full Text: DOI arXiv


[1] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis . Wiley, New York. · Zbl 0651.62041
[2] Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked populations models. J. Multivariate Anal. 97 1382-1408. · Zbl 1220.15011
[3] Bickel, P. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199-227. · Zbl 1132.62040
[4] Birgé, L. (2001). An alternative point of view on Lepski’s method. In State of the Art in Probability and Statistics. IMS Lecture Notes 37 113-133. · Zbl 1373.62142
[5] Buldygin, V. V. and Kozachenko, Y. V. (2000). Metric Characterization of Random Variables and Random Processes . Amer. Math. Soc., Providence, RI. · Zbl 0998.60503
[6] Candes, E. and Tao, T. (2006). The Dantzig selector: Statistical estimation when p is much larger than n . Ann. Statist. 35 2313-2351. · Zbl 1139.62019
[7] Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory . Wiley, New York. · Zbl 0762.94001
[8] d’Aspremont, A., Bannerjee, O. and El Ghaoui, L. (2007). First order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30 56-66. · Zbl 1156.90423
[9] d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434-448. · Zbl 1128.90050
[10] Davidson, K. R. and Szarek, S. J. (2001). Local operator theory, random matrices, and Banach spaces. In Handbook of Banach Spaces 1 317-336. Elsevier, Amsterdam, NL. · Zbl 1067.46008
[11] Donoho, D. (2006). Compressed sensing. IEEE Trans. Info. Theory 52 1289-1306. · Zbl 1288.94016
[12] El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717-2756. · Zbl 1196.62064
[13] Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252-261. · Zbl 0428.60039
[14] Gradshteyn, I. S. and Ryzhik, I. M. (2000). Tables of Integrals , Series , and Products . Academic Press, New York, NY. · Zbl 0981.65001
[15] Grimmett, G. R. and Stirzaker, D. R. (1992). Probability and Random Processes . Oxford Science Publications, Clarendon Press, Oxford. · Zbl 0759.60002
[16] Has’minskii, R. Z. (1978). A lower bound on the risks of nonparametric estimates of densities in the uniform metric. Theory Probab. Appl. 23 794-798. · Zbl 0449.62032
[17] Hiriart-Urruty, J. and Lemaréchal, C. (1993). Convex Analysis and Minimization Algorithms 1 . Springer, New York. · Zbl 0795.49001
[18] Horn, R. A. and Johnson, C. R. (1986). Matrix Analysis . Cambridge Univ. Press, New York, NY.
[19] Johnstone, I. M. (2001). Chi-square oracle inequalities. In State of the Art in Probability and Statistics IMS Lecture Notes 37 (M. de Gunst, C. Klaassen and A. van der Vaart, eds.) 399-418. Institute of Mathematical Statistics. · Zbl 1373.62062
[20] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295-327. · Zbl 1016.62078
[21] Johnstone, I. M. and Lu, A. (2004). Sparse principal components. Technical report, Stanford Univ.
[22] Jolliffe, I. T. (2004). Principal Component Analysis . Springer, New York. · Zbl 1011.62064
[23] Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531-547. JSTOR:
[24] Laurent, B. and Massart, P. (1998). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1303-1338. · Zbl 1105.62328
[25] Ledoux, M. (2001). The Concentration of Measure Phenomenon . Amer. Math. Soc., Providence, RI. · Zbl 0995.60002
[26] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces : Isoperimetry and Processes . Springer, New York. · Zbl 0748.60004
[27] Matousek, J. (2002). Lectures on Discrete Geometry . Springer, New York.
[28] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[29] Moghaddam, B., Weiss, Y. and Avidan, S. (2005). Spectral bounds for sparse PCA: Exact and greedy algorithms. In Neural Information Processing Systems ( NIPS ). Vancouver, Canada.
[30] Paul, D. (2005). Nonparametric estimation of principal components. Ph.D. thesis, Stanford Univ.
[31] Paul, D. (2007). Asymptotics of sample eigenstructure for a large-dimensional spiked covariance model. Statist. Sinica 17 1617-1642. · Zbl 1134.62029
[32] Paul, D. and Johnstone, I. (2008). Augmented sparse principal component analysis for high-dimensional data. Technical report, UC Davis.
[33] Rockafellar, G. (1970). Convex Analysis . Princeton Univ. Press, Princeton. · Zbl 0193.18401
[34] Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory . Academic Press, New York. · Zbl 0706.65013
[35] Tropp, J. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030-1051. · Zbl 1288.94025
[36] Wainwright, M. J. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity using the Lasso. In IEEE Trans. Inform. Theory . Technical Report 709, Dept. Statistics, UC Berkeley.
[37] Wainwright, M. J. (2007). Information-theoretic bounds for sparsity recovery in the high-dimensional and noisy setting. In Int. Symposium on Information Theory . Technical Report 725, Dept. Statistics, UC Berkeley.
[38] Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564-1599. · Zbl 0978.62008
[39] Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19-35. · Zbl 1142.62408
[40] Zhang, Z., Zha, H. and Simon, H. (2002). Low-rank approximations with sparse factors I: Basic algorithms and error analysis. SIAM J. Matrix Anal. Appl. 23 706-727. · Zbl 1003.65041
[41] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 262-286.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.