Rate-optimal posterior contraction for sparse PCA. (English) Zbl 1312.62078

Summary: Principal component analysis (PCA) is one of the most widely used statistical tools for recovering low-rank structure in data. In high-dimensional settings, the leading eigenvector of the sample covariance matrix can be nearly orthogonal to the true eigenvector, so a sparse structure is commonly assumed in addition to the low-rank structure. Minimax estimation rates for sparse PCA have recently been established under various settings. Meanwhile, Bayesian methods are increasingly popular in high-dimensional estimation, but there is little work connecting frequentist properties with Bayesian methodologies for high-dimensional data analysis. In this paper, we propose a prior for the sparse PCA problem and analyze its theoretical properties. The prior adapts to both sparsity and rank, and the posterior distribution is shown to contract to the truth at optimal minimax rates. In addition, a computationally efficient strategy for the rank-one case is discussed.
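The remark that the leading sample eigenvector can be nearly orthogonal to the true eigenvector can be illustrated with a small simulation. The following is a minimal sketch and not the method of the paper: it assumes a rank-one spiked covariance model \(\Sigma = \lambda v v^T + I_p\) with mean-zero Gaussian data, and the function name `leading_eigvec_alignment`, the spike strength, the sample sizes, and the seed are all illustrative choices.

```python
import numpy as np

def leading_eigvec_alignment(n, p, spike=1.0, seed=0):
    """Return |<v_hat, v>| under the assumed model Sigma = spike * v v^T + I_p."""
    rng = np.random.default_rng(seed)
    v = np.zeros(p)
    v[0] = 1.0  # true leading eigenvector (1-sparse, chosen for simplicity)
    # Rows are X_i = sqrt(spike) * z_i * v + noise_i, with z_i and noise_i standard normal.
    z = rng.standard_normal((n, 1))
    X = np.sqrt(spike) * z * v + rng.standard_normal((n, p))
    # The data are mean zero by construction, so no centering is applied.
    # The leading right singular vector of X is the leading eigenvector of X^T X / n.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v_hat = vt[0]
    return abs(float(v_hat @ v))

low_dim = leading_eigvec_alignment(n=2000, p=10)    # many samples, few variables
high_dim = leading_eigvec_alignment(n=50, p=2000)   # few samples, many variables
print(f"p=10,   n=2000: |cos| = {low_dim:.3f}")
print(f"p=2000, n=50:   |cos| = {high_dim:.3f}")
```

With these assumed settings, the alignment should be close to 1 when \(p \ll n\) and close to 0 when \(p \gg n\), which is the failure of ordinary PCA that motivates the sparsity assumption.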


62H25 Factor analysis and principal components; correspondence analysis
62G05 Nonparametric estimation
62F15 Bayesian inference

