Sparsistency and agnostic inference in sparse PCA. (English) Zbl 1308.62125
Summary: The presence of a sparse “truth” has been a constant assumption in the theoretical analysis of sparse PCA and is often implicit in its methodological development. This naturally raises questions about the properties of sparse PCA methods and how they depend on the assumption of sparsity. Under what conditions can the relevant variables be selected consistently if the truth is assumed to be sparse? What can be said about the results of sparse PCA without assuming a sparse and unique truth? We answer these questions by investigating the properties of the recently proposed Fantope projection and selection (FPS) method in the high-dimensional setting. Our results provide general sufficient conditions for sparsistency of the FPS estimator. These conditions are weak and can hold in situations where other estimators are known to fail. On the other hand, without assuming sparsity or identifiability, we show that FPS provides a sparse, linear dimension-reducing transformation that is close to the best possible in terms of maximizing the predictive covariance.
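
For concreteness, the FPS estimator of Vu et al. [39] solves the convex program \(\max_{X \in \mathcal{F}^d} \langle S, X \rangle - \lambda \|X\|_{1,1}\), where \(S\) is the sample covariance matrix and \(\mathcal{F}^d = \{X : 0 \preceq X \preceq I, \operatorname{tr}(X) = d\}\) is the Fantope. The following is a minimal Python sketch of an ADMM solver in the spirit of the algorithm given in [39]; the function names, fixed iteration counts, and bisection scheme are illustrative choices, not taken from the paper.

import numpy as np

def fantope_projection(Y, d):
    # Project a symmetric matrix Y onto the Fantope {X : 0 <= X <= I, tr(X) = d}
    # by projecting its eigenvalues onto the capped simplex {0 <= g_i <= 1, sum_i g_i = d}.
    w, V = np.linalg.eigh(Y)
    lo, hi = w.min() - 1.0, w.max()  # brackets the shift theta
    for _ in range(100):  # bisection: find theta with sum(clip(w - theta, 0, 1)) = d
        theta = (lo + hi) / 2.0
        if np.clip(w - theta, 0.0, 1.0).sum() > d:
            lo = theta
        else:
            hi = theta
    g = np.clip(w - (lo + hi) / 2.0, 0.0, 1.0)
    return (V * g) @ V.T  # V diag(g) V'

def fps(S, d, lam, rho=1.0, n_iter=200):
    # ADMM for: maximize <S, X> - lam * ||X||_{1,1} subject to X in the Fantope.
    p = S.shape[0]
    X = np.zeros((p, p)); Z = np.zeros((p, p)); U = np.zeros((p, p))
    for _ in range(n_iter):
        X = fantope_projection(Z - U + S / rho, d)               # Fantope projection step
        T = X + U
        Z = np.sign(T) * np.maximum(np.abs(T) - lam / rho, 0.0)  # elementwise soft-thresholding
        U = U + X - Z                                            # scaled dual update
    return Z  # sparse estimate of the rank-d principal subspace projector

Sparsistency then concerns whether the nonzero rows/columns of the returned matrix recover the relevant variables; in this sketch that support would be read off as, e.g., np.flatnonzero(np.abs(Z).sum(axis=1) > 1e-8).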

MSC:
62H25 Factor analysis and principal components; correspondence analysis
62H12 Estimation in multivariate analysis
Software:
ElemStatLearn; PMA
References:
[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the Second International Symposium on Information Theory. Akadémiai Kiadó, Budapest. · Zbl 0283.62006
[2] Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877-2921. · Zbl 1173.62049 · doi:10.1214/08-AOS664 · arxiv:0803.4026
[3] Berk, R. H. (1966). Limiting behavior of posterior distributions when the model is incorrect. Ann. Math. Statist. 37 51-58; Correction, Ibid 37 745-746. · Zbl 0151.23802 · doi:10.1214/aoms/1177699597
[4] Berthet, Q. and Rigollet, P. (2013a). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780-1815. · Zbl 1277.62155 · doi:10.1214/13-AOS1127 · euclid:aos/1378386239 · arxiv:1202.5070
[5] Berthet, Q. and Rigollet, P. (2013b). Computational lower bounds for sparse PCA. Preprint, available at arXiv. · Zbl 1277.62155 · arxiv.org
[6] Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055-1084. · Zbl 1292.62071 · doi:10.1214/12-AOS1014 · euclid:aos/1371150893 · arxiv:1203.0967
[7] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 1-122. · Zbl 1229.90122 · doi:10.1561/2200000016
[8] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg. · Zbl 1273.62015 · doi:10.1007/978-3-642-20192-9
[9] Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models. Ann. Statist. 17 453-555. · Zbl 0689.62029 · doi:10.1214/aos/1176347115
[10] Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074-3110. · Zbl 1288.62099 · doi:10.1214/13-AOS1178 · euclid:aos/1388545679 · arxiv:1211.1309
[11] d’Aspremont, A., Bach, F. and El Ghaoui, L. (2008). Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9 1269-1294. · Zbl 1225.68170 · www.jmlr.org
[12] d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434-448 (electronic). · Zbl 1128.90050 · doi:10.1137/050645506
[13] Deshpande, Y. and Montanari, A. (2013). Finding hidden cliques of size \(\sqrt{N/e}\) in nearly linear time. Preprint, available at arXiv:1304.7047. · Zbl 1347.05227 · doi:10.1007/s10208-014-9215-y · arxiv:1304.7047
[14] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[15] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[16] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York. · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[17] Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24 498-520. · JFM 59.1182.04
[18] Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics 221-233. Univ. California Press, Berkeley.
[19] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682-693. · Zbl 1388.62174 · doi:10.1198/jasa.2009.0121
[20] Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531-547. · doi:10.1198/1061860032148
[21] Journée, M., Nesterov, Y., Richtárik, P. and Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11 517-553. · Zbl 1242.62048 · www.jmlr.org
[22] Kearns, M. J., Schapire, R. E. and Sellie, L. M. (1994). Toward efficient agnostic learning. Mach. Learn. 17 115-141. · Zbl 0938.68797 · doi:10.1007/BF00993468
[23] Krauthgamer, R., Nadler, B. and Vilenchik, D. (2013). Do semidefinite relaxations solve sparse PCA up to the information limit? Preprint, available at arXiv:1306.3690. · Zbl 1320.62138 · doi:10.1214/15-AOS1310 · euclid:aos/1431695645 · arxiv:1306.3690
[24] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254-4278. · Zbl 1191.62101 · doi:10.1214/09-AOS720 · arxiv:0711.3933
[25] Lounici, K. (2013). Sparse principal component analysis with missing observations. Progr. Probab. 66 327-356. · Zbl 1267.62073 · doi:10.1007/978-3-0348-0490-5_20 · arxiv:1205.7060
[26] Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772-801. · Zbl 1267.62074 · doi:10.1214/13-AOS1097 · euclid:aos/1368018173 · arxiv:1112.2432
[27] Mackey, L. W. (2009). Deflation methods for sparse PCA. In Advances in Neural Information Processing Systems 21 (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 1017-1024. Curran Associates, Red Hook, NY.
[28] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281 · arxiv:math/0608017
[29] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. Statist. Sci. 27 538-557. · Zbl 1331.62350 · doi:10.1214/12-STS400 · euclid:ss/1356098555
[30] Overton, M. L. and Womersley, R. S. (1992). On the sum of the largest eigenvalues of a symmetric matrix. SIAM J. Matrix Anal. Appl. 13 41-45. · Zbl 0747.15005 · doi:10.1137/0613006
[31] Paul, D. and Johnstone, I. M. (2012). Augmented sparse principal component analysis for high dimensional data. Preprint, available at arXiv. · arxiv.org
[32] Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philos. Mag. 2 559-572. · JFM 32.0246.07
[33] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing \(\ell_1\)-penalized log-determinant divergence. Electron. J. Stat. 5 935-980. · Zbl 1274.62190 · doi:10.1214/11-EJS631 · euclid:ejs/1316092865 · arxiv:0811.3628
[34] Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494-515. · Zbl 1320.62135 · doi:10.1214/08-EJS176 · euclid:ejs/1214491853 · arxiv:0801.4837
[35] Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015-1034. · Zbl 1141.62049 · doi:10.1016/j.jmva.2007.06.007
[36] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York. · Zbl 0862.60002
[37] Vu, V. Q. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. In Proc. Fifteenth International Conference on Artificial Intelligence and Statistics. JMLR W&CP 22 1278-1286.
[38] Vu, V. Q. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. Ann. Statist. 41 2905-2947. · Zbl 1288.62103 · doi:10.1214/13-AOS1151 · euclid:aos/1388545673 · arxiv:1211.0373
[39] Vu, V. Q., Cho, J., Lei, J. and Rohe, K. (2013). Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in Neural Information Processing Systems (NIPS) 26 (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K. Q. Weinberger, eds.) 2670-2678. Curran Associates, Red Hook, NY.
[40] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[41] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1-25. · Zbl 0478.62088 · doi:10.2307/1912526
[42] Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515-534.
[43] Yuan, X.-T. and Zhang, T. (2013). Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res. 14 899-925. · Zbl 1320.62141 · www.jmlr.org
[44] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008 · www.jmlr.org
[45] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265-286. · doi:10.1198/106186006X113430
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect precision of the matching.