# zbMATH — the first resource for mathematics

Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices. (English) Zbl 1456.62113
Let $$\mathbf{Y}=\mathbf{\Gamma X}$$ be the data matrix, where $$\mathbf{X}$$ is a $$(p+l)\times n$$ random matrix whose entries are independent with zero means and unit variances, and $$\mathbf{\Gamma}$$ is a $$p\times(p+l)$$ deterministic matrix with $$l/p\rightarrow0$$. Let $$\mathbf{\Sigma}=\mathbf{\Gamma}\mathbf{\Gamma}^\intercal$$ be the population covariance matrix. The sample covariance matrix in this case is $$S_n=\frac{1}{n}\mathbf{Y}\mathbf{Y}^\intercal=\frac{1}{n}\mathbf{\Gamma X}\mathbf{X}^\intercal\mathbf{\Gamma}^\intercal.$$ Let $$\mathbf{V}\mathbf{\Lambda}^{1/2}\mathbf{U}$$ denote the singular value decomposition of the matrix $$\mathbf{\Gamma}$$, where $$\mathbf{V}$$ and $$\mathbf{U}$$ are orthogonal matrices and $$\mathbf{\Lambda}$$ is a diagonal matrix whose diagonal entries are the eigenvalues $$\mu_1\geqslant\mu_2\geqslant\ldots\geqslant\mu_p$$ of the matrix $$\mathbf{\Sigma}$$ in descending order.
The authors suppose that there are $$K$$ spiked eigenvalues that are separated from the rest. They assume that the eigenvalues $$\mu_1\geqslant\ldots\geqslant\mu_K$$ tend to infinity, while the other eigenvalues $$\mu_{K+1}\geqslant\ldots\geqslant\mu_p$$ are bounded.
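The model above can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions that are not taken from the paper: $$l=0$$, $$\mathbf{\Gamma}=\operatorname{diag}(\sqrt{\mu_1},\ldots,\sqrt{\mu_p})$$, and standard normal entries for $$\mathbf{X}$$. The function name `simulate_top_eigenvalue` and its parameters are hypothetical. It forms $$S_n$$ and approximates its largest eigenvalue by power iteration, so that the result can be compared with the largest population eigenvalue $$\mu_1$$.

```python
import math
import random


def simulate_top_eigenvalue(mu, n, seed=0, iters=200):
    """Approximate the largest eigenvalue of S_n = (1/n) Y Y^T with Y = Gamma X.

    Assumptions for illustration only: l = 0, Gamma = diag(sqrt(mu)),
    and X has i.i.d. standard normal entries.
    """
    rng = random.Random(seed)
    p = len(mu)
    # Row i of Y is sqrt(mu_i) times a row of X (zero mean, unit variance).
    y = [[math.sqrt(mu[i]) * rng.gauss(0.0, 1.0) for _ in range(n)]
         for i in range(p)]
    # Form the p x p sample covariance matrix S_n.
    s = [[sum(a * b for a, b in zip(y[i], y[j])) / n for j in range(p)]
         for i in range(p)]
    # Power iteration: for a unit vector v, ||S_n v|| converges to the
    # largest eigenvalue when it is separated from the second largest.
    v = [1.0 / math.sqrt(p)] * p
    lam = 0.0
    for _ in range(iters):
        w = [sum(s[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam


if __name__ == "__main__":
    mu = [50.0] + [1.0] * 19  # one spike (K = 1), bounded remaining eigenvalues
    lam1 = simulate_top_eigenvalue(mu, n=400, seed=1)
    print("largest sample eigenvalue / mu_1 =", lam1 / mu[0])
```

With one large spike and $$n$$ much larger than $$p$$, the printed ratio should typically be close to 1. The power iteration is used here only to keep the sketch free of external dependencies; a numerical linear algebra library would be the usual choice.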
The paper studies the asymptotic behaviour of the spiked sample eigenvalues and of the largest non-spiked sample eigenvalue. A limiting normal distribution is established for the spiked sample eigenvalues, and a limiting Tracy-Widom law is obtained for the largest non-spiked eigenvalue. Estimation of the number of spikes and the convergence of the leading eigenvectors are also considered.
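To make the idea of estimating the number of spikes concrete, here is a minimal sketch of the generic eigenvalue-ratio heuristic, which picks the index with the largest gap between consecutive sample eigenvalues. This is not the estimator proposed in the paper, which the review does not describe. The function name `eigenvalue_ratio_estimate` and the cap `k_max` are hypothetical, and the eigenvalues are assumed to be strictly positive.

```python
def eigenvalue_ratio_estimate(eigenvalues, k_max):
    """Return the k in 1..k_max that maximizes lambda_k / lambda_{k+1}.

    Generic eigenvalue-ratio heuristic, shown for illustration only;
    assumes all eigenvalues are strictly positive.
    """
    lam = sorted(eigenvalues, reverse=True)
    if not 1 <= k_max < len(lam):
        raise ValueError("k_max must satisfy 1 <= k_max < number of eigenvalues")
    # ratios[i] compares the (i+1)-th and (i+2)-th largest eigenvalues.
    ratios = [lam[k - 1] / lam[k] for k in range(1, k_max + 1)]
    best = max(range(k_max), key=lambda i: ratios[i])
    return best + 1


if __name__ == "__main__":
    sample = [50.0, 30.0, 1.2, 1.0, 0.9, 0.8]  # two values clearly separated
    print(eigenvalue_ratio_estimate(sample, k_max=4))
```

The heuristic relies on the spiked eigenvalues being well separated from the bounded ones, which matches the divergence assumption above. It carries none of the theoretical guarantees derived in the paper.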

##### MSC:
62H25 Factor analysis and principal components; correspondence analysis
60B20 Random matrices (probabilistic aspects)
15B52 Random matrices (algebraic aspects)
60F05 Central limit and other weak theorems
62H10 Multivariate distribution of statistics