# zbMATH — the first resource for mathematics

Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices. (English) Zbl 1456.62113
Let $$\mathbf{Y}=\mathbf{\Gamma X}$$ be the data matrix, where $$\mathbf{X}$$ is a $$(p+l)\times n$$ random matrix whose entries are independent with zero means and unit variances, and $$\mathbf{\Gamma}$$ is a $$p\times(p+l)$$ deterministic matrix, under the condition $$l/p\rightarrow0$$. Let $$\mathbf{\Sigma}=\mathbf{\Gamma}\mathbf{\Gamma}^\intercal$$ be the population covariance matrix. The sample covariance matrix is then $$S_n=\frac{1}{n}\mathbf{Y}\mathbf{Y}^\intercal=\frac{1}{n}\mathbf{\Gamma X}\mathbf{X}^\intercal\mathbf{\Gamma}^\intercal.$$ Let $$\mathbf{V}\mathbf{\Lambda}^{1/2}\mathbf{U}$$ denote the singular value decomposition of $$\mathbf{\Gamma}$$, where $$\mathbf{V}$$ and $$\mathbf{U}$$ are orthogonal matrices and $$\mathbf{\Lambda}$$ is a diagonal matrix whose diagonal entries are the eigenvalues $$\mu_1\geqslant\mu_2\geqslant\ldots\geqslant\mu_p$$ of $$\mathbf{\Sigma}$$, arranged in descending order.
The authors suppose that there are $$K$$ spiked eigenvalues that are separated from the rest: the eigenvalues $$\mu_1\geqslant\ldots\geqslant\mu_K$$ tend to infinity, while the remaining eigenvalues $$\mu_{K+1}\geqslant\ldots\geqslant\mu_p$$ stay bounded.
The paper studies the asymptotic behaviour of the spiked eigenvalues and of the largest non-spiked eigenvalue of $$S_n$$. A limiting normal distribution is established for the spiked sample eigenvalues, and a limiting Tracy-Widom law is obtained for the largest non-spiked eigenvalue. Estimation of the number of spikes and convergence of the leading eigenvectors are also considered.
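The separation between divergent spiked eigenvalues and the bounded bulk can be illustrated numerically. The sketch below simulates the model $$\mathbf{Y}=\mathbf{\Gamma X}$$ for a simple diagonal choice of $$\mathbf{\Gamma}$$ (i.e., $$\mathbf{V}=\mathbf{U}=\mathbf{I}$$ and $$l=0$$); all parameter values ($$p$$, $$n$$, $$K$$, the spike sizes) are illustrative assumptions, not taken from the paper.

```python
# Minimal simulation sketch of a spiked covariance model: K large population
# eigenvalues (spikes), the rest equal to 1 (bounded bulk). Parameters are
# illustrative only.
import numpy as np

rng = np.random.default_rng(0)
p, n, K = 200, 500, 2

# Population eigenvalues mu_1 >= ... >= mu_p: two spikes, then a flat bulk.
mu = np.ones(p)
mu[:K] = [50.0, 30.0]

# Gamma = V Lambda^{1/2} U with V = U = I, so Gamma = diag(sqrt(mu)).
Gamma = np.diag(np.sqrt(mu))
X = rng.standard_normal((p, n))   # i.i.d. entries, zero mean, unit variance
Y = Gamma @ X
S = (Y @ Y.T) / n                 # sample covariance matrix S_n

eig = np.sort(np.linalg.eigvalsh(S))[::-1]
print(eig[:K])   # spiked sample eigenvalues, near the population spikes
print(eig[K])    # largest non-spiked eigenvalue, near the bulk edge
```

In this run the top two sample eigenvalues track the population spikes 50 and 30, while the third stays near the Marchenko-Pastur bulk edge $$(1+\sqrt{p/n})^2\approx2.66$$, which is the regime where the Tracy-Widom fluctuation result applies.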

##### MSC:
- 62H25 Factor analysis and principal components; correspondence analysis
- 60B20 Random matrices (probabilistic aspects)
- 15B52 Random matrices (algebraic aspects)
- 60F05 Central limit and other weak theorems
- 62H10 Multivariate distribution of statistics