Statistical eigen-inference from large Wishart matrices. (English) Zbl 1168.62056

Summary: We consider settings where the observations are drawn from a zero-mean multivariate (real or complex) normal distribution with the population covariance matrix having eigenvalues of arbitrary multiplicity. We assume that the eigenvectors of the population covariance matrix are unknown and focus on inferential procedures that are based on the sample eigenvalues alone (i.e., “eigen-inference”).
Results found in the literature establish the asymptotic normality of the fluctuations in the trace of the powers of the sample covariance matrix. We develop concrete algorithms for analytically computing the limiting quantities and the covariance of the fluctuations. We exploit the asymptotic normality of the trace of the powers of the sample covariance matrix to develop eigenvalue-based procedures for testing and estimation.
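To make the limiting statement concrete, the asymptotic normality referred to above can be written schematically as follows. The notation is an assumption introduced here for illustration and does not appear in the summary: $\widehat{R}$ is the $n \times n$ sample covariance matrix formed from $m$ samples, $\alpha_k$ is the limiting $k$-th moment, and $\mu_Q$ and $Q$ are the limiting mean and covariance of the fluctuations.

```latex
% Schematic form of the CLT for trace-power statistics (notation assumed, not from the summary)
\[
  n \left( \frac{1}{n}\operatorname{tr}\widehat{R}^{\,k} - \alpha_k \right)_{k=1}^{K}
  \;\xrightarrow{\;d\;}\; \mathcal{N}(\mu_Q,\, Q),
  \qquad n, m \to \infty,\quad n/m \to c \in (0,\infty).
\]
% Simplest special case: identity population covariance (Marchenko--Pastur moments)
\[
  \alpha_1 = 1, \qquad \alpha_2 = 1 + c .
\]
```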
Specifically, we formulate a simple test of hypotheses for the population eigenvalues and a technique for estimating the population eigenvalues in settings where the cumulative distribution function of the (nonrandom) population eigenvalues has a staircase structure. Monte Carlo simulations are used to demonstrate the superiority of the proposed methodologies over classical techniques and the robustness of the proposed techniques in high-dimensional, (relatively) small sample size settings. The improved performance results from the fact that the proposed inference procedures are “global” (in a sense that we describe) and exploit “global” information, thereby overcoming the inherent biases that cripple classical inference procedures, which are “local” and rely on “local” information.
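The idea of testing a hypothesis about the population eigenvalues through the traces of powers of the sample covariance matrix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the paper computes the limiting moments and the fluctuation covariance analytically, whereas the sketch below calibrates them by Monte Carlo simulation under the null hypothesis. All function names (`trace_moments`, `sample_gaussian`, `null_calibration`, `eigen_test_statistic`) and the parameter choices are hypothetical.

```python
import numpy as np

def trace_moments(X, k_max=2):
    """Return tr(S^k)/n for k = 1..k_max, where S = X X^T / m (X is n x m, real)."""
    n, m = X.shape
    eig = np.linalg.eigvalsh(X @ X.T / m)  # uses the sample eigenvalues only
    return np.array([np.mean(eig ** k) for k in range(1, k_max + 1)])

def sample_gaussian(pop_eigs, m, rng):
    """Draw m zero-mean real Gaussian samples with covariance diag(pop_eigs).

    A diagonal covariance suffices here because, for Gaussian data, the
    distribution of the sample eigenvalues does not depend on the
    population eigenvectors.
    """
    pop_eigs = np.asarray(pop_eigs, dtype=float)
    return np.sqrt(pop_eigs)[:, None] * rng.standard_normal((pop_eigs.size, m))

def null_calibration(pop_eigs, m, rng, k_max=2, trials=500, level=0.05):
    """Monte Carlo stand-in (an assumption of this sketch) for the analytic
    mean and covariance of the trace-moment vector under the null."""
    V = np.array([trace_moments(sample_gaussian(pop_eigs, m, rng), k_max)
                  for _ in range(trials)])
    mu = V.mean(axis=0)
    Q_inv = np.linalg.inv(np.cov(V, rowvar=False))
    D = V - mu
    stats = np.einsum("ij,jk,ik->i", D, Q_inv, D)  # quadratic form per trial
    return mu, Q_inv, np.quantile(stats, 1 - level)

def eigen_test_statistic(X, mu, Q_inv):
    """Quadratic-form statistic built from the observed trace moments."""
    d = trace_moments(X, len(mu)) - mu
    return float(d @ Q_inv @ d)

rng = np.random.default_rng(0)
n, m = 20, 40                          # dimension and sample size
null_eigs = np.ones(n)                 # H0: all population eigenvalues equal 1
mu, Q_inv, threshold = null_calibration(null_eigs, m, rng)

# Data from an alternative with a two-step "staircase": half the eigenvalues are 3.
alt_eigs = np.r_[3.0 * np.ones(n // 2), np.ones(n // 2)]
stat_alt = eigen_test_statistic(sample_gaussian(alt_eigs, m, rng), mu, Q_inv)
print("reject H0:", stat_alt > threshold)
```

The statistic pools information from all sample eigenvalues through the moments, which is one way to read the "global" character described above; a test built on a single extreme eigenvalue would be "local" in the same sense.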

MSC:

62H15 Hypothesis testing in multivariate analysis
62E20 Asymptotic distribution theory in statistics
65C60 Computational problems in statistics (MSC2010)
15B52 Random matrices (algebraic aspects)

Software:

MOPS
