
Operator norm consistent estimation of large-dimensional sparse covariance matrices. (English) Zbl 1196.62064

Summary: Estimating covariance matrices is a problem of fundamental importance in multivariate statistics. In practice it is increasingly common to work with data matrices \(X\) of dimension \(n\times p\), where \(p\) and \(n\) are both large. Results from random matrix theory show very clearly that in this setting, standard estimators like the sample covariance matrix generally perform very poorly.
In this “large \(n\), large \(p\)” setting, practitioners are sometimes willing to assume that many elements of the population covariance matrix are equal to 0, and hence that this matrix is sparse. We develop an estimator to handle this situation. The estimator is shown to be consistent in the operator norm when, for instance, we have \(p\asymp n\) as \(n\rightarrow \infty \). In other words, the largest singular value of the difference between the estimator and the population covariance matrix goes to zero. This implies consistency of all the eigenvalues and consistency of eigenspaces associated with isolated eigenvalues. We also propose a notion of sparsity for matrices that is “compatible” with spectral analysis and is independent of the ordering of the variables.
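To make the operator-norm criterion concrete, the following is a minimal sketch, not the estimator from the paper. It assumes entrywise hard thresholding of the sample covariance matrix as a stand-in sparse estimator, a tridiagonal population covariance, Gaussian data, and a threshold of order \(\sqrt{\log p/n}\) with an arbitrary constant. The function names `operator_norm`, `hard_threshold`, and `demo` are hypothetical and chosen only for illustration.

```python
import numpy as np


def operator_norm(a):
    """Largest singular value of a (the spectral, or operator, norm)."""
    return np.linalg.norm(a, 2)


def hard_threshold(s, t):
    """Zero out off-diagonal entries of s whose absolute value is below t.

    Illustrative assumption: the diagonal (the variances) is always kept.
    """
    out = np.where(np.abs(s) >= t, s, 0.0)
    np.fill_diagonal(out, np.diag(s))
    return out


def demo(n=800, p=200, seed=0):
    """Compare operator-norm errors of the raw and thresholded estimators."""
    rng = np.random.default_rng(seed)

    # Assumed sparse population covariance: tridiagonal, positive definite
    # because 1 - 2 * 0.4 > 0.
    sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))

    # Draw n Gaussian observations with covariance sigma.
    chol = np.linalg.cholesky(sigma)
    x = rng.standard_normal((n, p)) @ chol.T

    # Sample covariance matrix (p x p); columns of x are the variables.
    s = np.cov(x, rowvar=False)

    # Threshold of order sqrt(log p / n); the constant 2.0 is an arbitrary
    # choice for this sketch, not a value taken from the paper.
    t = 2.0 * np.sqrt(np.log(p) / n)
    s_thr = hard_threshold(s, t)

    return operator_norm(s - sigma), operator_norm(s_thr - sigma)


if __name__ == "__main__":
    err_sample, err_thresholded = demo()
    print("operator-norm error, sample covariance:     ", err_sample)
    print("operator-norm error, thresholded covariance:", err_thresholded)
```

Under these assumptions, with \(p\) a fixed fraction of \(n\), the thresholded matrix should sit much closer to the population covariance in operator norm than the raw sample covariance does. Note that thresholding acts entry by entry, so the result does not depend on how the variables are ordered.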

MSC:

62H12 Estimation in multivariate analysis
15A18 Eigenvalues, singular values, and eigenvectors
