zbMATH — the first resource for mathematics

Spectrum estimation: a unified framework for covariance matrix estimation and PCA in large dimensions. (English) Zbl 1328.62340
Summary: Covariance matrix estimation and principal component analysis (PCA) are two cornerstones of multivariate analysis. Classic textbook solutions perform poorly when the dimension of the data is of a magnitude similar to the sample size, or even larger. In such settings, there is a common remedy for both statistical problems: nonlinear shrinkage of the eigenvalues of the sample covariance matrix. The optimal nonlinear shrinkage formula depends on unknown population quantities and is thus not available. It is, however, possible to consistently estimate an oracle nonlinear shrinkage, which is motivated on asymptotic grounds. A key tool to this end is consistent estimation of the set of eigenvalues of the population covariance matrix (also known as the spectrum), an interesting and challenging problem in its own right. Extensive Monte Carlo simulations demonstrate that our methods have desirable finite-sample properties and outperform previous proposals.
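
Illustration (not part of the reviewed paper): the summary's remark that the optimal shrinkage depends on unknown population quantities can be made concrete with a small simulation. The sketch below keeps the sample eigenvectors u_i and replaces each sample eigenvalue by d_i = u_i' Σ u_i, the choice that minimizes the Frobenius loss for fixed sample eigenvectors. It needs the true covariance matrix Σ, so it is an oracle benchmark only, not the estimator the paper constructs. The function name, the two-cluster spectrum, and the dimensions are assumptions chosen for illustration.

```python
import numpy as np


def oracle_shrinkage_demo(p=100, n=200, seed=0):
    """Compare the sample covariance matrix with an oracle eigenvalue shrinkage.

    Hypothetical setup: half of the population eigenvalues equal 1 and the
    other half equal 4, with a concentration ratio p/n of 0.5 by default.
    """
    rng = np.random.default_rng(seed)
    tau = np.concatenate([np.ones(p // 2), 4.0 * np.ones(p - p // 2)])
    sigma = np.diag(tau)  # population covariance (unknown in practice)

    # Rows are i.i.d. N(0, sigma); the mean is taken as known and equal to zero.
    x = rng.standard_normal((n, p)) * np.sqrt(tau)
    s = x.T @ x / n  # sample covariance matrix

    # Spectral decomposition of the sample covariance matrix.
    lam, u = np.linalg.eigh(s)

    # Oracle shrunk eigenvalues d_i = u_i' sigma u_i (uses the true sigma).
    d = np.einsum("ji,jk,ki->i", u, sigma, u)
    s_oracle = (u * d) @ u.T  # U diag(d) U'

    loss_sample = np.linalg.norm(s - sigma, "fro") ** 2 / p
    loss_oracle = np.linalg.norm(s_oracle - sigma, "fro") ** 2 / p
    return lam, d, loss_sample, loss_oracle


if __name__ == "__main__":
    lam, d, loss_sample, loss_oracle = oracle_shrinkage_demo()
    print("sample eigenvalue range:", lam.min(), lam.max())
    print("oracle eigenvalue range:", d.min(), d.max())
    print("Frobenius loss (sample, oracle):", loss_sample, loss_oracle)
```

The oracle eigenvalues always lie between the smallest and largest population eigenvalues, while the sample eigenvalues are typically more dispersed. Estimating the population spectrum consistently, as the summary describes, is what allows a feasible estimator to approximate such an oracle.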

MSC:
62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis
Software:
HiDimDA; SNOPT
References:
[1] Amini, A. A., High-dimensional principal component analysis, Technical report UCB/EECS-2011-104, (2011), Department of Electrical Engineering and Computer Sciences, University of California at Berkeley
[2] Anatolyev, S., Inference in regression models with many regressors, J. Econometrics, 170, 2, 368-382, (2012) · Zbl 1443.62054
[3] Bai, Z. D.; Silverstein, J. W., No eigenvalues outside the support of the limiting spectral distribution of large-dimensional random matrices, Ann. Probab., 26, 1, 316-345, (1998) · Zbl 0937.60017
[4] Bai, Z. D.; Silverstein, J. W., Exact separation of eigenvalues of large-dimensional sample covariance matrices, Ann. Probab., 27, 3, 1536-1555, (1999) · Zbl 0964.60041
[5] Bai, Z. D.; Silverstein, J. W., Spectral analysis of large-dimensional random matrices, (2010), Springer New York · Zbl 1301.60002
[6] Bickel, P. J.; Freedman, D. A., Asymptotic theory for the bootstrap, Ann. Statist., 9, 6, 1196-1217, (1981) · Zbl 0449.62034
[7] Bickel, P. J.; Levina, E., Regularized estimation of large covariance matrices, Ann. Statist., 36, 1, 199-227, (2008) · Zbl 1132.62040
[8] Connor, G.; Korajczyk, R. A., A test for the number of factors in an approximate factor model, J. Finance, 48, 1263-1291, (1993)
[9] Demetrescu, M.; Hanck, C., A simple nonstationary-volatility robust panel unit root test, Econom. Lett., 117, 1, 10-13, (2012) · Zbl 1254.91576
[10] El Karoui, N., Spectrum estimation for large dimensional covariance matrices using random matrix theory, Ann. Statist., 36, 6, 2757-2790, (2008) · Zbl 1168.62052
[11] Fan, J.; Fan, Y.; Lv, J., High dimensional covariance matrix estimation using a factor model, J. Econometrics, 147, 1, 186-197, (2008) · Zbl 1429.62185
[12] Gill, P. E.; Murray, W.; Saunders, M. A., SNOPT: an SQP algorithm for large-scale constrained optimization, SIAM J. Optim., 12, 4, 979-1006, (2002) · Zbl 1027.90111
[13] Guo, S.-M.; He, J.; Monnier, N.; Sun, G.; Wohland, T.; Bathe, M., Bayesian approach to the analysis of fluorescence correlation spectroscopy data II: application to simulated and in vitro data, Anal. Chem., 84, 9, 3880-3888, (2012)
[14] Haufe, S.; Treder, M.; Gugler, M.; Sagebaum, M.; Curio, G.; Blankertz, B., EEG potentials predict upcoming emergency brakings during simulated driving, J. Neural Eng., 8, 5, (2011)
[15] Hotelling, H., Analysis of a complex of statistical variables into principal components, J. Educ. Psychol., 24, 417-441, 498-520, (1933)
[16] Huang, T.-K.; Schneider, J., Learning auto-regressive models from sequence and non-sequence data, (Shawe-Taylor, J.; Zemel, R.; Bartlett, P.; Pereira, F.; Weinberger, K., Advances in Neural Information Processing Systems. Vol. 24, (2011), The MIT Press Cambridge), 1548-1556
[17] Jolliffe, I. T., Principal component analysis, (2002), Springer New York · Zbl 1011.62064
[18] Khan, M., Are accruals mispriced? Evidence from tests of an intertemporal capital asset pricing model, J. Account. Econ., 45, 1, 55-77, (2008)
[19] Lawley, D. N., A general method for approximating to the distribution of likelihood ratio criteria, Biometrika, 43, 3-4, 295-303, (1956) · Zbl 0073.13602
[20] Ledoit, O.; Péché, S., Eigenvectors of some large sample covariance matrix ensembles, Probab. Theory Related Fields, 150, 1-2, 233-264, (2011) · Zbl 1229.60009
[21] Ledoit, O.; Wolf, M., A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate Anal., 88, 2, 365-411, (2004) · Zbl 1032.62050
[22] Ledoit, O.; Wolf, M., Nonlinear shrinkage estimation of large-dimensional covariance matrices, Ann. Statist., 40, 2, 1024-1060, (2012) · Zbl 1274.62371
[23] Lin, J.; Bentler, P., A third moment adjusted test statistic for small sample factor analysis, Multivariate Behav. Res., 47, 3, 448-462, (2012)
[24] Lin, J.-A.; Zhu, H.b.; Knickmeyer, R.; Styner, M.; Gilmore, J.; Ibrahim, J., Projection regression models for multivariate imaging phenotype, Genet. Epidemiol., 36, 6, 631-641, (2012)
[25] Marčenko, V. A.; Pastur, L. A., Distribution of eigenvalues for some sets of random matrices, Sb. Math., 1, 4, 457-483, (1967) · Zbl 0162.22501
[26] Markon, K., Modeling psychopathology structure: A symptom-level analysis of axis I and II disorders, Psychol. Med., 40, 2, 273-288, (2010)
[27] Mestre, X., Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates, IEEE Trans. Inform. Theory, 54, 11, 5113-5129, (2008) · Zbl 1318.62191
[28] Nguyen, L.; Rheinschmitt, R.; Wild, T.; Brink, S., Limits of channel estimation and signal combining for multipoint cellular radio (CoMP), (2011 8th International Symposium on Wireless Communication Systems (ISWCS), (2011), IEEE), 176-180
[29] Pearson, K., On lines and planes of closest fit to systems of points in space, Phil. Mag. Ser. 6, 2, 11, 559-572, (1901) · JFM 32.0710.04
[30] Pedro Duarte Silva, A., Two-group classification with high-dimensional correlated data: A factor model approach, Comput. Statist. Data Anal., 55, 11, 2975-2990, (2011) · Zbl 1218.62064
[31] Perlman, M. D., (STAT 542: Multivariate Statistical Analysis, (2007), University of Washington Seattle, Washington), (On-Line Class Notes)
[32] Pirkl, R.; Remley, K.; Lötbäck Patané, C., Reverberation chamber measurement correlation, IEEE Trans. Electromagn. Compat., 54, 3, 533-545, (2012)
[33] Pyeon, D.; Newton, M.; Lambert, P.; Den Boon, J.; Sengupta, S.; Marsit, C.; Woodworth, C.; Connor, J.; Haugen, T.; Smith, E.; Kelsey, K.; Turek, L.; Ahlquist, P., Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head/neck and cervical cancers, Cancer Res., 67, 10, 4605-4619, (2007)
[34] Rajaratnam, B.; Massam, H.; Carvalho, C. M., Flexible covariance estimation in graphical Gaussian models, Ann. Statist., 36, 6, 2818-2849, (2008) · Zbl 1168.62054
[35] Ribes, A.; Azaïs, J.-M.; Planton, S., Adaptation of the optimal fingerprint method for climate change detection using a well-conditioned covariance matrix estimate, Clim. Dynam., 33, 5, 707-722, (2009)
[36] Roll, R.; Ross, S. A., An empirical investigation of the arbitrage pricing theory, J. Finance, 35, 1073-1103, (1980)
[37] Sætrom, J.; Hove, J.; Skjervheim, J.-A.; Vabø, J., Improved uncertainty quantification in the ensemble Kalman filter using statistical model-selection techniques, SPE J., 17, 1, 152-162, (2012)
[38] Silverstein, J. W., Strong convergence of the empirical distribution of eigenvalues of large-dimensional random matrices, J. Multivariate Anal., 55, 331-339, (1995) · Zbl 0851.62015
[39] Silverstein, J. W.; Bai, Z. D., On the empirical distribution of eigenvalues of a class of large-dimensional random matrices, J. Multivariate Anal., 54, 175-192, (1995) · Zbl 0833.60038
[40] Silverstein, J. W.; Choi, S. I., Analysis of the limiting spectral distribution of large-dimensional random matrices, J. Multivariate Anal., 54, 295-309, (1995) · Zbl 0872.60013
[41] Stein, C., Estimation of a covariance matrix, (Rietz Lecture, 39th Annual Meeting IMS, Atlanta, Georgia, (1975))
[42] Stein, C., Lectures on the theory of estimation of many parameters, J. Math. Sci., 34, 1, 1373-1403, (1986) · Zbl 0593.62049
[43] Tsagaris, T.; Jasra, A.; Adams, N., Robust and adaptive algorithms for online portfolio selection, Quant. Finance, 12, 11, 1651-1662, (2012) · Zbl 1279.91188
[44] Varoquaux, G.; Gramfort, A.; Poline, J.-B.; Thirion, B., Brain covariance selection: better individual functional connectivity models using population prior, (Lafferty, J.; Williams, C. K.I.; Shawe-Taylor, J.; Zemel, R.; Culotta, A., Advances in Neural Information Processing Systems. Vol. 23, (2010), The MIT Press Cambridge), 2334-2342
[45] Wei, Z.; Huang, J.; Hui, Y., Adaptive-beamforming-based multiple targets signal separation, (2011 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), (2011), IEEE), 1-4
[46] Yao, J.; Kammoun, A.; Najim, J., Estimation of the covariance matrix of large dimensional data, (2012), Preprint arXiv:1201.4672 · Zbl 1393.94504
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.