The spectral condition number plot for regularization parameter evaluation. (English) Zbl 1482.62016

Summary: Many modern statistical applications require the estimation of a covariance (or precision) matrix in settings where the number of variables exceeds the number of observations. A broad class of ridge-type estimators employs regularization to cope with the resulting singularity of the sample covariance matrix. These estimators depend on a penalty parameter, and choosing its value can be hard: selection procedures may be computationally infeasible or applicable only to a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter assessment. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.
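As a rough illustration of the idea in the summary, the sketch below traces the spectral condition number (largest eigenvalue divided by smallest) of a regularized covariance estimate over a logarithmic grid of penalty values. It assumes the simplest ridge form, S + λI, which is only one member of the ridge-type class. The function names, the toy data, the penalty grid, and the pure-Python Jacobi eigenvalue routine are all illustrative choices, not taken from the paper or its software.

```python
import math


def sample_covariance(rows):
    """Sample covariance matrix (divisor n) of an n x p data set given as rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / n
         for j in range(p)]
        for i in range(p)
    ]


def symmetric_eigenvalues(matrix, max_sweeps=100, rel_tol=1e-22):
    """Eigenvalues of a symmetric matrix, ascending, via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * total:  # off-diagonal mass is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    a[k][p], a[k][q] = (c * a[k][p] - s * a[k][q],
                                        s * a[k][p] + c * a[k][q])
                for k in range(n):  # rotate rows p and q
                    a[p][k], a[q][k] = (c * a[p][k] - s * a[q][k],
                                        s * a[p][k] + c * a[q][k])
    return sorted(a[i][i] for i in range(n))


def ridge_estimate(s, penalty):
    """Assumed illustrative estimator: S + penalty * I."""
    p = len(s)
    return [[s[i][j] + (penalty if i == j else 0.0) for j in range(p)]
            for i in range(p)]


def condition_number_path(s, penalties):
    """List of (penalty, spectral condition number) pairs along the grid."""
    path = []
    for penalty in penalties:
        eig = symmetric_eigenvalues(ridge_estimate(s, penalty))
        path.append((penalty, eig[-1] / eig[0]))
    return path


# Toy data with more variables (4) than observations (3), so S is singular.
DATA = [
    [1.0, 2.0, 0.5, 3.0],
    [2.0, 1.0, 1.5, 2.0],
    [0.0, 3.0, 1.0, 4.0],
]
S = sample_covariance(DATA)
PENALTIES = [10.0 ** k for k in range(-4, 3)]
PATH = condition_number_path(S, PENALTIES)

# The two columns are the coordinates one would plot on log-log axes.
print("log10(penalty)  log10(condition number)")
for penalty, cond in PATH:
    print(f"{math.log10(penalty):14.1f}  {math.log10(cond):23.3f}")
```

Plotting the second column against the first gives a curve of the kind the summary describes. For the assumed S + λI form the condition number can only fall as the penalty grows, so the curve shows how quickly additional regularization stops improving the conditioning of the estimate.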

MSC:

62-08 Computational methods for problems pertaining to statistics
62A09 Graphical methods in statistics
