Sparse estimation of large covariance matrices via a nested Lasso penalty. (English) Zbl 1137.62338

Summary: The paper proposes a new estimator for large covariance matrices when the variables have a natural ordering. Using the Cholesky decomposition of the inverse covariance matrix, the authors impose a banded structure on the Cholesky factor and select the bandwidth adaptively for each row of the factor, using a novel penalty they call the nested Lasso. This structure is more flexible than regular banding but, unlike the regular Lasso applied to the entries of the Cholesky factor, results in a sparse estimator of the inverse covariance matrix. An iterative algorithm for solving the optimization problem is developed. The estimator is compared with a number of other covariance estimators and performs best, both in simulations and on a real data example. The simulations show that the margin by which the estimator outperforms its competitors tends to increase with dimension.
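To make the structure described in the summary concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the standard modified Cholesky form, in which the inverse covariance is written as T' D^{-1} T with T unit lower triangular, and each row of T comes from regressing a variable on its immediate predecessors. The per-row bandwidths are supplied by hand here, whereas the paper selects them with the nested Lasso penalty. The function name `banded_cholesky_precision`, the simulated data, and the bandwidth values are all hypothetical.

```python
import numpy as np


def banded_cholesky_precision(X, bandwidths):
    """Inverse covariance estimate from row-wise banded regressions.

    X          : (n, p) data matrix, columns in their natural order.
    bandwidths : length-p sequence; bandwidths[j] is the number of immediate
                 predecessors that variable j is regressed on. Hypothetical
                 input, fixed by hand rather than chosen by a penalty.
    Returns (T, d, omega) with omega = T' diag(1/d) T.
    """
    X = X - X.mean(axis=0)          # center each column
    n, p = X.shape
    T = np.eye(p)                   # unit lower-triangular Cholesky factor
    d = np.empty(p)                 # residual (innovation) variances
    d[0] = np.mean(X[:, 0] ** 2)
    for j in range(1, p):
        k = min(bandwidths[j], j)   # row j cannot have more than j predecessors
        resid = X[:, j]
        if k > 0:
            Z = X[:, j - k:j]       # the k nearest predecessors of variable j
            phi, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
            T[j, j - k:j] = -phi    # row j of T holds minus the coefficients
            resid = X[:, j] - Z @ phi
        d[j] = np.mean(resid ** 2)
    omega = T.T @ np.diag(1.0 / d) @ T
    return T, d, omega


# Simulated ordered data (an AR(1)-type sequence), purely for illustration.
rng = np.random.default_rng(0)
n, p = 200, 6
X = np.empty((n, p))
X[:, 0] = rng.standard_normal(n)
for j in range(1, p):
    X[:, j] = 0.6 * X[:, j - 1] + rng.standard_normal(n)

bandwidths = [0, 1, 2, 1, 0, 2]     # a different band length for each row
T, d, omega = banded_cholesky_precision(X, bandwidths)
```

Because every row of T is zero outside its own band, an entry of `omega` can be nonzero only if some row's band covers both of the corresponding variables. This is the sense in which banding the Cholesky factor, unlike an unstructured sparsity pattern in T, carries over to a sparse inverse.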


62H12 Estimation in multivariate analysis
62F30 Parametric inference under constraints
15A09 Theory of matrix inversion and generalized inverses
65C60 Computational problems in statistics (MSC2010)

