On the “degrees of freedom” of the lasso. (English) Zbl 1126.62061

Summary: We study the effective degrees of freedom of the lasso in the framework of C. M. Stein's [Ann. Stat. 9, 1135–1151 (1981; Zbl 0476.62035)] unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate of the degrees of freedom of the lasso, a conclusion that requires no special assumptions on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results at hand, various model selection criteria, such as \(C_p\), AIC and BIC, become available and, along with the least angle regression (LARS) algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.
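The paper's central result is that counting the nonzero lasso coefficients gives an unbiased estimate of the effective degrees of freedom, which can then be plugged into a SURE-type \(C_p\) statistic. A minimal sketch of this idea, in the special case of an orthonormal design where the lasso has a closed-form soft-thresholding solution (all variable names and the choice of \(\lambda\) are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10

# Orthonormal design: for X with orthonormal columns, the lasso solution
# is soft-thresholding of the OLS coefficients, so no solver is needed.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
sigma = 1.0
y = X @ beta_true + sigma * rng.standard_normal(n)

lam = 1.0                                 # illustrative penalty level
beta_ols = X.T @ y                        # OLS coefficients (X orthonormal)
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# Unbiased df estimate per the paper: the number of nonzero coefficients.
df_hat = int(np.sum(beta_lasso != 0))

# SURE-type C_p statistic using the estimated degrees of freedom.
rss = np.sum((y - X @ beta_lasso) ** 2)
cp = rss / sigma**2 - n + 2 * df_hat
```

In practice one would compute `df_hat` and `cp` at every knot of the LARS path and pick the \(\lambda\) minimizing \(C_p\); the sketch above shows a single penalty level only.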

MSC:

62J05 Linear regression; mixed models
90C46 Optimality conditions and duality in mathematical programming
65C60 Computational problems in statistics (MSC2010)
62J07 Ridge regression; shrinkage estimators (Lasso)

Citations:

Zbl 0476.62035

Software:

ElemStatLearn

References:

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (B. N. Petrov and F. Csáki, eds.) 267–281. Académiai Kiadó, Budapest. · Zbl 0283.62006
[2] Bühlmann, P. and Yu, B. (2005). Boosting, model selection, lasso and nonnegative garrote. Technical report, ETH Zürich.
[3] Donoho, D. and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200–1224. · Zbl 0869.62024
[4] Efron, B. (2004). The estimation of prediction error: Covariance penalties and cross-validation (with discussion). J. Amer. Statist. Assoc. 99 619–642. · Zbl 1117.62324
[5] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499. · Zbl 1091.62054
[6] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360. · Zbl 1073.62547
[7] Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Proc. International Congress of Mathematicians 3 595–622. European Math. Soc., Zürich. · Zbl 1117.62137
[8] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961. · Zbl 1092.62031
[9] Gunter, L. and Zhu, J. (2007). Efficient computation and model selection for the support vector regression. Neural Computation 19 1633–1655. · Zbl 1119.68150
[10] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models . Chapman and Hall, London. · Zbl 0747.62061
[11] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning ; Data Mining , Inference and Prediction . Springer, New York. · Zbl 0973.62007
[12] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378. · Zbl 1105.62357
[13] Mallows, C. (1973). Some comments on \(C_P\). Technometrics 15 661–675. · Zbl 0269.62061
[14] Meyer, M. and Woodroofe, M. (2000). On the degrees of freedom in shape-restricted regression. Ann. Statist. 28 1083–1104. · Zbl 1105.62340
[15] Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–403. · Zbl 0962.65036
[16] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464. · Zbl 0379.62005
[17] Shao, J. (1997). An asymptotic theory for linear model selection (with discussion). Statist. Sinica 7 221–264. · Zbl 1003.62527
[18] Shen, X. and Huang, H.-C. (2006). Optimal model assessment, selection and combination. J. Amer. Statist. Assoc. 101 554–568. · Zbl 1119.62306
[19] Shen, X., Huang, H.-C. and Ye, J. (2004). Adaptive model selection and assessment for exponential family distributions. Technometrics 46 306–317.
[20] Shen, X. and Ye, J. (2002). Adaptive model selection. J. Amer. Statist. Assoc. 97 210–221. · Zbl 1073.62509
[21] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9 1135–1151. · Zbl 0476.62035
[22] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288. · Zbl 0850.62538
[23] Yang, Y. (2005). Can the strengths of AIC and BIC be shared?—A conflict between model identification and regression estimation. Biometrika 92 937–950. · Zbl 1151.62301
[24] Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc. 93 120–131. · Zbl 0920.62056
[25] Zhao, P., Rocha, G. and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties. Technical report, Dept. Statistics, Univ. California, Berkeley.
[26] Zou, H. (2005). Some perspectives of sparse statistical modeling. Ph.D. dissertation, Dept. Statistics, Stanford Univ.