Model selection and estimation in regression with grouped variables. (English) Zbl 1141.62030

Summary: We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
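
As an illustration of how a group-wise extension of the lasso can select or drop whole factors, the following minimal sketch implements a block soft-thresholding step: all coefficients of a group are shrunk together, and the entire group is set to zero when its Euclidean norm falls below a threshold. The penalty form (a sum of group Euclidean norms, each weighted by the square root of the group size) and all names (`group_penalty`, `block_soft_threshold`, `lam`) are assumptions chosen for illustration; the summary itself does not specify them.

```python
import math


def group_penalty(groups, lam):
    """Assumed penalty: sum over groups of lam * sqrt(group size) * Euclidean norm."""
    return sum(
        lam * math.sqrt(len(g)) * math.sqrt(sum(b * b for b in g))
        for g in groups
    )


def block_soft_threshold(s, lam):
    """Shrink the vector s toward zero as one block.

    Returns a zero vector when the norm of s does not exceed
    lam * sqrt(len(s)), so the whole group is dropped at once.
    """
    norm = math.sqrt(sum(v * v for v in s))
    threshold = lam * math.sqrt(len(s))
    if norm <= threshold:
        return [0.0] * len(s)
    scale = 1.0 - threshold / norm
    return [scale * v for v in s]


# A weak group (norm 0.5) is removed entirely; a strong group (norm 5) is kept but shrunk.
weak = block_soft_threshold([0.3, -0.4], lam=1.0)
strong = block_soft_threshold([3.0, 4.0], lam=1.0)
```

In this sketch the direction of a surviving group is preserved and only its length is reduced, which is what makes selection operate on factors rather than on individual coefficients.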


62G08 Nonparametric regression and quantile regression
62J10 Analysis of variance and covariance (ANOVA)
65C60 Computational problems in statistics (MSC2010)



