zbMATH — the first resource for mathematics

Robust groupwise least angle regression. (English) Zbl 06918715
Summary: Many regression problems exhibit a natural grouping among predictor variables. Examples are groups of dummy variables representing categorical variables, or present and lagged values of time series data. Since model selection in such cases typically aims for selecting groups of variables rather than individual covariates, an extension of the popular least angle regression (LARS) procedure to groupwise variable selection is considered. Data sets occurring in applied statistics frequently contain outliers that do not follow the model or the majority of the data. Therefore a modification of the groupwise LARS algorithm is introduced that reduces the influence of outlying data points. Simulation studies and a real data example demonstrate the excellent performance of groupwise LARS and, when outliers are present, its robustification.

MSC:
62-XX Statistics
References:
[1] Alfons, A., 2014a. robustHD: robust methods for high-dimensional data. R package version 0.5.0.
[2] Alfons, A., 2014b. simFrame: simulation framework. R package version 0.5.3.
[3] Alfons, A.; Baaske, W.; Filzmoser, P.; Mader, W.; Wieser, R., Robust variable selection with application to quality of life research, Stat. Methods Appl., 20, 1, 65-82, (2011) · Zbl 1333.62168
[4] Alfons, A.; Croux, C.; Gelper, S., Sparse least trimmed squares regression for analyzing high-dimensional large data sets, Ann. Appl. Stat., 7, 1, 226-248, (2013) · Zbl 1454.62123
[5] Alfons, A.; Templ, M.; Filzmoser, P., An object-oriented framework for statistical simulation: the R package simFrame, J. Stat. Softw., 37, 3, 1-36, (2010)
[6] Breheny, P.; Huang, J., Penalized methods for bi-level variable selection, Stat. Interface, 2, 3, 369-380, (2009) · Zbl 1245.62034
[7] Breiman, L., Better subset regression using the nonnegative garrote, Technometrics, 37, 4, 373-384, (1995) · Zbl 0862.62059
[8] Chen, X., Wang, Z., McKeown, M., 2010. FMRI group studies of brain connectivity via a group robust lasso, in: 2010 IEEE International Conference on Image Processing, Hong Kong, pp. 589-592.
[9] Dudoit, S.; Fridlyand, J.; Speed, T., Comparison of discrimination methods for the classification of tumors using gene expression data, J. Amer. Statist. Assoc., 97, 457, 77-87, (2002) · Zbl 1073.62576
[10] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R., Least angle regression, Ann. Statist., 32, 2, 407-499, (2004) · Zbl 1091.62054
[11] Hastie, T., Efron, B., 2013. lars: least angle regression, lasso and forward stagewise. R package version 1.2.
[12] Khan, J.; Van Aelst, S.; Zamar, R., Robust linear model selection based on least angle regression, J. Amer. Statist. Assoc., 102, 480, 1289-1299, (2007) · Zbl 1332.62240
[13] Khan, J.; Van Aelst, S.; Zamar, R., Fast robust estimation of prediction error based on resampling, Comput. Statist. Data Anal., 54, 12, 3121-3130, (2010) · Zbl 1284.62201
[14] Maronna, R.; Martin, D.; Yohai, V., Robust statistics: theory and methods, (2006), John Wiley & Sons Chichester · Zbl 1094.62040
[15] McCann, L.; Welsch, R., Robust variable selection using least angle regression and elemental set sampling, Comput. Statist. Data Anal., 52, 1, 249-257, (2007) · Zbl 1452.62504
[16] Meier, L.; van de Geer, S.; Bühlmann, P., The group lasso for logistic regression, J. R. Stat. Soc. Ser. B, 70, 1, 53-71, (2008) · Zbl 1400.62276
[17] R Core Team, R: A language and environment for statistical computing, (2014), R Foundation for Statistical Computing Vienna, Austria
[18] Ronchetti, E.; Field, C.; Blanchard, W., Robust linear model selection by cross-validation, J. Amer. Statist. Assoc., 92, 439, 1017-1023, (1997) · Zbl 1067.62551
[19] Rousseeuw, P.; van Zomeren, B., Unmasking multivariate outliers and leverage points, J. Amer. Statist. Assoc., 85, 411, 633-639, (1990)
[20] Salibian-Barrera, M.; Van Aelst, S., Robust model selection using fast and robust bootstrap, Comput. Statist. Data Anal., 52, 12, 5121-5135, (2008) · Zbl 1452.62509
[21] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 2, 461-464, (1978) · Zbl 0379.62005
[22] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B, 58, 1, 267-288, (1996) · Zbl 0850.62538
[23] Tibshirani, R.; Hastie, T.; Narasimhan, B.; Chu, G., Class prediction by nearest shrunken centroids, with applications to DNA microarrays, Statist. Sci., 18, 1, 104-117, (2003) · Zbl 1048.62109
[24] Witten, D.; Tibshirani, R.; Hastie, T., A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics, 10, 3, 515-534, (2009)
[25] Yohai, V., High breakdown-point and high efficiency robust estimates for regression, Ann. Statist., 15, 2, 642-656, (1987) · Zbl 0624.62037
[26] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B, 68, 1, 49-67, (2006) · Zbl 1141.62030
[27] Zhao, P.; Rocha, G.; Yu, B., The composite absolute penalties family for grouped and hierarchical variable selection, Ann. Statist., 37, 6, 3468-3497, (2009) · Zbl 1369.62164
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.