
Delete or merge regressors for linear model selection. (English) Zbl 1323.62025

Summary: We consider the problem of linear model selection in the presence of both continuous and categorical predictors. Feasible models consist of subsets of the numerical variables and partitions of the levels of the factors. A new algorithm called delete or merge regressors (DMR) is presented; it is a stepwise backward procedure that ranks the predictors according to squared t-statistics and chooses the final model by minimizing BIC. We prove consistency of DMR when the number of predictors tends to infinity with the sample size, and we present a simulation study using the accompanying R package. The results indicate a significant advantage of our algorithm over Lasso-based methods from the literature in both time complexity and selection accuracy. Moreover, a version of DMR for generalized linear models is proposed.
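The deletion half of the procedure sketched in the summary can be illustrated for continuous predictors only. The following is a minimal sketch, not the authors' implementation: it omits the merging of factor levels entirely, the function names (`ols_fit`, `dmr_backward`) are hypothetical, and the BIC variant used (`n log(RSS/n) + k log n`) is one common choice that the paper may parametrize differently.

```python
import numpy as np

def ols_fit(X, y):
    """OLS fit returning coefficients, RSS, and squared t-statistics."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2 = rss / (n - p)                      # residual variance estimate
    se2 = sigma2 * np.diag(np.linalg.inv(X.T @ X))
    return beta, rss, beta**2 / se2             # t_j^2 = beta_j^2 / se_j^2

def dmr_backward(X, y):
    """Stepwise backward path: repeatedly delete the regressor with the
    smallest squared t-statistic, then return the model on the path that
    minimizes BIC (continuous predictors only; factor merging omitted)."""
    n, p = X.shape
    active = list(range(p))
    path, bics = [], []
    while active:
        _, rss, t2 = ols_fit(X[:, active], y)
        bic = n * np.log(rss / n) + len(active) * np.log(n)
        path.append(list(active))
        bics.append(bic)
        active.pop(int(np.argmin(t2)))          # drop the weakest regressor
    return sorted(path[int(np.argmin(bics))])
```

On simulated data with a strong sparse signal, the BIC-minimizing model on the backward path recovers the true support; the full DMR algorithm additionally interleaves merge steps for factor levels into the same ranking.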

MSC:

62F07 Statistical ranking and selection procedures
62J07 Ridge regression; shrinkage estimators (Lasso)

References:

[1] Bondell, H. D. and Reich, B. J. (2009). Simultaneous factor selection and collapsing levels in ANOVA., Biometrics 65 169-177. · Zbl 1159.62048
[2] Caliński, T. and Corsten, L. (1985). Clustering means in ANOVA by simultaneous testing., Biometrics 41 39-48.
[3] Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces., Biometrika 95 759-771. · Zbl 1437.62415
[4] Ciampi, A., Lechevallier, Y., Limas, M. C. and Marcos, A. G. (2008). Hierarchical clustering of subpopulations with a dissimilarity based on the likelihood ratio statistic: application to clustering massive data sets., Pattern Analysis and Applications 11 199-220. · Zbl 05536403
[5] Dayton, C. M. (2003). Information criteria for pairwise comparisons., Psychological Methods 8 61-71.
[6] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression., Annals of Statistics 32 407-499. · Zbl 1091.62054
[7] Gertheiss, J. and Tutz, G. (2010). Sparse modeling of categorial explanatory variables., Annals of Applied Statistics 4 2150-2180. · Zbl 1220.62092
[8] Oelker, M.-R., Gertheiss, J. and Tutz, G. (2014). Regularization and model selection with categorical predictors and effect modifiers in generalized linear models., Statistical Modelling 14 157-177.
[9] Pokarowski, P. and Mielniczuk, J. (2015). Combined l_1 and greedy l_0 penalized least squares for linear model selection., Journal of Machine Learning Research 16(5). · Zbl 1360.62401
[10] Porreca, R. and Ferrari-Trecate, G. (2010). Partitioning datasets based on equalities among parameters., Automatica 46 460-465. · Zbl 1205.93159
[11] Scott, A. and Knott, M. (1974). A cluster analysis method for grouping means in the analysis of variance., Biometrics 30 507-512. · Zbl 0284.62044
[12] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso., Journal of the Royal Statistical Society. Series B (Methodological) 58 267-288. · Zbl 0850.62538
[13] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso., Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 91-108. · Zbl 1060.62049
[14] Tukey, J. W. (1949). Comparing individual means in the analysis of variance., Biometrics 5 99-114.
[15] Zhang, T. (2010). Analysis of multi-stage convex relaxation for sparse regularization., Journal of Machine Learning Research 11 1081-1107. · Zbl 1242.68262
[16] Zheng, X. and Loh, W.-Y. (1995). Consistent variable selection in linear models., Journal of the American Statistical Association 90 151-156. · Zbl 0818.62060
[17] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models., Annals of Statistics 36 1509-1533. · Zbl 1142.62027