An improved model averaging scheme for logistic regression. (English) Zbl 1163.62029

Summary: Penalized regression methods have recently attracted much attention in the statistical literature. We argue that, for prediction, such methods can be improved by incorporating model averaging ideas. We propose a new algorithm that combines penalized regression with model averaging to improve prediction. We also discuss model selection versus model averaging and propose a diagnostic based on the notion of generalized degrees of freedom. The proposed methods are studied using both simulated and real data.
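The summary does not spell out the algorithm, so the following is only a generic illustration of combining penalized regression with model averaging. It is not the authors' scheme. It fits ridge-penalized logistic regressions over a small grid of penalty values and averages their predicted probabilities. The weights come from held-out Bernoulli likelihood on a single random half split, loosely in the spirit of Yang's adaptive regression by mixing (ARM). The function names (`fit_ridge_logistic`, `averaged_predictor`), the penalty grid, the gradient-descent solver and the single-split weighting are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam, steps=500):
    """Hypothetical helper: ridge-penalized logistic regression by gradient descent.

    Minimizes mean negative log-likelihood + (lam / 2) * ||beta||^2;
    the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    # Step size 1/L, where L bounds the gradient's Lipschitz constant:
    # 0.25 * mean(1 + ||x_i||^2) for the logistic loss, plus lam for the penalty.
    L = 0.25 * sum(1.0 + sum(x * x for x in xi) for xi in X) / n + lam
    lr = 1.0 / L
    b0, beta = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi  # p_i - y_i
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        beta = [b - lr * (gj / n + lam * b) for b, gj in zip(beta, g)]
    return b0, beta


def predict(model, x):
    b0, beta = model
    return sigmoid(b0 + sum(b * xj for b, xj in zip(beta, x)))


def log_lik(model, X, y):
    """Bernoulli log-likelihood of a fitted model on (X, y)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(model, xi), 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def averaged_predictor(X, y, lambdas, seed=0):
    """Hypothetical helper: mix ridge-logistic fits over a grid of penalties."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    Xtr, ytr = [X[i] for i in idx[:half]], [y[i] for i in idx[:half]]
    Xte, yte = [X[i] for i in idx[half:]], [y[i] for i in idx[half:]]
    # Weight of each candidate is proportional to its held-out likelihood.
    lls = [log_lik(fit_ridge_logistic(Xtr, ytr, lam), Xte, yte) for lam in lambdas]
    m = max(lls)
    raw = [math.exp(ll - m) for ll in lls]  # subtract the max to avoid underflow
    weights = [r / sum(raw) for r in raw]
    # Refit every candidate on all the data and average predicted probabilities.
    full = [fit_ridge_logistic(X, y, lam) for lam in lambdas]

    def predict_avg(x):
        return sum(w * predict(mod, x) for w, mod in zip(weights, full))

    return weights, predict_avg


# Usage on simulated data (illustrative only).
rng = random.Random(1)
true_beta = [1.5, -1.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(80)]
y = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, xi))) else 0
     for xi in X]
weights, predict_avg = averaged_predictor(X, y, lambdas=[0.0, 0.1, 1.0, 10.0])
print("weights:", [round(w, 3) for w in weights])
print("averaged P(y=1 | x):", round(predict_avg([1.0, -1.0, 0.0]), 3))
```

Averaging predicted probabilities rather than selecting the single best penalty is what distinguishes this from model selection. Averaging the weights over several random splits, as ARM does, would make them less sensitive to one particular partition of the data.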


62G08 Nonparametric regression and quantile regression
65C60 Computational problems in statistics (MSC2010)
62G99 Nonparametric inference
