Bootstrap model selection. (English) Zbl 0869.62030

Summary: In a regression problem, there are typically \(p\) explanatory variables possibly related to a response variable, and we wish to select a subset of these \(p\) variables with which to fit a model for the response. A bootstrap variable/model selection procedure selects the subset of variables that minimizes bootstrap estimates of the prediction error, where the bootstrap estimates are constructed from a data set of size \(n\). Although the bootstrap estimates have good properties, this bootstrap selection procedure is inconsistent in the sense that the probability of selecting the optimal subset of variables does not converge to 1 as \(n\to\infty\). This inconsistency can be rectified by modifying the sampling method used in drawing bootstrap observations. For bootstrapping pairs (response, explanatory variable), it is found that instead of drawing \(n\) bootstrap observations (the customary bootstrap sampling plan), far fewer bootstrap observations should be sampled: the bootstrap selection procedure becomes consistent if we draw \(m\) bootstrap observations with \(m\to\infty\) and \(m/n\to 0\).
For bootstrapping residuals, we modify the bootstrap sampling procedure by increasing the variability among the bootstrap observations. The consistency of the modified bootstrap selection procedures is established in various situations, including linear models, nonlinear models, generalized linear models, and autoregressive time series. The choice of the bootstrap sample size \(m\) and some computational issues are also discussed. Some empirical results are presented.
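
As an illustration only, the pairs version of the procedure summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a linear model without intercept fitted by least squares, estimates prediction error by refitting on \(m\) resampled pairs and scoring on the full data set, and searches all non-empty subsets. The function name `select_by_m_out_of_n_bootstrap`, the parameter `n_boot`, the simulated data, and the choice \(m=20\) for \(n=200\) are all hypothetical.

```python
from itertools import combinations

import numpy as np


def select_by_m_out_of_n_bootstrap(X, y, m, n_boot=200, seed=0):
    """Pick the column subset minimizing a bootstrap prediction-error estimate.

    Assumed form of the estimate (not taken from the summary): fit least
    squares on m resampled (x, y) pairs, then average the squared error of
    that fit over the full data set and over n_boot resamples.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Draw the resampling indices once so every subset sees the same resamples.
    idx = rng.integers(0, n, size=(n_boot, m))
    best_subset, best_err = None, np.inf
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            Xs = X[:, list(subset)]
            total = 0.0
            for b in range(n_boot):
                # Least-squares fit on the m bootstrap pairs.
                beta, *_ = np.linalg.lstsq(Xs[idx[b]], y[idx[b]], rcond=None)
                # Score the bootstrap fit on all n original observations.
                total += np.mean((y - Xs @ beta) ** 2)
            err = total / n_boot
            if err < best_err:
                best_subset, best_err = subset, err
    return best_subset, best_err


# Hypothetical simulated data: only columns 0 and 1 affect the response.
rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# m is much smaller than n, in the spirit of m -> infinity with m/n -> 0.
chosen, err = select_by_m_out_of_n_bootstrap(X, y, m=20)
print("selected columns:", chosen, "estimated prediction error:", err)
```

Setting `m=n` in the call gives the customary sampling plan, so the two plans can be compared on the same data. The exhaustive subset search is only practical for small \(p\).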


62G09 Nonparametric statistical resampling methods
62J02 General nonlinear regression
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)