Linear model selection by cross-validation.

*(English)* Zbl 0773.62051

Summary: We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the \(C_p\), and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations \(n\to\infty\).

We show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-\(n_v\)-out cross-validation with \(n_v\), the number of observations reserved for validation, satisfying \(n_v/n\to 1\) as \(n\to\infty\). This is a somewhat shocking discovery, because \(n_v/n\to 1\) is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-\(n_v\)-out cross-validation method are provided, and results from a simulation study are presented.
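The leave-\(n_v\)-out idea can be illustrated with a minimal sketch. It is not the procedure from the paper: it assumes random splits (a Monte Carlo approximation) instead of enumerating every subset of size \(n_v\), ordinary least squares on each fitting subset, and mean squared validation error as the score. The function name `mccv_select`, the candidate column sets, the synthetic data, and the fitting-set size \(n - n_v = \lfloor n^{3/4} \rfloor\) are all illustrative assumptions; the summary only requires \(n_v/n\to 1\).

```python
import numpy as np


def mccv_select(X, y, candidates, n_v, n_splits=200, seed=0):
    """Monte Carlo leave-n_v-out CV over candidate column subsets (hypothetical helper).

    Returns the index of the candidate with the smallest average
    validation mean squared error, together with all average scores.
    """
    n = len(y)
    if not 0 < n_v < n:
        raise ValueError("n_v must satisfy 0 < n_v < n")
    rng = np.random.default_rng(seed)
    scores = np.zeros(len(candidates))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, fit = perm[:n_v], perm[n_v:]  # n_v points validate, the rest fit
        for k, cols in enumerate(candidates):
            cols = list(cols)
            # Least-squares fit on the fitting subset only.
            beta, *_ = np.linalg.lstsq(X[np.ix_(fit, cols)], y[fit], rcond=None)
            resid = y[val] - X[np.ix_(val, cols)] @ beta
            scores[k] += np.mean(resid ** 2)
    scores /= n_splits
    return int(np.argmin(scores)), scores


# Synthetic illustration (assumed data): only column 1 carries signal.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = 1.0 + 2.0 * X[:, 1] + 0.5 * rng.normal(size=n)

candidates = [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]
n_v = n - int(n ** 0.75)  # assumed choice: most observations go to validation
best, scores = mccv_select(X, y, candidates, n_v)
print("selected columns:", candidates[best])
print("average validation MSE per candidate:", scores)
```

Setting `n_v=1` and looping over every observation would recover the leave-one-out scheme that the summary describes as inconsistent. The sketch instead keeps the fitting subset small relative to \(n\), which is the regime the summary identifies as restoring consistency.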

##### MSC:

62J99 | Linear inference, regression |

65C99 | Probabilistic methods, stochastic differential equations |