zbMATH — the first resource for mathematics

A mixture approach to universal model selection. Revised edition. (English) Zbl 0928.62033
Paris: ENS Éditions. 19 p. (1997).
Summary: We build a model selection algorithm whose mean Kullback risk is upper bounded by the mean risk of the best model applied to half the sample, augmented by an explicit penalty term of order one over the sample size. This estimator is not a “true” selection rule, but rather an adaptive convex combination of the models. This mixture approach is inspired by the context tree weighting method of information theory due to F. M. J. Willems, Y. M. Shtarkov and T. J. Tjalkens [IEEE Trans. Inf. Theory 42, No. 5, 1514-1520 (1996; Zbl 0860.94016); ibid. 41, No. 3, 653-664 (1995; Zbl 0837.94011)], a “universal” data compression algorithm for stationary binary sources. Our algorithm is “universal” with respect to the statistical mean Kullback risk in the sense that it is almost optimal for any exchangeable sample distribution.
We give an application to the estimation of a probability measure by adaptive histograms. We detail a factorised computation of the mixture which weights \(M\) models in a number of operations of order \(\log(M)\), when the subdivisions underlying the histograms have nested cells. The end of the paper shows that, in the case of a dense family of models, a true selection rule can be built in a second step. We give an upper bound for the mean square Hellinger distance of the estimator from the sample distribution, which tends to zero at the optimal rate, up to an explicit multiplicative constant, in the case when the square Hellinger distance and the Kullback divergence are comparable.
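Reviewer's illustration: the mixture idea described above (a convex combination of histogram models, with weights driven by the likelihood on half the sample, rather than a hard selection) can be sketched as follows. This is a toy version, not the paper's exact estimator: the Laplace smoothing, the uniform prior over bin counts, and the specific split are our assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=400)      # toy data on [0, 1)
fit, held = sample[:200], sample[200:]     # fit on half, weight on the other half

def hist_model(data, n_bins):
    """Laplace-smoothed histogram density on [0, 1) (smoothing is an assumption,
    used here only to keep all cell probabilities strictly positive)."""
    counts, edges = np.histogram(data, bins=n_bins, range=(0.0, 1.0))
    probs = (counts + 1.0) / (counts.sum() + n_bins)   # smoothed cell probabilities
    widths = np.diff(edges)
    def density(x):
        idx = np.minimum((np.asarray(x) * n_bins).astype(int), n_bins - 1)
        return probs[idx] / widths[idx]
    return density

# A nested family of histogram models (dyadic bin counts, as in nested subdivisions).
models = [hist_model(fit, m) for m in (2, 4, 8, 16, 32)]

# Mixture weights: uniform prior times likelihood on the held-out half, computed
# in log space for numerical stability.  The estimator is the resulting convex
# combination of the models -- a mixture, not a hard selection rule.
log_w = np.array([np.sum(np.log(f(held))) for f in models])
w = np.exp(log_w - log_w.max())
w /= w.sum()

mixture = lambda x: sum(wi * f(x) for wi, f in zip(w, models))
```

Each histogram density integrates to one, so the convex combination is again a probability density; the weights concentrate on the bin counts that explain the held-out half best, which mimics the risk bound's comparison with the best model applied to half the sample.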

62G08 Nonparametric regression and quantile regression
62B10 Statistical aspects of information-theoretic topics
Full Text: Link