Concentration inequalities and model selection. Ecole d’Eté de Probabilités de Saint-Flour XXXIII – 2003.

*(English)* Zbl 1170.60006
Lecture Notes in Mathematics 1896. Berlin: Springer (ISBN 978-3-540-48497-4/pbk; 978-3-540-48503-2/ebook). xiv, 337 p. (2007).

From the text: Since the impressive works of Talagrand, concentration inequalities have been recognized as fundamental tools in several domains such as geometry of Banach spaces or random combinatorics. They also turn out to be essential tools to develop a non-asymptotic theory in statistics, exactly as the central limit theorem and large deviations are known to play a central part in the asymptotic theory. An overview of a non-asymptotic theory for model selection is given here and some selected applications to variable selection, change points detection and statistical learning are discussed. This volume reflects the content of the course given by P. Massart in St. Flour in 2003. It is mostly self-contained and accessible to graduate students.

Model selection is a classical topic in statistics. The idea of selecting a model via penalizing a log-likelihood type criterion goes back to the early seventies with the pioneering works of Mallows and Akaike. One can find many consistency results in the literature for such criteria. These results are asymptotic in the sense that one deals with a given number of models and the number of observations tends to infinity. We shall give an overview of a non-asymptotic theory for model selection which has emerged during these last ten years. In various contexts of function estimation it is possible to design penalized loglikelihood type criteria with penalty terms depending not only on the number of parameters defining each model (as for the classical criteria) but also on the complexity of the whole collection of models to be considered. The performance of such a criterion is analyzed via non-asymptotic risk bounds for the corresponding penalized estimator which express that it performs almost as well as if the ‘best model’ (i.e., with minimal risk) were known. For practical relevance of these methods, it is desirable to get a precise expression of the penalty terms involved in the penalized criteria on which they are based. This is why this approach heavily relies on concentration inequalities, the prototype being Talagrand’s inequality for empirical processes. Our purpose will be to give an account of the theory and discuss some selected applications such as variable selection or change point detection.

Contents: 1. Introduction; 2. Exponential and information inequalities; 3. Gaussian processes; 4. Gaussian model selection; 5. Concentration inequalities; 6. Maximal inequalities; 7. Density estimation via model selection; 8. Statistical learning.

See also [A. Dembo and T. Funaki, Lectures on probability theory and statistics, Lect. Notes Math. 1869, Springer, Berlin (2005; Zbl 1084.60005)].
