The authors develop performance bounds for model selection criteria, using recent theory for sieves. The criteria are based on an empirical loss or contrast function with an added penalty term roughly proportional to the number of parameters needed to describe the model divided by the number of observations. Most of the examples involve density or regression estimation, and the authors focus on estimating the unknown density or regression function.
It is shown that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve. The connection between model selection via penalization and adaptation in the minimax sense is pointed out. The method is illustrated with penalized maximum likelihood, projection, and least squares estimation. The models involve commonly used finite-dimensional expansions such as piecewise polynomials with fixed or variable knots, trigonometric polynomials, wavelets, neural nets, and related nonlinear expansions defined by superposition of ridge functions.
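
To make the form of such a criterion concrete, the following is a minimal sketch of penalized least squares selection among nested trigonometric polynomial models. It is an illustration under stated assumptions, not the authors' procedure: the function names `trig_design` and `select_dimension`, the penalty constant `kappa`, the assumption that the noise variance `sigma2` is known, and the simulated data are all hypothetical choices made here. The only feature taken from the description above is that the penalty grows in proportion to the number of parameters divided by the number of observations.

```python
import numpy as np


def trig_design(x, dim):
    """Design matrix of the first `dim` trigonometric basis functions on [0, 1]."""
    cols = [np.ones_like(x)]
    k = 1
    while len(cols) < dim:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * x))
        if len(cols) < dim:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * x))
        k += 1
    return np.column_stack(cols)


def select_dimension(x, y, max_dim, sigma2, kappa=2.0):
    """Minimize (empirical least squares contrast) + kappa * sigma2 * dim / n.

    `kappa` and the known noise variance `sigma2` are assumptions of this
    sketch, not constants taken from the paper under review.
    """
    n = len(y)
    best = None
    for dim in range(1, max_dim + 1):
        design = trig_design(x, dim)
        # Least squares fit within the model of dimension `dim`.
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        contrast = np.mean((y - design @ coef) ** 2)
        # Penalty proportional to (number of parameters) / (number of observations).
        criterion = contrast + kappa * sigma2 * dim / n
        if best is None or criterion < best[0]:
            best = (criterion, dim, coef)
    return best[1], best[2]


# Simulated regression data (hypothetical example, fixed seed).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x) + 0.3 * rng.standard_normal(n)

dim, coef = select_dimension(x, y, max_dim=30, sigma2=0.09)
print("selected dimension:", dim)
```

In this simulation the regression function lies in the model of dimension 4, so the criterion has to trade the drop in empirical contrast against the penalty for each additional basis function. Larger values of `kappa` favor smaller models.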