Best choices for regularization parameters in learning theory: on the bias-variance problem. (English) Zbl 1057.68085

This paper extends the results of [Bull. Am. Math. Soc., New Ser. 39, 1–49 (2002; Zbl 0983.68162)] by the same authors. The main quantity investigated is the statistical risk of the regularized least-squares estimator \(\widehat{f}_{\gamma}\) minimizing \[ \frac{1}{m}\sum_{i=1}^m(f(x_i)-y_i)^2 + \gamma \|f\|_K^2 \] over the functions \(f : X \to Y\) in a reproducing kernel Hilbert space \({\mathcal H}_K\), where \[ (x_1,y_1),\dots,(x_m,y_m) \] is an i.i.d. sample drawn from a distribution \(\rho\) on \(X \times Y\). Following the well-known bias-variance decomposition, the risk of \(\widehat{f}_{\gamma}\) is bounded by the sum of the sample error and the approximation error (plus the ineliminable Bayes error term). The main result of the paper shows how to choose the regularization parameter \(\gamma\) so that the sample error and the approximation error both vanish as the sample size \(m\) tends to infinity, provided the Bayes-optimal regressor is bounded and square-integrable with respect to the marginal \(\rho_X\) and \({\mathcal H}_K\) is infinite-dimensional. Similar results for regularized binary classification have been proved by I. Steinwart [J. Mach. Learn. Res. 2, 67–93 (2002; Zbl 1009.68143)].
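As a minimal illustration of the estimator under review: by the representer theorem, the minimizer of \(\frac{1}{m}\sum_{i=1}^m(f(x_i)-y_i)^2 + \gamma \|f\|_K^2\) has the form \(\widehat{f}_{\gamma} = \sum_i c_i K(x_i,\cdot)\) with coefficients solving \((K + m\gamma I)c = y\). The sketch below (not from the paper; the Gaussian kernel and all names are illustrative assumptions) computes this closed-form solution with NumPy.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)).

    Illustrative kernel choice; any positive-definite kernel K
    defining the RKHS H_K would do.
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def fit_regularized(X, y, gamma, sigma=1.0):
    """Coefficients c of the regularized least-squares estimator.

    Minimizes (1/m) sum_i (f(x_i) - y_i)^2 + gamma * ||f||_K^2
    over f in H_K; the 1/m factor in the empirical risk turns the
    normal equations into (K + m * gamma * I) c = y.
    """
    m = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + m * gamma * np.eye(m), y)

def predict(X_train, c, X_new, sigma=1.0):
    """Evaluate f_hat_gamma(x) = sum_i c_i K(x_i, x) at new points."""
    return gaussian_kernel(X_new, X_train, sigma) @ c
```

Shrinking \(\gamma\) trades approximation error (bias) for sample error (variance): a small \(\gamma\) fits the sample closely, while a large \(\gamma\) pulls \(\widehat{f}_{\gamma}\) toward zero in \(\|\cdot\|_K\); the paper's result concerns letting \(\gamma = \gamma(m) \to 0\) at a rate making both terms vanish.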


68T05 Learning and adaptive systems in artificial intelligence
62J10 Analysis of variance and covariance (ANOVA)
Full Text: DOI