Boosting with the \(L_2\) loss: Regression and classification.

*(English)* Zbl 1041.62029

Summary: This article investigates a computationally simple variant of boosting, \(L_2\)Boost, which is constructed from a functional gradient descent algorithm with the \(L_2\)-loss function. Like other boosting algorithms, \(L_2\)Boost iteratively applies a prechosen fitting method, called the learner. Based on the explicit expression for the refitting of residuals in \(L_2\)Boost, the case of (symmetric) linear learners is studied in detail for both regression and classification.

In particular, with the boosting iteration \(m\) acting as the smoothing or regularization parameter, a new exponential bias-variance trade-off is found, with the variance (complexity) term increasing very slowly as \(m\) tends to infinity. When the learner is a smoothing spline, an optimal rate-of-convergence result holds for both regression and classification, and the boosted smoothing spline even adapts to higher-order, unknown smoothness. Moreover, a simple expansion of a (smoothed) 0-1 loss function is derived to reveal the importance of the decision boundary, of bias reduction, and of the impossibility of an additive bias-variance decomposition in classification.
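The exponential trade-off can be sketched numerically. For a symmetric linear learner with smoother matrix \(S\) whose eigenvalues \(\lambda_k\) lie in \((0,1]\), \(m\) iterations of \(L_2\)Boost yield the boosting operator \(B_m = I - (I-S)^m\), so each eigenvalue is mapped to \(1-(1-\lambda_k)^m\). The toy eigenvalues, signal coordinates, and noise level below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Illustrative eigenvalues of a symmetric smoother S (assumed values).
lams = np.array([1.0, 0.5, 0.1, 0.01])
# Assumed signal coordinates in the eigenbasis of S, and noise variance.
mu = np.array([1.0, 1.0, 1.0, 1.0])
sigma2 = 1.0

def bias2_var(m):
    """Squared bias and variance of L2Boost after m iterations.

    After m steps the residual shrinkage factor on the k-th
    eigendirection is (1 - lambda_k)^m, so bias decays exponentially
    while variance creeps up toward sigma2 * (number of directions).
    """
    shrink = (1.0 - lams) ** m
    bias2 = np.sum(shrink ** 2 * mu ** 2)
    var = sigma2 * np.sum((1.0 - shrink) ** 2)
    return bias2, var
```

Evaluating `bias2_var` at, say, `m = 1, 5, 25, 125` shows the squared bias dropping geometrically while the variance term rises only very slowly toward its cap, which is the qualitative shape of the trade-off described above.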

Finally, simulation and real-data results are obtained to demonstrate the attractiveness of \(L_2\)Boost. In particular, we show that \(L_2\)Boosting with a novel component-wise cubic smoothing spline is both practical and effective in the presence of high-dimensional predictors.
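The residual-refitting loop at the core of \(L_2\)Boost can be sketched as follows. The paper's component-wise learner is a cubic smoothing spline; to keep this sketch dependency-free, a component-wise simple linear fit is substituted, which is an assumption of this example and not the paper's method:

```python
import numpy as np

def l2boost(X, y, n_iter=50):
    """Sketch of L2Boosting with a component-wise linear learner.

    Assumes the columns of X are centered. At each iteration the
    current residuals U = y - F are refit by the single predictor
    that reduces the residual sum of squares the most, and the fit
    F is updated additively -- the refitting-of-residuals view of
    functional gradient descent under the L2 loss.
    """
    n, p = X.shape
    intercept = y.mean()
    F = np.full(n, intercept)          # F_0: fit the mean first
    coefs = np.zeros(p)
    for _ in range(n_iter):
        U = y - F                      # current residuals
        best_j, best_b, best_rss = 0, 0.0, np.inf
        for j in range(p):             # component-wise search
            x = X[:, j]
            b = (x @ U) / (x @ x)      # least-squares slope on U
            rss = np.sum((U - b * x) ** 2)
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        F = F + best_b * X[:, best_j]  # additive update of the fit
        coefs[best_j] += best_b
    return intercept, coefs, F
```

Because each iteration touches only one predictor, the procedure remains practical when the number of predictors is large, which is the point of the component-wise variant; the number of iterations `n_iter` plays the role of the regularization parameter \(m\) above.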

##### MSC:

| Code | Classification |
| --- | --- |
| 62G08 | Nonparametric regression and quantile regression |
| 62H30 | Classification and discrimination; cluster analysis (statistical aspects) |