
Analyzing bagging. (English) Zbl 1029.62037

Summary: Bagging is one of the most effective computationally intensive procedures for improving unstable estimators or classifiers, and it is especially useful for high-dimensional data problems. Here we formalize the notion of instability and derive theoretical results to analyze the variance-reduction effect of bagging (or variants thereof), mainly in hard decision problems, which include estimation after testing in regression and decision trees for regression functions and classifiers. Hard decisions create instability, and bagging is shown to smooth such hard decisions, yielding smaller variance and mean squared error.
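The smoothing effect on hard decisions can be illustrated numerically. The following Python sketch is not taken from the paper: it bags a one-split regression stump (a prototypical hard-decision estimator) by bootstrap averaging and compares the Monte Carlo variance of its prediction at a point near the true split. The step-function target, noise level, number of bootstrap replicates and evaluation point are arbitrary choices made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, y):
    """Least-squares regression stump: one split, constant fit on each side."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None)
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (xs[i - 1:i + 1].mean(), left.mean(), right.mean()))
    return best[1]  # (split point, left mean, right mean)

def predict(stump, x0):
    split, lmean, rmean = stump
    return lmean if x0 <= split else rmean

def bagged_predict(x, y, x0, B=50):
    """Average stump predictions over B bootstrap resamples (bagging)."""
    n = len(x)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        preds.append(predict(fit_stump(x[idx], y[idx]), x0))
    return float(np.mean(preds))

# Monte Carlo comparison of prediction variance near the true split (0.5),
# where the hard split decision is most unstable.
n, reps, x0 = 50, 200, 0.5
plain, bagged = [], []
for _ in range(reps):
    x = rng.uniform(0, 1, n)
    y = (x > 0.5).astype(float) + rng.normal(scale=0.5, size=n)
    plain.append(predict(fit_stump(x, y), x0))
    bagged.append(bagged_predict(x, y, x0))

print("variance of plain stump :", np.var(plain))
print("variance of bagged stump:", np.var(bagged))
```

In such runs the bagged stump typically shows a markedly smaller variance at the unstable point, which is the qualitative effect analyzed in the paper.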
With theoretical explanations, we motivate subagging, based on subsampling, as an alternative aggregation scheme. It is computationally cheaper but still shows approximately the same accuracy as bagging. Moreover, our theory reveals improvements of first order, in line with simulation studies. In particular, we obtain an asymptotic limiting distribution at the cube-root rate for the split point when fitting piecewise constant functions. Denoting the sample size by \(n\), it follows that in a cylindric neighborhood of diameter \(n^{-1/3}\) of the theoretically optimal split point, the variance and mean squared error reduction of subagging can be characterized analytically. Because of this slow rate, our reasoning also provides an explanation on the global scale for the whole covariate space in a decision tree with finitely many splits.
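Subagging replaces the bootstrap by subsampling without replacement. The sketch below is again only an illustration under assumed settings, not code from the paper: it aggregates a stylized "estimation after testing" rule, the hard-thresholded sample mean, over bootstrap samples of size n (bagging) and over half-samples of size n/2 drawn without replacement (half-subagging). The threshold c = 2, the borderline signal strength, and the number of resamples are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def hard_threshold_mean(y, c=2.0):
    """'Estimation after testing': return the sample mean only if it is
    significantly different from zero, otherwise 0 (a hard decision)."""
    n = len(y)
    t = np.sqrt(n) * y.mean() / y.std(ddof=1)
    return y.mean() if abs(t) > c else 0.0

def aggregate(y, B=50, m=None, replace=True):
    """Average the hard-decision estimator over B resamples.
    replace=True, m=n  -> bagging; replace=False, m=n//2 -> half-subagging."""
    n = len(y)
    m = n if m is None else m
    vals = [hard_threshold_mean(y[rng.choice(n, size=m, replace=replace)])
            for _ in range(B)]
    return float(np.mean(vals))

# Monte Carlo comparison at a borderline true mean, where the test decision
# is unstable and the variance reduction is most pronounced.
n, reps, mu = 40, 500, 0.3
plain, bag, subag = [], [], []
for _ in range(reps):
    y = rng.normal(loc=mu, scale=1.0, size=n)
    plain.append(hard_threshold_mean(y))
    bag.append(aggregate(y, replace=True))               # bootstrap samples of size n
    subag.append(aggregate(y, m=n // 2, replace=False))  # half-size subsamples

print("variance, plain estimator :", np.var(plain))
print("variance, bagged          :", np.var(bag))
print("variance, half-subagged   :", np.var(subag))
```

At a borderline signal the plain estimator jumps between 0 and the sample mean; both aggregation schemes smooth this jump, and half-subagging typically achieves a comparable variance reduction while each resample fit uses only half the data.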

MSC:

62G08 Nonparametric regression and quantile regression
62G09 Nonparametric statistical resampling methods
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T10 Pattern recognition, speech recognition

References:

[1] AMIT, Y. and GEMAN, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9 1545-1588.
[2] BAUER, E. and KOHAVI, R. (1999). An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Machine Learning 36 105-139.
[3] BICKEL, P. J., GÖTZE, F. and VAN ZWET, W. R. (1997). Resampling fewer than n observations: gains, losses, and remedies for losses. Statist. Sinica 7 1-32. · Zbl 0927.62043
[4] BREIMAN, L. (1996a). Bagging predictors. Machine Learning 24 123-140. · Zbl 0858.68080
[5] BREIMAN, L. (1996b). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350-2383. · Zbl 0867.62055 · doi:10.1214/aos/1032181158
[6] BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042
[7] BÜHLMANN, P. and YU, B. (2000). Discussion of "Additive logistic regression: a statistical view of boosting," by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 377-386.
[8] BUJA, A. and STUETZLE, W. (2000a). The effect of bagging on variance, bias, and mean squared error. Preprint, AT&T Labs-Research.
[9] BUJA, A. and STUETZLE, W. (2000b). Smoothing effects of bagging. Preprint, AT&T Labs-Research.
[10] CHAN, K. S. and TSAY, R. S. (1998). Limiting properties of the least squares estimator of a continuous threshold autoregressive model. Biometrika 85 413-426. · Zbl 0938.62089 · doi:10.1093/biomet/85.2.413
[11] DIETTERICH, T. G. (1996). Editorial. Machine Learning 24 91-93.
[12] FREEDMAN, D. A. (1981). Bootstrapping regression models. Ann. Statist. 9 1218-1228. · Zbl 0449.62046 · doi:10.1214/aos/1176345638
[13] FREUND, Y. and SCHAPIRE, R. E. (1998). Discussion of "Arcing classifiers," by L. Breiman. Ann. Statist. 26 824-832. · Zbl 0934.62064 · doi:10.1214/aos/1024691079
[14] FRIEDMAN, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141. · Zbl 0765.62064 · doi:10.1214/aos/1176347963
[15] FRIEDMAN, J. H. and HALL, P. (2000). On bagging and nonlinear estimation. · Zbl 1104.62047 · doi:10.1016/j.jspi.2006.06.002
[16] GINÉ, E. and ZINN, J. (1990). Bootstrapping general empirical measures. Ann. Probab. 18 851-869. · Zbl 0706.62017 · doi:10.1214/aop/1176990862
[17] GROENEBOOM, P. (1989). Brownian motion with a parabolic drift and Airy functions. Probab. Theory Related Fields 81 79-109. · doi:10.1007/BF00343738
[18] HASTIE, T., TIBSHIRANI, R. and BUJA, A. (1994). Flexible discriminant analysis by optimal scoring. J. Amer. Statist. Assoc. 89 1255-1270. · Zbl 0812.62067 · doi:10.2307/2290989
[19] KIM, J. and POLLARD, D. (1990). Cube root asymptotics. Ann. Statist. 18 191-219. · Zbl 0703.62063 · doi:10.1214/aos/1176347498
[20] LOH, W.-Y. and SHIH, Y.-S. (1997). Split selection methods for classification trees. Statist. Sinica 7 815-840. · Zbl 1067.62545
[21] POLLARD, D. (1990). Empirical Processes: Theory and Applications. IMS, Hayward, CA. · Zbl 0741.60001
[22] SERFLING, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York. · Zbl 0538.62002