## On bagging and nonlinear estimation.
*(English)*
Zbl 1104.62047

Summary: We propose an elementary model for the way in which stochastic perturbations of a statistical objective function, such as a negative log-likelihood, produce excessive nonlinear variation of the resulting estimator. The theory for the model is transparently simple, and is used to provide new insight into the main factors that affect performance of bagging. In particular, it is shown that if the perturbations are sufficiently symmetric then bagging will not significantly increase bias; and if the perturbations also offer opportunities for cancellation then bagging will reduce variance.

For the first property it is sufficient that the third derivative of a perturbation vanishes locally, and for the second, that second and fourth derivatives have opposite signs. Functions that satisfy these conditions resemble sinusoids. Therefore, our results imply that bagging will reduce the nonlinear variation, as measured by either variance or mean-squared error, produced in an estimator by sinusoid-like, stochastic perturbations of the objective function. Analysis of our simple model also suggests relationships between the results obtained using different with-replacement and without-replacement bagging schemes. We simulate regression trees in settings that are far more complex than those explicitly addressed by the model, and find that these relationships are generally borne out.
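The comparison of with-replacement and without-replacement schemes described above can be tried on a toy problem. The following is a minimal sketch and not the model or the regression-tree simulation from the paper. It assumes a hard-threshold estimator (an indicator of whether the sample mean falls below a cutoff) as a stand-in for a strongly nonlinear estimator, and compares the variance of the plain estimator with that of its bootstrap-bagged and half-sample-bagged versions. All function names, the choice of estimator, and the simulation settings are illustrative assumptions.

```python
import numpy as np


def threshold_estimator(sample, cutoff=0.0):
    """Toy nonlinear estimator (assumed for illustration):
    1.0 if the sample mean is at or below the cutoff, else 0.0."""
    return float(np.mean(sample) <= cutoff)


def bagged_estimate(sample, estimator, n_resamples, rng, replace=True, size=None):
    """Average the estimator over resamples drawn from the data.

    replace=True with size=None gives ordinary bootstrap bagging (resample size n);
    replace=False with size=n // 2 gives half-sampling without replacement.
    """
    n = len(sample)
    m = n if size is None else size
    values = [
        estimator(rng.choice(sample, size=m, replace=replace))
        for _ in range(n_resamples)
    ]
    return float(np.mean(values))


def compare_schemes(n=50, n_resamples=100, n_trials=300, seed=0):
    """Monte Carlo variance of the plain and bagged estimators.

    Data are standard normal, so the true mean sits exactly at the cutoff,
    which is where the hard threshold is most unstable.
    """
    rng = np.random.default_rng(seed)
    plain, boot, half = [], [], []
    for _ in range(n_trials):
        sample = rng.normal(loc=0.0, scale=1.0, size=n)
        plain.append(threshold_estimator(sample))
        boot.append(
            bagged_estimate(sample, threshold_estimator, n_resamples, rng, replace=True)
        )
        half.append(
            bagged_estimate(
                sample, threshold_estimator, n_resamples, rng,
                replace=False, size=n // 2,
            )
        )
    return {
        "plain": float(np.var(plain)),
        "bootstrap": float(np.var(boot)),
        "half_sample": float(np.var(half)),
    }


if __name__ == "__main__":
    for name, variance in compare_schemes().items():
        print(f"{name:12s} variance = {variance:.4f}")
```

In this setting both bagged versions should show a clearly smaller variance than the plain indicator, since averaging over resamples replaces the hard jump with a smooth function of the data. How closely the two bagged variances agree depends on the estimator and the resample sizes, so the sketch is best read as a starting point for experimentation rather than as evidence for any specific relationship.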

### MSC:

| Code | Description |
|---|---|
| 62G09 | Nonparametric statistical resampling methods |
| 62E20 | Asymptotic distribution theory in statistics |
| 62F40 | Bootstrap, jackknife and other resampling methods |
| 62G05 | Nonparametric estimation |
| 62F10 | Point estimation |

### Keywords:

bias; bootstrap; half-sampling; regression tree; variance reduction; with-replacement sampling; without-replacement sampling

J. H. Friedman and P. Hall, J. Stat. Plann. Inference 137, No. 3, 669–683 (2007; Zbl 1104.62047)
