
Bias-corrected random forests in regression. (English) Zbl 1514.62010

Summary: It is well known that random forests reduce the variance of a regression predictor compared to a single tree, while leaving the bias unchanged. In many situations the dominating component of the risk is the squared bias, which makes bias correction necessary. In this paper, random forests are used to estimate the regression function, and five different methods for estimating the bias are proposed and discussed. Simulated and real data are used to study the performance of these methods, which prove significantly effective at reducing bias in the regression context.
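The general idea of such a correction can be sketched as a two-stage procedure: fit a forest, estimate its bias from out-of-bag residuals, and add the estimated bias back into the prediction. The sketch below uses scikit-learn's `RandomForestRegressor` rather than the R package randomForest referenced by the paper, and it illustrates one common residual-based strategy, not necessarily any of the five methods the paper proposes.

```python
# Two-stage bias correction for random forest regression: a minimal sketch,
# assuming scikit-learn (the paper itself works with the R randomForest
# package). This is an illustrative residual-fitting approach, not a
# reproduction of the paper's five proposed methods.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# Stage 1: fit a forest with out-of-bag (OOB) predictions enabled,
# so the bias can be estimated without reusing in-bag fits.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Stage 2: the OOB residual serves as a (noisy) bias estimate;
# fit a second forest to predict it as a function of the covariates.
residual = y - rf.oob_prediction_
bias_rf = RandomForestRegressor(n_estimators=200, random_state=0)
bias_rf.fit(X, residual)

# Bias-corrected prediction: base forest prediction plus estimated bias.
X_test = np.linspace(-3, 3, 100).reshape(-1, 1)
corrected = rf.predict(X_test) + bias_rf.predict(X_test)
```

Using OOB rather than in-sample residuals matters here: in-sample residuals of a forest are systematically too small, so a bias estimate built from them would be attenuated.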

MSC:

62-08 Computational methods for problems pertaining to statistics
68T05 Learning and adaptive systems in artificial intelligence

Software:

randomForest
