
zbMATH — the first resource for mathematics

On the asymptotics of random forests. (English) Zbl 1337.62063
Summary: The last decade has witnessed a growing interest in random forest models which are recognized to exhibit good practical performance, especially in high-dimensional settings. On the theoretical side, however, their predictive power remains largely unexplained, thereby creating a gap between theory and practice. In this paper, we present some asymptotic results on random forests in a regression framework. Firstly, we provide theoretical guarantees to link finite forests used in practice (with a finite number \(M\) of trees) to their asymptotic counterparts (with \(M = \infty\)). Using empirical process theory, we prove a uniform central limit theorem for a large class of random forest estimates, which holds in particular for L. Breiman’s original forests [Mach. Learn. 45, No. 1, 5–32 (2001; Zbl 1007.68152)]. Secondly, we show that infinite forest consistency implies finite forest consistency and thus, we state the consistency of several infinite forests. In particular, we prove that \(q\) quantile forests – close in spirit to Breiman’s [loc. cit.] forests but easier to study – are able to combine inconsistent trees to obtain a final consistent prediction, thus highlighting the benefits of random forests compared to single trees.
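The finite-versus-infinite forest link described above can be illustrated with a minimal Monte Carlo sketch. This is an assumption-laden toy, not Breiman's CART procedure: a histogram estimate on a randomly shifted grid stands in for a single randomized tree \(m_n(\cdot;\Theta)\), and the names `randomized_tree` and `finite_forest` are hypothetical. The finite forest \(m_{M,n}(x) = \frac{1}{M}\sum_{j=1}^{M} m_n(x;\Theta_j)\) stabilizes, as \(M\) grows, toward the infinite forest \(m_{\infty,n}(x) = \mathbb{E}_\Theta[m_n(x;\Theta)]\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression sample on [0, 1].
n = 500
X = rng.uniform(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * X) + 0.1 * rng.normal(size=n)

def randomized_tree(x, X, y, rng, n_bins=16):
    """One randomized base estimate m_n(x; Theta): a regressogram on a
    randomly shifted grid (a stand-in for a tree, NOT Breiman's CART)."""
    shift = rng.uniform(0.0, 1.0 / n_bins)        # the randomization Theta
    edges = shift + np.arange(n_bins + 1) / n_bins
    cell = np.searchsorted(edges, x)              # index of the cell containing x
    in_cell = np.searchsorted(edges, X) == cell   # training points in that cell
    return y[in_cell].mean() if in_cell.any() else y.mean()

def finite_forest(x, M, rng):
    """m_{M,n}(x): average of M randomized base estimates. As M grows,
    this Monte Carlo average converges to E_Theta[m_n(x; Theta)]."""
    return float(np.mean([randomized_tree(x, X, y, rng) for _ in range(M)]))

x0 = 0.3
for M in (1, 10, 100, 1000):
    print(f"M = {M:4d}  m_M(x0) = {finite_forest(x0, M, rng):.4f}")
```

Conditionally on the sample, the only remaining randomness is \(\Theta\), so by the law of large numbers two independent large-\(M\) forests built on the same data return nearly identical predictions, mirroring the finite-to-infinite reduction in the paper.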

MSC:
62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference
References:
[1] Biau, G., Analysis of a random forests model, J. Mach. Learn. Res., 13, 1063-1095, (2012) · Zbl 1283.62127
[2] Biau, G.; Devroye, L.; Lugosi, G., Consistency of random forests and other averaging classifiers, J. Mach. Learn. Res., 9, 2015-2033, (2008) · Zbl 1225.62081
[3] Bongiorno, E. G.; Goia, A.; Salinelli, E.; Vieu, P., Contributions in infinite-dimensional statistics and related topics, (2014), Società Editrice Esculapio
[4] Boucheron, S.; Lugosi, G.; Massart, P., Concentration inequalities: A nonasymptotic theory of independence, (2013), Oxford University Press · Zbl 1279.60005
[5] Breiman, L., Bagging predictors, Mach. Learn., 24, 123-140, (1996) · Zbl 0858.68080
[6] Breiman, L., Some infinity theory for predictor ensembles, Technical Report 577, (2000), UC Berkeley
[7] Breiman, L., Random forests, Mach. Learn., 45, 5-32, (2001) · Zbl 1007.68152
[8] Breiman, L.; Friedman, J.; Olshen, R. A.; Stone, C. J., Classification and regression trees, (1984), Chapman & Hall New York · Zbl 0541.62042
[9] Clémençon, S.; Depecker, M.; Vayatis, N., Ranking forests, J. Mach. Learn. Res., 14, 39-73, (2013) · Zbl 1307.68065
[10] Cutler, A.; Zhao, G., PERT - perfect random tree ensembles, Comput. Sci. Stat., 33, 490-497, (2001)
[11] M. Denil, D. Matheson, N. de Freitas, Consistency of online random forests, 2013. arXiv:1302.4853.
[12] Devroye, L.; Györfi, L.; Lugosi, G., A probabilistic theory of pattern recognition, (1996), Springer New York · Zbl 0853.68150
[13] Díaz-Uriarte, R.; Alvarez de Andrés, S., Gene selection and classification of microarray data using random forest, BMC Bioinform., 7, 1-13, (2006)
[14] Dietterich, T. G.; Kong, E. B., Machine learning bias, statistical bias, and statistical variance of decision tree algorithms, Technical Report, (1995), Department of Computer Science, Oregon State University
[15] Ferraty, F.; Vieu, P., Nonparametric functional data analysis: theory and practice, (2006), Springer Science & Business Media · Zbl 1119.62046
[16] R. Genuer, J.-M. Poggi, C. Tuleau, Random forests: some methodological insights, 2008. arXiv:0811.3619.
[17] Geurts, P.; Ernst, D.; Wehenkel, L., Extremely randomized trees, Mach. Learn., 63, 3-42, (2006) · Zbl 1110.68124
[18] B. Gregorutti, B. Michel, P. Saint-Pierre, Grouped variable importance with random forests and application to multivariate functional data analysis, 2014. arXiv:1411.4170.
[19] Györfi, L.; Kohler, M.; Krzyżak, A.; Walk, H., A distribution-free theory of nonparametric regression, (2002), Springer New York · Zbl 1021.62024
[20] Hamza, M.; Larocque, D., An empirical comparison of ensemble methods based on classification trees, J. Stat. Comput. Simul., 75, 629-643, (2005) · Zbl 1075.62051
[21] Ho, T. K., The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell., 20, 8, 832-844, (1998)
[22] Horváth, L.; Kokoszka, P., Inference for functional data with applications, (2012), Springer Science & Business Media · Zbl 1279.62017
[23] Ishwaran, H., The effect of splitting on random forests, Mach. Learn., 1-44, (2013)
[24] Ishwaran, H.; Kogalur, U. B., Consistency of random survival forests, Statist. Probab. Lett., 80, 1056-1064, (2010) · Zbl 1190.62177
[25] Ishwaran, H.; Kogalur, U. B.; Blackstone, E. H.; Lauer, M. S., Random survival forests, Ann. Appl. Stat., 2, 841-860, (2008) · Zbl 1149.62331
[26] B. Lakshminarayanan, D.M. Roy, Y.W. Teh, Mondrian forests: Efficient online random forests, 2014. arXiv:1406.2673.
[27] Liaw, A.; Wiener, M., Classification and regression by randomForest, R News, 2, 18-22, (2002)
[28] Meinshausen, N., Quantile regression forests, J. Mach. Learn. Res., 7, 983-999, (2006) · Zbl 1222.68262
[29] L. Mentch, G. Hooker, Ensemble trees and CLTs: statistical inference for supervised learning, 2014. arXiv:1404.6473.
[30] Poggi, J.-M.; Tuleau, C., Classification supervisée en grande dimension. Application à l'agrément de conduite automobile, Rev. Stat. Appl., 54, 41-60, (2006)
[31] Qi, Y., Random forest for bioinformatics, (Ensemble Machine Learning, (2012), Springer), 307-323
[32] Ramsay, J. O.; Silverman, B. W., Functional data analysis, (2005), Springer · Zbl 0882.62002
[33] Rogez, G.; Rihan, J.; Ramalingam, S.; Orrite, C.; Torr, P. H., Randomized trees for human pose detection, (IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), (2008), IEEE), 1-8
[34] E. Scornet, G. Biau, J.-P. Vert, Consistency of random forests, 2014. arXiv:1405.2881. · Zbl 1317.62028
[35] Stone, C. J., Consistent nonparametric regression, Ann. Statist., 5, 595-645, (1977) · Zbl 0366.62051
[36] van der Vaart, A. W.; Wellner, J. A., Weak convergence and empirical processes: with applications to statistics, (1996), Springer New York · Zbl 0862.60002
[37] S. Wager, Asymptotic theory for random forests, 2014. arXiv:1405.0352.
[38] Wager, S.; Hastie, T.; Efron, B., Confidence intervals for random forests: the jackknife and the infinitesimal jackknife, J. Mach. Learn. Res., 15, 1625-1651, (2014) · Zbl 1319.62132
[39] Zhu, R.; Zeng, D.; Kosorok, M. R., Reinforcement learning trees, Technical Report, (2012), University of North Carolina