zbMATH — the first resource for mathematics

A random forest guided tour. (English) Zbl 1402.62133
Summary: The random forest algorithm, proposed by L. Breiman [Mach. Learn. 45, No. 1, 5–32 (2001; Zbl 1007.68152)], has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
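The mechanism the summary describes — grow many randomized trees on bootstrap samples, let each tree pick its split from a random subset of features, then average the trees' predictions — can be sketched in a few lines. The following is a minimal illustration, not Breiman's full algorithm: it uses depth-one trees (stumps) rather than fully grown trees, and all function names are ours.

```python
import random

def fit_stump(X, y, n_feature_candidates=1):
    """Fit a depth-1 regression tree on a bootstrap sample, choosing the
    split among a random subset of features (Breiman's 'mtry' idea)."""
    n, d = len(X), len(X[0])
    idx = [random.randrange(n) for _ in range(n)]          # bootstrap resample
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    feats = random.sample(range(d), n_feature_candidates)  # random feature subset
    best = None
    for j in feats:
        for t in sorted({row[j] for row in Xb}):
            left = [yi for row, yi in zip(Xb, yb) if row[j] <= t]
            right = [yi for row, yi in zip(Xb, yb) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((yi - ml) ** 2 for yi in left)
                   + sum((yi - mr) ** 2 for yi in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:                                       # degenerate sample
        m = sum(yb) / len(yb)
        return lambda x: m
    _, j, t, ml, mr = best
    return lambda x: ml if x[j] <= t else mr

def random_forest(X, y, n_trees=50, n_feature_candidates=1):
    """Aggregate many randomized stumps by averaging their predictions."""
    trees = [fit_stump(X, y, n_feature_candidates) for _ in range(n_trees)]
    return lambda x: sum(tree(x) for tree in trees) / n_trees

# Fitting this forest to a one-dimensional step function, the averaged
# prediction is close to 0 on the left of the jump and close to 1 on the right.
random.seed(0)
X = [[i / 20] for i in range(20)]
y = [0.0] * 10 + [1.0] * 10
rf = random_forest(X, y, n_trees=25)
```

Averaging is what distinguishes the forest from a single tree: each stump is a high-variance estimator, but the aggregated predictor is much more stable, which is the variance-reduction effect analyzed in several of the theoretical references below.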

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62G09 Nonparametric statistical resampling methods
68T05 Learning and adaptive systems in artificial intelligence
68R10 Graph theory (including graph drawing) in computer science
62-02 Research exposition (monographs, survey articles) pertaining to statistics
Full Text: DOI
[1] Amaratunga, D; Cabrera, J; Lee, Y-S, Enriched random forests, Bioinformatics, 24, 2010-2014, (2008)
[2] Amit, Y; Geman, D, Shape quantization and recognition with randomized trees, Neural Comput, 9, 1545-1588, (1997)
[3] Archer, KJ; Kimes, RV, Empirical characterization of random forest variable importance measures, Comput Stat Data Anal, 52, 2249-2260, (2008) · Zbl 1452.62027
[4] Arlot S, Genuer R (2014) Analysis of purely random forests bias. arXiv:1407.3939 · Zbl 1402.62131
[5] Auret, L; Aldrich, C, Empirical comparison of tree ensemble variable importance measures, Chemom Intell Lab Syst, 105, 157-170, (2011)
[6] Bai, Z-H; Devroye, L; Hwang, H-K; Tsai, T-H, Maxima in hypercubes, Random Struct Algorithms, 27, 290-309, (2005) · Zbl 1080.60007
[7] Banerjee, M; McKeague, IW, Confidence sets for split points in decision trees, Ann Stat, 35, 543-574, (2007) · Zbl 1117.62037
[8] Barndorff-Nielsen, O; Sobel, M, On the distribution of the number of admissible points in a vector random sample, Theory Probab Appl, 11, 249-269, (1966) · Zbl 0278.60007
[9] Bengio, Y, Learning deep architectures for AI, Found Trends Mach Learn, 2, 1-127, (2009) · Zbl 1192.68503
[10] Bernard, S; Heutte, L; Adam, S; Huang, D-S (ed.); Wunsch, DC (ed.); Levine, DS (ed.); Jo, K-H (ed.), Forest-RK: a new random forest induction method, 430-437, (2008), Berlin
[11] Bernard, S; Adam, S; Heutte, L, Dynamic random forests, Pattern Recognit Lett, 33, 1580-1586, (2012)
[12] Biau, G, Analysis of a random forests model, J Mach Learn Res, 13, 1063-1095, (2012) · Zbl 1283.62127
[13] Biau, G; Devroye, L, On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification, J Multivar Anal, 101, 2499-2518, (2010) · Zbl 1198.62048
[14] Biau, G; Devroye, L, Cellular tree classifiers, Electron J Stat, 7, 1875-1912, (2013) · Zbl 1293.62067
[15] Biau, G; Devroye, L; Lugosi, G, Consistency of random forests and other averaging classifiers, J Mach Learn Res, 9, 2015-2033, (2008) · Zbl 1225.62081
[16] Biau, G; Cérou, F; Guyader, A, On the rate of convergence of the bagged nearest neighbor estimate, J Mach Learn Res, 11, 687-712, (2010) · Zbl 1242.62025
[17] Boulesteix, A-L; Janitza, S; Kruppa, J; König, IR, Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics, Wiley Interdiscip Rev Data Mining Knowl Discov, 2, 493-507, (2012)
[18] Breiman, L, Bagging predictors, Mach Learn, 24, 123-140, (1996) · Zbl 0858.68080
[19] Breiman L (2000a) Some infinity theory for predictor ensembles. Technical Report 577, University of California, Berkeley
[20] Breiman, L, Randomizing outputs to increase prediction accuracy, Mach Learn, 40, 229-242, (2000) · Zbl 0962.68143
[21] Breiman, L, Random forests, Mach Learn, 45, 5-32, (2001) · Zbl 1007.68152
[22] Breiman L (2003a) Setting up, using, and understanding random forests V3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf
[23] Breiman L (2003b) Setting up, using, and understanding random forests V4.0. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf
[24] Breiman L (2004) Consistency for a simple model of random forests. Technical Report 670, University of California, Berkeley
[25] Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall/CRC, Boca Raton · Zbl 0541.62042
[26] Bühlmann, P; Yu, B, Analyzing bagging, Ann Stat, 30, 927-961, (2002) · Zbl 1029.62037
[27] Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data. Technical Report 666, University of California, Berkeley
[28] Clémençon, S; Depecker, M; Vayatis, N, Ranking forests, J Mach Learn Res, 14, 39-73, (2013) · Zbl 1307.68065
[29] Clémençon, S; Vayatis, N, Tree-based ranking methods, IEEE Trans Inform Theory, 55, 4316-4336, (2009) · Zbl 1262.68150
[30] Criminisi, A; Shotton, J; Konukoglu, E, Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning, Found Trends Comput Graph Vis, 7, 81-227, (2011) · Zbl 1243.68235
[31] Crookston, NL; Finley, AO, yaImpute: an R package for \(k\)NN imputation, J Stat Softw, 23, 1-16, (2008)
[32] Cutler, A; Zhao, G, PERT—perfect random tree ensembles, Comput Sci Stat, 33, 490-497, (2001)
[33] Cutler, DR; Edwards, TC; Beard, KH; Cutler, A; Hess, KT; Gibson, J; Lawler, JJ, Random forests for classification in ecology, Ecology, 88, 2783-2792, (2007)
[34] Davies A, Ghahramani Z (2014) The Random Forest Kernel and creating other kernels for big data from random partitions. arXiv:1402.4293
[35] Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 international joint conference on neural networks, pp 1-8
[36] Deng, H; Runger, G, Gene selection with guided regularized random forest, Pattern Recognit, 46, 3483-3489, (2013)
[37] Denil M, Matheson D, de Freitas N (2013) Consistency of online random forests. In: International conference on machine learning (ICML)
[38] Denil M, Matheson D, de Freitas N (2014) Narrowing the gap: random forests in theory and in practice. In: International conference on machine learning (ICML)
[39] Désir, C; Bernard, S; Petitjean, C; Heutte, L, One class random forests, Pattern Recognit, 46, 3490-3506, (2013)
[40] Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York · Zbl 0853.68150
[41] Díaz-Uriarte R, Alvarez de Andrés S (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7:1-13
[42] Dietterich TG (2000) Ensemble methods in machine learning. In: Kittler J, Roli F (eds) Multiple classifier systems. Springer, Berlin, pp 1-15
[43] Efron, B, Bootstrap methods: another look at the jackknife, Ann Stat, 7, 1-26, (1979) · Zbl 0406.62024
[44] Efron B (1982) The jackknife, the bootstrap and other resampling plans, vol 38. CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia · Zbl 0496.62036
[45] Fink, D; Hochachka, WM; Zuckerberg, B; Winkler, DW; Shaby, B; Munson, MA; Hooker, G; Riedewald, M; Sheldon, D; Kelling, S, Spatiotemporal exploratory models for broad-scale survey data, Ecol Appl, 20, 2131-2147, (2010)
[46] Friedman J, Hastie T, Tibshirani R (2009) The elements of statistical learning, 2nd edn. Springer, New York · Zbl 1273.62005
[47] Genuer, R, Variance reduction in purely random forests, J Nonparametr Stat, 24, 543-562, (2012) · Zbl 1254.62050
[48] Genuer, R; Poggi, J-M; Tuleau-Malot, C, Variable selection using random forests, Pattern Recognit Lett, 31, 2225-2236, (2010)
[49] Geremia E, Menze BH, Ayache N (2013) Spatially adaptive random forests. In: IEEE international symposium on biomedical imaging: from nano to macro, pp 1332-1335
[50] Geurts, P; Ernst, D; Wehenkel, L, Extremely randomized trees, Mach Learn, 63, 3-42, (2006) · Zbl 1110.68124
[51] Gregorutti B, Michel B, Saint Pierre P (2016) Correlation and variable importance in random forests. Stat Comput. doi:10.1007/s11222-016-9646-1 · Zbl 06737690
[52] Guyon, I; Weston, J; Barnhill, S; Vapnik, V, Gene selection for cancer classification using support vector machines, Mach Learn, 46, 389-422, (2002) · Zbl 0998.68111
[53] Györfi L, Kohler M, Krzyżak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer, New York · Zbl 1021.62024
[54] Ho, T, The random subspace method for constructing decision forests, IEEE Trans Pattern Anal Mach Intell, 20, 832-844, (1998)
[55] Hothorn, T; Hornik, K; Zeileis, A, Unbiased recursive partitioning: a conditional inference framework, J Comput Graph Stat, 15, 651-674, (2006)
[56] Howard J, Bowles M (2012) The two most important algorithms in predictive modeling today. In: Strata Conference: Santa Clara. http://strataconf.com/strata2012/public/schedule/detail/22658
[57] Ishioka T (2013) Imputation of missing values for unsupervised data using the proximity in random forests. In: eLmL 2013, The fifth international conference on mobile, hybrid, and on-line learning, pp 30-36. International Academy, Research, and Industry Association
[58] Ishwaran, H, Variable importance in binary regression trees and forests, Electron J Stat, 1, 519-537, (2007) · Zbl 1320.62158
[59] Ishwaran, H, The effect of splitting on random forests, Mach Learn, 99, 75-118, (2015) · Zbl 1320.62015
[60] Ishwaran, H; Kogalur, UB, Consistency of random survival forests, Stat Probab Lett, 80, 1056-1064, (2010) · Zbl 1190.62177
[61] Ishwaran, H; Kogalur, UB; Blackstone, EH; Lauer, MS, Random survival forests, Ann Appl Stat, 2, 841-860, (2008) · Zbl 1149.62331
[62] Ishwaran, H; Kogalur, UB; Chen, X; Minn, AJ, Random survival forests for high-dimensional data, Stat Anal Data Mining ASA Data Sci J, 4, 115-132, (2011)
[63] Dean, J; Ghemawat, S, Mapreduce: simplified data processing on large clusters, Commun ACM, 51, 107-113, (2008)
[64] Joly, A; Geurts, P; Wehenkel, L; Calders, T (ed.); Esposito, F (ed.); Hüllermeier, E (ed.); Meo, R (ed.), Random forests with random projections of the output space for high dimensional multi-label classification, 607-622, (2014), Berlin
[65] Kim, H; Loh, W-Y, Classification trees with unbiased multiway splits, J Am Stat Assoc, 96, 589-604, (2001)
[66] Kleiner, A; Talwalkar, A; Sarkar, P; Jordan, MI, A scalable bootstrap for massive data, J Royal Stat Soc Ser B (Stat Methodol), 76, 795-816, (2014)
[67] Konukoglu E, Ganz M (2014) Approximate false positive rate control in selection frequency for random forest. arXiv:1410.2838
[68] Kruppa, J; Schwarz, A; Arminger, G; Ziegler, A, Consumer credit risk: individual probability estimates using machine learning, Expert Syst Appl, 40, 5125-5131, (2013)
[69] Kruppa, J; Liu, Y; Biau, G; Kohler, M; König, IR; Malley, JD; Ziegler, A, Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory, Biometr J, 56, 534-563, (2014) · Zbl 1441.62404
[70] Kruppa, J; Liu, Y; Diener, H-C; Holste, T; Weimar, C; König, IR; Ziegler, A, Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications, Biometr J, 56, 564-583, (2014) · Zbl 1441.62405
[71] Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, New York · Zbl 1306.62014
[72] Kyrillidis A, Zouzias A (2014) Non-uniform feature sampling for decision tree ensembles. In: IEEE international conference on acoustics, speech and signal processing, pp 4548-4552
[73] Lakshminarayanan B, Roy DM, Teh YW (2014) Mondrian forests: efficient online random forests. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, pp 3140-3148
[74] Latinne, P; Debeir, O; Decaestecker, C; Kittler, J (ed.); Roli, F (ed.), Limiting the number of trees in random forests, 178-187, (2001), Berlin · Zbl 0987.68896
[75] Liaw, A; Wiener, M, Classification and regression by randomForest, R News, 2, 18-22, (2002)
[76] Lin, Y; Jeon, Y, Random forests and adaptive nearest neighbors, J Am Stat Assoc, 101, 578-590, (2006) · Zbl 1119.62304
[77] Louppe G, Wehenkel L, Sutera A, Geurts P (2013) Understanding variable importances in forests of randomized trees. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, pp 431-439
[78] Malley, JD; Kruppa, J; Dasgupta, A; Malley, KG; Ziegler, A, Probability machines: consistent probability estimation using nonparametric learning machines, Methods Inform Med, 51, 74-81, (2012)
[79] Meinshausen, N, Quantile regression forests, J Mach Learn Res, 7, 983-999, (2006) · Zbl 1222.68262
[80] Meinshausen, N, Forest garrote, Electron J Stat, 3, 1288-1304, (2009) · Zbl 1326.62093
[81] Mentch L, Hooker G (2014) A novel test for additivity in supervised ensemble learners. arXiv:1406.1845
[82] Mentch L, Hooker G (2015) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J Mach Learn Res (in press) · Zbl 1360.62095
[83] Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 453-469
[84] Nadaraya, EA, On estimating regression, Theory Probab Appl, 9, 141-142, (1964) · Zbl 0136.40902
[85] Nicodemus, KK; Malley, JD, Predictor correlation impacts machine learning algorithms: implications for genomic studies, Bioinformatics, 25, 1884-1890, (2009)
[86] Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York · Zbl 0931.62035
[87] Prasad, AM; Iverson, LR; Liaw, A, Newer classification and regression tree techniques: bagging and random forests for ecological prediction, Ecosystems, 9, 181-199, (2006)
[88] Qian, SS; King, RS; Richardson, CJ, Two statistical methods for the detection of environmental thresholds, Ecol Model, 166, 87-97, (2003)
[89] Rieger A, Hothorn T, Strobl C (2010) Random forests with missing values in the covariates. Technical Report 79, University of Munich, Munich
[90] Saffari A, Leistner C, Santner J, Godec M, Bischof H (2009) On-line random forests. In: IEEE 12th international conference on computer vision workshops, pp 1393-1400
[91] Schwarz, DF; König, IR; Ziegler, A, On safari to random jungle: a fast implementation of random forests for high-dimensional data, Bioinformatics, 26, 1752-1758, (2010)
[92] Scornet E (2015a) On the asymptotics of random forests. J Multivar Anal 146:72-83 · Zbl 1337.62063
[93] Scornet E (2015b) Random forests and kernel methods. IEEE Trans Inform Theory 62:1485-1500 · Zbl 1359.94969
[94] Scornet, E; Biau, G; Vert, J-P, Consistency of random forests, Ann Stat, 43, 1716-1741, (2015) · Zbl 1317.62028
[95] Segal, MR, Regression trees for censored data, Biometrics, 44, 35-47, (1988) · Zbl 0707.62224
[96] Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: IEEE conference on computer vision and pattern recognition, pp 1297-1304
[97] Stone, CJ, Consistent nonparametric regression, Ann Stat, 5, 595-645, (1977) · Zbl 0366.62051
[98] Stone, CJ, Optimal rates of convergence for nonparametric estimators, Ann Stat, 8, 1348-1360, (1980) · Zbl 0451.62033
[99] Stone, CJ, Optimal global rates of convergence for nonparametric regression, Ann Stat, 10, 1040-1053, (1982) · Zbl 0511.62048
[100] Strobl, C; Boulesteix, A-L; Kneib, T; Augustin, T; Zeileis, A, Conditional variable importance for random forests, BMC Bioinform, 9, 307, (2008)
[101] Svetnik, V; Liaw, A; Tong, C; Culberson, JC; Sheridan, RP; Feuston, BP, Random forest: a classification and regression tool for compound classification and QSAR modeling, J Chem Inform Comput Sci, 43, 1947-1958, (2003)
[102] Toloşi, L; Lengauer, T, Classification with correlated features: unreliability of feature ranking and solutions, Bioinformatics, 27, 1986-1994, (2011) · Zbl 1235.93089
[103] Truong AKY (2009) Fast growing and interpretable oblique trees via logistic regression models. PhD thesis, University of Oxford, Oxford
[104] Varian, H, Big data: new tricks for econometrics, J Econ Perspect, 28, 3-28, (2014)
[105] Wager S (2014) Asymptotic theory for random forests. arXiv:1405.0352
[106] Wager, S; Hastie, T; Efron, B, Confidence intervals for random forests: the jackknife and the infinitesimal jackknife, J Mach Learn Res, 15, 1625-1651, (2014) · Zbl 1319.62132
[107] Watson GS (1964) Smooth regression analysis. Sankhy\(\bar{a}\) Ser A 26:359-372 · Zbl 0137.13002
[108] Welbl, J; Jiang, X (ed.); Hornegger, J (ed.); Koch, R (ed.), Casting random forests as artificial neural networks and profiting from it, 765-771, (2014), Berlin
[109] Winham, SJ; Freimuth, RR; Biernacka, JM, A weighted random forests approach to improve predictive performance, Stat Anal Data Mining ASA Data Sci J, 6, 496-505, (2013) · Zbl 1281.62238
[110] Yan, D; Chen, A; Jordan, MI, Cluster forests, Comput Stat Data Anal, 66, 178-192, (2013) · Zbl 06958982
[111] Yang F, Wang J, Fan G (2010) Kernel induced random survival forests. arXiv:1008.3952
[112] Yi Z, Soatto S, Dewan M, Zhan Y (2012) Information forests. In: 2012 information theory and applications workshop, pp 143-146
[113] Zhu R, Zeng D, Kosorok MR (2015) Reinforcement learning trees. J Am Stat Assoc 110(512):1770-1784 · Zbl 1374.68466
[114] Ziegler, A; König, IR, Mining data with random forests: current options for real-world applications, Wiley Interdiscip Rev Data Mining Knowl Discov, 4, 55-63, (2014)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.