Randomization as regularization: a degrees of freedom explanation for random forest success. (English) Zbl 07307469
Summary: Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for their success has yet to be put forth. Here we aim to take a step forward in this direction by demonstrating that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. Specifically, from a model-complexity perspective, we show that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicitly regularized regression procedures like lasso and ridge regression. To highlight this point, we design a randomized linear-model-based forward selection procedure intended as an analogue to tree-based random forests and demonstrate its surprisingly strong empirical performance. Numerous demonstrations on both real and synthetic data are provided.
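To make the mechanism concrete, the following is a minimal Python sketch (not the authors' code) of a randomized linear-model-based forward selection ensemble in the spirit described above: at each step only a randomly drawn subset of the remaining predictors, playing the role of mtry, is eligible to enter the model, and predictions are averaged over many such randomized fits. The function names and default parameter values (mtry, n_steps, n_members) are illustrative assumptions, not taken from the paper.

import numpy as np

def randomized_forward_selection(X, y, mtry, n_steps, rng):
    """One randomized forward-selection pass: at each step, only a random
    subset of mtry remaining predictors is eligible to enter the model."""
    n, p = X.shape
    selected = []
    for _ in range(min(n_steps, p)):
        remaining = [j for j in range(p) if j not in selected]
        candidates = rng.choice(remaining, size=min(mtry, len(remaining)),
                                replace=False)
        best_j, best_rss = None, np.inf
        for j in candidates:
            cols = selected + [int(j)]
            beta = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = int(j), rss
        selected.append(best_j)
    beta = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
    return selected, beta

def ensemble_predict(X_train, y_train, X_test, mtry=3, n_steps=5,
                     n_members=100, seed=0):
    """Average the predictions of many randomized forward-selection fits,
    mirroring how a random forest averages randomized trees."""
    rng = np.random.default_rng(seed)
    preds = np.zeros(X_test.shape[0])
    for _ in range(n_members):
        cols, beta = randomized_forward_selection(X_train, y_train,
                                                  mtry, n_steps, rng)
        preds += X_test[:, cols] @ beta
    return preds / n_members

Under this reading, shrinking mtry injects more randomness into each ensemble member and therefore acts as a stronger implicit regularizer, which is the mechanism the summary attributes to random forests in low signal-to-noise ratio settings.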
MSC:
68T05 Learning and adaptive systems in artificial intelligence