Random forest estimation of conditional distribution functions and conditional quantiles. (English) Zbl 07633945

Summary: We propose a theoretical study of two realistic estimators of conditional distribution functions using random forests. The estimation process uses the bootstrap samples generated from the original dataset when constructing the forest. The first estimator reuses these bootstrap samples, while the second uses the original sample once the forest has been built. We prove that both estimators of the conditional distribution functions are uniformly almost surely (a.s.) consistent. To the best of our knowledge, this is the first proof of a.s. consistency that also covers the bootstrap step; previous consistency results hold in \(L^2\) norm or in probability. The consistency result holds for a large class of functions, including additive models and products. The consistency of the conditional quantile estimators follows from that of the distribution function estimators by standard arguments.
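
The difference between the two estimators can be illustrated with a small sketch. It is not the authors' implementation: the covariate is one-dimensional, trees split at medians of the bootstrap points instead of using CART, and the weight formulas are one plausible reading of the summary (bootstrap multiplicities for the first estimator, plain counts of original points for the second). All function names (`build_cuts`, `fit_forest`, `forest_weights`, `conditional_cdf`, `conditional_quantile`) are hypothetical.

```python
import bisect
import random


def build_cuts(xs, min_leaf):
    """Median-split a sorted list of distinct 1-D points; return the sorted cut points.

    Assumption: median splits stand in for the CART criterion.
    """
    if len(xs) < 2 * min_leaf:
        return []
    mid = len(xs) // 2
    cut = 0.5 * (xs[mid - 1] + xs[mid])
    return build_cuts(xs[:mid], min_leaf) + [cut] + build_cuts(xs[mid:], min_leaf)


def fit_forest(X, n_trees, min_leaf, rng):
    """Grow each tree on its own bootstrap sample; keep the bootstrap counts."""
    n = len(X)
    forest = []
    for _ in range(n_trees):
        counts = [0] * n  # counts[j] = times observation j was drawn
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        distinct = sorted({X[j] for j in range(n) if counts[j] > 0})
        forest.append((build_cuts(distinct, min_leaf), counts))
    return forest


def forest_weights(forest, X, x, use_bootstrap):
    """Weight of each observation at query point x, averaged over trees.

    use_bootstrap=True : observations in the leaf of x count with their
                         bootstrap multiplicity (first estimator).
    use_bootstrap=False: every original observation in the leaf of x counts
                         once (second estimator).
    """
    n = len(X)
    w = [0.0] * n
    for cuts, counts in forest:
        leaf = bisect.bisect_right(cuts, x)
        mult = [
            (counts[j] if use_bootstrap else 1)
            if bisect.bisect_right(cuts, X[j]) == leaf else 0
            for j in range(n)
        ]
        total = sum(mult)  # positive: each leaf holds a drawn observation
        for j in range(n):
            w[j] += mult[j] / (total * len(forest))
    return w


def conditional_cdf(w, Y, y):
    """Weighted empirical distribution function: sum of w_j over Y_j <= y."""
    return sum(wj for wj, yj in zip(w, Y) if yj <= y)


def conditional_quantile(w, Y, alpha):
    """Generalized inverse: smallest Y_j whose weighted CDF reaches alpha."""
    acc = 0.0
    for yj, wj in sorted(zip(Y, w)):
        acc += wj
        if acc >= alpha - 1e-12:
            return yj
    return max(Y)


if __name__ == "__main__":
    rng = random.Random(42)
    X = [rng.random() for _ in range(200)]
    Y = [x + rng.gauss(0.0, 0.1) for x in X]  # toy model, for illustration only
    forest = fit_forest(X, n_trees=50, min_leaf=5, rng=rng)
    for use_bootstrap in (True, False):
        w = forest_weights(forest, X, 0.5, use_bootstrap)
        print(use_bootstrap, conditional_cdf(w, Y, 0.5), conditional_quantile(w, Y, 0.9))
```

Both variants share the same forest and differ only in how the observations in the leaf of the query point are counted, which mirrors the distinction drawn in the summary. The quantile is obtained by inverting the weighted distribution function.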


62-XX Statistics

