Athey, Susan; Tibshirani, Julie; Wager, Stefan
Generalized random forests. (English) Zbl 1418.62102
Ann. Stat. 47, No. 2, 1148-1178 (2019).

Summary: We propose generalized random forests, a method for nonparametric statistical estimation based on random forests [L. Breiman, Mach. Learn. 45, No. 1, 5–32 (2001; Zbl 1007.68152)] that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
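The summary's central idea, weighting training examples by how often they share a leaf with the target point and then solving a weighted local moment equation, can be illustrated with a small sketch. This is not the grf API and not the authors' implementation: it assumes the trees have already been grown (the gradient-based splitting rule, subsampling and honest sample splitting are omitted), represents each tree only by the leaf index of every training example, and uses hypothetical function names. The weights follow the construction alpha_i(x) = (1/B) sum_b 1{X_i in L_b(x)} / |L_b(x)|, and each solver sets sum_i alpha_i(x) psi(O_i; theta) to zero for a different score psi: the conditional mean, the conditional quantile, and a local instrumental-variables effect. For the last one we assume the IV moment conditions include an intercept, whose weighted solution is a covariance ratio.

```python
from typing import List, Sequence


def forest_weights(train_leaves: Sequence[Sequence[int]],
                   test_leaves: Sequence[int]) -> List[float]:
    """Return alpha_i(x) = (1/B) * sum_b 1{X_i in L_b(x)} / |L_b(x)|.

    train_leaves[b][i] is the leaf index of training example i in tree b, and
    test_leaves[b] is the leaf that the target point x reaches in tree b.
    The weights sum to one when every leaf reached by x is non-empty.
    """
    n_trees = len(train_leaves)
    alpha = [0.0] * len(train_leaves[0])
    for leaves, target_leaf in zip(train_leaves, test_leaves):
        # Training examples that share a leaf with x in this tree.
        members = [i for i, leaf in enumerate(leaves) if leaf == target_leaf]
        for i in members:
            alpha[i] += 1.0 / (len(members) * n_trees)
    return alpha


def local_mean(y: Sequence[float], alpha: Sequence[float]) -> float:
    """Solve sum_i alpha_i * (y_i - theta) = 0 (conditional mean)."""
    return sum(a * v for a, v in zip(alpha, y)) / sum(alpha)


def local_quantile(y: Sequence[float], alpha: Sequence[float], q: float) -> float:
    """Approximately solve sum_i alpha_i * (q - 1{y_i <= theta}) = 0.

    Returns the smallest y_i at which the weighted empirical CDF reaches q.
    """
    threshold = q * sum(alpha)
    ordered = sorted(zip(y, alpha))
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return ordered[-1][0]  # guards against floating-point round-off near q = 1


def local_iv_effect(y: Sequence[float], w: Sequence[float],
                    z: Sequence[float], alpha: Sequence[float]) -> float:
    """Solve the weighted IV moment equations (with intercept) for tau(x).

    With alpha-weighted means, tau = Cov_alpha(z, y) / Cov_alpha(z, w).
    """
    total = sum(alpha)

    def weighted_mean(v: Sequence[float]) -> float:
        return sum(a * vi for a, vi in zip(alpha, v)) / total

    y_bar, w_bar, z_bar = weighted_mean(y), weighted_mean(w), weighted_mean(z)
    num = sum(a * (zi - z_bar) * (yi - y_bar) for a, zi, yi in zip(alpha, z, y))
    den = sum(a * (zi - z_bar) * (wi - w_bar) for a, zi, wi in zip(alpha, z, w))
    return num / den


if __name__ == "__main__":
    # Two toy trees over four training examples (hypothetical leaf indices).
    alpha = forest_weights([[0, 0, 1, 1], [0, 1, 1, 1]], [0, 1])
    y = [1.0, 2.0, 3.0, 4.0]
    print(alpha)
    print(local_mean(y, alpha))
    print(local_quantile(y, alpha, 0.5))
```

In the toy example, x shares a leaf with examples 0 and 1 in the first tree and with examples 1, 2 and 3 in the second, so the weights are 1/4, 5/12, 1/6 and 1/6; they sum to one, and the weighted mean of y is 27/12 = 2.25. Because the weighted empirical distribution has jumps, the quantile solver returns a sample value at which the estimating equation changes sign rather than an exact root.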
Cited in 1 Review · Cited in 93 Documents

MSC:
62G05 Nonparametric estimation
62G07 Density estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference

Keywords: asymptotic theory; causal inference; instrumental variable; generalized random forest
Citations: Zbl 1007.68152
Software: BayesDA; AER; CRAN; grf; ranger; BayesTree; R; BartPy

References:
[1] Abadie, A. (2003). Semiparametric instrumental variable estimation of treatment response models. J. Econometrics 113 231-263. · Zbl 1038.62113 · doi:10.1016/S0304-4076(02)00201-4
[2] Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Comput. 9 1545-1588.
[3] Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica 61 821-856. · Zbl 0795.62012 · doi:10.2307/2951764
[4] Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. Amer. Econ. Rev. 313-336.
[5] Angrist, J. D. and Evans, W. N. (1998). Children and their parents’ labor supply: Evidence from exogenous variation in family size. Amer. Econ. Rev. 450-477.
[6] Arlot, S. and Genuer, R. (2014). Analysis of purely random forests bias. ArXiv preprint. Available at arXiv:1407.3939. · Zbl 1402.62131
[7] Athey, S. and Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. USA 113 7353-7360. · Zbl 1357.62190 · doi:10.1073/pnas.1510489113
[8] Athey, S., Tibshirani, J. and Wager, S. (2018). Supplement to “Generalized random forests.” doi:10.1214/18-AOS1709SUPP. · Zbl 1418.62102
[9] Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 2369-2429. · Zbl 1274.62464 · doi:10.3982/ECTA9626
[10] Beygelzimer, A. and Langford, J. (2009). The offset tree for learning with partial labels. In Proceedings of KDD 129-138. ACM.
[11] Biau, G. (2012). Analysis of a random forests model. J. Mach. Learn. Res. 13 1063-1095. · Zbl 1283.62127
[12] Biau, G. and Devroye, L. (2010). On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. J. Multivariate Anal. 101 2499-2518. · Zbl 1198.62048 · doi:10.1016/j.jmva.2010.06.019
[13] Biau, G., Devroye, L. and Lugosi, G. (2008). Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res. 9 2015-2033. · Zbl 1225.62081
[14] Biau, G. and Scornet, E. (2016). A random forest guided tour. TEST 25 197-227. · Zbl 1402.62133 · doi:10.1007/s11749-016-0481-7
[15] Breiman, L. (1996). Bagging predictors. Mach. Learn. 24 123-140. · Zbl 0858.68080
[16] Breiman, L. (2001). Random forests. Mach. Learn. 45 5-32. · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[17] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Advanced Books and Software, Belmont, CA. · Zbl 0541.62042
[18] Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30 927-961. · Zbl 1029.62037 · doi:10.1214/aos/1031689014
[19] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econom. J. 21 C1-C68. · Zbl 07565928
[20] Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 266-298. · Zbl 1189.62066 · doi:10.1214/09-AOAS285
[21] Darolles, S., Fan, Y., Florens, J. P. and Renault, E. (2011). Nonparametric instrumental regression. Econometrica 79 1541-1565. · Zbl 1274.62277 · doi:10.3982/ECTA6539
[22] Denil, M., Matheson, D. and De Freitas, N. (2014). Narrowing the gap: Random forests in theory and in practice. In Proceedings of ICML 665-673.
[23] Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40 139-157.
[24] Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics 38. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA. · Zbl 0496.62036
[25] Efron, B. and Stein, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586-596. · Zbl 0481.62035 · doi:10.1214/aos/1176345462
[26] Fan, J., Farmen, M. and Gijbels, I. (1998). Local maximum likelihood estimation and inference. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 591-608. · Zbl 0909.62036 · doi:10.1111/1467-9868.00142
[27] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability 66. Chapman & Hall, London. · Zbl 0873.62037
[28] Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189-1232. · Zbl 1043.62034 · doi:10.1214/aos/1013203451
[29] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis, 2nd ed. Chapman & Hall/CRC, Boca Raton, FL. · Zbl 1039.62018
[30] Geurts, P., Ernst, D. and Wehenkel, L. (2006). Extremely randomized trees. Mach. Learn. 63 3-42. · Zbl 1110.68124 · doi:10.1007/s10994-006-6226-1
[31] Gordon, L. and Olshen, R. A. (1985). Tree-structured survival analysis. Cancer Treat. Rep. 69 1065-1069.
[32] Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393. · Zbl 0305.62031 · doi:10.1080/01621459.1974.10482962
[33] Hansen, B. E. (1992). Testing for parameter instability in linear models. J. Policy Model. 14 517-533.
[34] Hartford, J., Lewis, G., Leyton-Brown, K. and Taddy, M. (2017). Deep IV: A flexible approach for counterfactual prediction. In Proceedings of ICML 1414-1423.
[35] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer, New York. · Zbl 1273.62005
[36] Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. J. Comput. Graph. Statist. 20 217-240.
[37] Hjort, N. L. and Koning, A. (2002). Tests for constancy of model parameters over time. J. Nonparametr. Stat. 14 113-132. · Zbl 1017.62015 · doi:10.1080/10485250211394
[38] Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20 832-844.
[39] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19 293-325. · Zbl 0032.04101 · doi:10.1214/aoms/1177730196
[40] Honoré, B. E. and Kyriazidou, E. (2000). Panel data discrete choice models with lagged dependent variables. Econometrica 68 839-874. · Zbl 1026.62124 · doi:10.1111/1468-0262.00139
[41] Hothorn, T., Lausen, B., Benner, A. and Radespiel-Tröger, M. (2004). Bagging survival trees. Stat. Med. 23 77-91.
[42] Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica 62 467-475. · Zbl 0800.90648 · doi:10.2307/2951620
[43] Ishwaran, H. and Kogalur, U. B. (2010). Consistency of random survival forests. Statist. Probab. Lett. 80 1056-1064. · Zbl 1190.62177 · doi:10.1016/j.spl.2010.02.020
[44] Kallus, N. (2017). Recursive partitioning for personalization using observational data. In Proceedings of ICML 1789-1798.
[45] Kleiber, C. and Zeileis, A. (2008). Applied Econometrics with R. Springer Science & Business Media. · Zbl 1155.91004
[46] LeBlanc, M. and Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics 411-425.
[47] Lewbel, A. (2007). A local generalized method of moments estimator. Econom. Lett. 94 124-128. · Zbl 1255.62093 · doi:10.1016/j.econlet.2006.08.011
[48] Lin, Y. and Jeon, Y. (2006). Random forests and adaptive nearest neighbors. J. Amer. Statist. Assoc. 101 578-590. · Zbl 1119.62304 · doi:10.1198/016214505000001230
[49] Loader, C. (1999). Local Regression and Likelihood. Springer, New York. · Zbl 0929.62046
[50] Mallows, C. L. (1973). Some comments on \(C_p\). Technometrics 15 661-675. · Zbl 0269.62061
[51] Meinshausen, N. (2006). Quantile regression forests. J. Mach. Learn. Res. 7 983-999. · Zbl 1222.68262
[52] Mentch, L. and Hooker, G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J. Mach. Learn. Res. 17 Paper No. 26. · Zbl 1360.62095
[53] Molinaro, A. M., Dudoit, S. and van der Laan, M. J. (2004). Tree-based multivariate regression and density estimation with right-censored data. J. Multivariate Anal. 90 154-177. · Zbl 1048.62046 · doi:10.1016/j.jmva.2004.02.003
[54] Newey, W. K. (1994a). Kernel estimation of partial means and a general variance estimator. Econometric Theory 10 233-253.
[55] Newey, W. K. (1994b). The asymptotic variance of semiparametric estimators. Econometrica 62 1349-1382. · Zbl 0816.62034 · doi:10.2307/2951752
[56] Newey, W. K. and Powell, J. L. (2003). Instrumental variable estimation of nonparametric models. Econometrica 71 1565-1578. · Zbl 1154.62415 · doi:10.1111/1468-0262.00459
[57] Neyman, J. (1979). \(C(\alpha)\) tests and their use. Sankhya, Ser. A 41 1-21. · Zbl 0471.62023
[58] Nyblom, J. (1989). Testing for the constancy of parameters over time. J. Amer. Statist. Assoc. 84 223-230. · Zbl 0677.62018 · doi:10.1080/01621459.1989.10478759
[59] Ploberger, W. and Krämer, W. (1992). The CUSUM test with OLS residuals. Econometrica 60 271-285. · Zbl 0744.62155 · doi:10.2307/2951597
[60] Poterba, J. M., Venti, S. F. and Wise, D. A. (1996). How retirement saving programs increase saving. J. Econ. Perspect. 10 91-112.
[61] Robins, J. M. and Ritov, Y. (1997). Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Stat. Med. 16.
[62] Robinson, P. M. (1988). Root-\(N\)-consistent semiparametric regression. Econometrica 56 931-954. · Zbl 0647.62100 · doi:10.2307/1912705
[63] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41-55. · Zbl 0522.62091 · doi:10.1093/biomet/70.1.41
[64] Schick, A. (1986). On asymptotically efficient estimation in semiparametric models. Ann. Statist. 14 1139-1151. · Zbl 0612.62062 · doi:10.1214/aos/1176350055
[65] Scornet, E., Biau, G. and Vert, J.-P. (2015). Consistency of random forests. Ann. Statist. 43 1716-1741. · Zbl 1317.62028 · doi:10.1214/15-AOS1321
[66] Sexton, J. and Laake, P. (2009). Standard errors for bagged and random forest estimators. Comput. Statist. Data Anal. 53 801-811. · Zbl 1452.62121 · doi:10.1016/j.csda.2008.08.007
[67] Staniswalis, J. G. (1989). The kernel estimate of a regression function in likelihood-based models. J. Amer. Statist. Assoc. 84 276-283. · Zbl 0721.62039 · doi:10.1080/01621459.1989.10478766
[68] Stone, C. J. (1977). Consistent nonparametric regression. Ann. Statist. 5 595-645. · Zbl 0366.62051 · doi:10.1214/aos/1176343886
[69] Su, L., Murtazashvili, I. and Ullah, A. (2013). Local linear GMM estimation of functional coefficient IV models with an application to estimating the rate of return to schooling. J. Bus. Econom. Statist. 31 184-207.
[70] Su, X., Tsai, C.-L., Wang, H., Nickerson, D. M. and Li, B. (2009). Subgroup analysis via recursive partitioning. J. Mach. Learn. Res. 10 141-158.
[71] Tibshirani, R. and Hastie, T. (1987). Local likelihood estimation. J. Amer. Statist. Assoc. 82 559-567. · Zbl 0626.62041 · doi:10.1080/01621459.1987.10478466
[72] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge. · Zbl 0910.62001
[73] Varian, H. R. (2014). Big data: New tricks for econometrics. J. Econ. Perspect. 28 3-27.
[74] Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. J. Amer. Statist. Assoc. 113 1228-1242. · Zbl 1402.62056 · doi:10.1080/01621459.2017.1319839
[75] Wager, S., Hastie, T. and Efron, B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. J. Mach. Learn. Res. 15 1625-1651. · Zbl 1319.62132
[76] Wager, S. and Walther, G. (2015). Adaptive concentration of regression trees, with application to random forests. ArXiv preprint. Available at arXiv:1503.06388.
[77] Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd ed. MIT Press, Cambridge, MA. · Zbl 1327.62009
[78] Wright, M. N. and Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77 1-17.
[79] Zeileis, A. (2005). A unified approach to structural change tests based on ML scores, \(F\) statistics, and OLS residuals. Econometric Rev. 24 445-466. · Zbl 1080.62012 · doi:10.1080/07474930500406053
[80] Zeileis, A. and Hornik, K. (2007). Generalized \(M\)-fluctuation tests for parameter instability. Stat. Neerl. 61 488-508. · Zbl 1152.62014 · doi:10.1111/j.1467-9574.2007.00371.x
[81] Zeileis, A., Hothorn, T. and Hornik, K. (2008). Model-based recursive partitioning. J. Comput. Graph. Statist. 17 492-514.
[82] Zhu, R., Zeng, D. and Kosorok, M. R. (2015). Reinforcement learning trees. J. Amer. Statist. Assoc. 110 1770-1784. · Zbl 1374.68466 · doi:10.1080/01621459.2015.1036994

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open.
This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.