
BART: Bayesian additive regression trees. (English) Zbl 1189.62066

Summary: We develop a Bayesian “sum-of-trees” model in which each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach that uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference, including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART’s many features are illustrated in a bake-off against competing methods on 42 different data sets, in a simulation experiment, and on a drug discovery classification problem.
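In the paper's notation, the sum-of-trees model writes the response as a sum of m regression trees plus Gaussian noise,

$$ y = \sum_{j=1}^{m} g(x;\, T_j, M_j) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), $$

where g(x; T_j, M_j) returns the leaf value that tree T_j (with leaf parameters M_j) assigns to input x. The regularization prior keeps each tree weak; in particular, the probability that a node at depth d is nonterminal is \alpha(1+d)^{-\beta}, with the paper's defaults \alpha = 0.95 and \beta = 2 strongly favoring shallow trees.

The sketch below illustrates only the backfitting structure underlying the MCMC algorithm: a deterministic Python skeleton in which greedy scikit-learn trees stand in for posterior draws of (T_j, M_j), and a hard depth cap stands in for the regularization prior. The names and settings (m, n_sweeps, the toy data) are illustrative assumptions, not the paper's implementation.

    # Hypothetical sketch: the backfitting sweep at the heart of BART,
    # with greedy scikit-learn trees in place of Metropolis-Hastings tree
    # draws, and a hard depth limit standing in for the regularization prior.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 5))      # toy design matrix
    y = X[:, 0] ** 2 + 0.1 * rng.normal(size=500)  # toy response

    m, n_sweeps = 50, 10               # number of trees, backfitting sweeps
    trees = [DecisionTreeRegressor(max_depth=2) for _ in range(m)]
    fits = np.zeros((m, len(y)))       # current contribution of each tree

    for _ in range(n_sweeps):
        for j in range(m):
            # Partial residuals: the response minus every other tree's fit.
            resid = y - (fits.sum(axis=0) - fits[j])
            trees[j].fit(X, resid)          # BART would instead *draw*
            fits[j] = trees[j].predict(X)   # (T_j, M_j) from the posterior

    y_hat = fits.sum(axis=0)
    print("in-sample RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))

In BART proper, each inner step is a Metropolis-Hastings draw of (T_j, M_j) given the partial residuals, followed by a Gibbs draw of \sigma, so each sweep yields a posterior sample of the sum-of-trees fit rather than a single point estimate.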

MSC:

62G08 Nonparametric regression and quantile regression
62F15 Bayesian inference
65C60 Computational problems in statistics (MSC2010)
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Abrevaya, J. and McCulloch, R. (2006). Reversal of fortune: A statistical analysis of penalty calls in the National Hockey League. Technical report, Purdue Univ.
[2] Abu-Nimeh, S., Nappa, D., Wang, X. and Nair, S. (2008). Detecting phishing emails via Bayesian additive regression trees. Technical report, Southern Methodist Univ., Dallas, TX.
[3] Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669-679. · Zbl 1043.62087
[4] Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9 1545-1588.
[5] Blanchard, G. (2004). Un algorithme accéléré d’échantillonnage bayésien pour le modèle CART [An accelerated Bayesian sampling algorithm for the CART model]. Revue d’Intelligence Artificielle 18 383-410.
[6] Breiman, L. (1996). Bagging predictors. Machine Learning 24 123-140. · Zbl 0858.68080
[7] Breiman, L. (2001). Random forests. Machine Learning 45 5-32. · Zbl 0774.62031
[8] Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: A library for support vector machines. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[9] Chipman, H. A., George, E. I. and McCulloch, R. E. (1998). Bayesian CART model search (with discussion and a rejoinder by the authors). J. Amer. Statist. Assoc. 93 935-960. · Zbl 1072.62650
[10] Chipman, H. A., George, E. I. and McCulloch, R. E. (2002). Bayesian treed models. Machine Learning 48 299-320. · Zbl 0820.68098
[11] Chipman, H. A., George, E. I. and McCulloch, R. E. (2007). Bayesian ensemble learning. In Advances in Neural Information Processing Systems 19 265-272.
[12] Denison, D. G. T., Mallick, B. K. and Smith, A. F. M. (1998). A Bayesian CART algorithm. Biometrika 85 363-377. · Zbl 1048.62502
[13] Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D. and Weingessel, A. (2008). e1071: Misc functions of the Department of Statistics (e1071), TU Wien. R package version 1.5-18.
[14] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion and a rejoinder by the authors). Ann. Statist. 32 407-499. · Zbl 1091.62054
[15] Feng, J., Lurati, L., Ouyang, H., Robinson, T., Wang, Y., Yuan, S. and Young, S. (2003). Predictive toxicology: Benchmarking molecular descriptors and statistical methods. Journal of Chemical Information and Computer Sciences 43 1463-1470.
[16] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139. · Zbl 0880.68103
[17] Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion and a rejoinder by the author). Ann. Statist. 19 1-67. · Zbl 0765.62064
[18] Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189-1232. · Zbl 1043.62034
[19] Green, P. J. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika 82 711-732. · Zbl 0861.62023
[20] Hastie, T. and Tibshirani, R. (2000). Bayesian backfitting (with comments and a rejoinder by the authors). Statist. Sci. 15 196-223. · Zbl 1059.62524
[21] Kim, H., Loh, W.-Y., Shih, Y.-S. and Chaudhuri, P. (2007). Visualizable and interpretable regression models with good prediction power. IIE Transactions (Special Issue on Data Mining and Web Mining) 39 565-579.
[22] Ridgeway, G. (2004). The gbm package. R Foundation for Statistical Computing, Vienna, Austria.
[23] Sing, T., Sander, O., Beerenwinkel, N. and Lengauer, T. (2007). ROCR: Visualizing the performance of scoring classifiers. R package version 1.0-2.
[24] Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S, 4th ed. Springer, New York. · Zbl 1006.62003
[25] Wu, Y., Tjelmeland, H. and West, M. (2007). Bayesian CART: Prior specification and posterior simulation. J. Comput. Graph. Statist. 16 44-66.
[26] Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and testing for aggregation bias. J. Amer. Statist. Assoc. 57 348-368. · Zbl 0113.34902
[27] Zhang, J. L. and Härdle, W. K. (2010). The Bayesian additive classification tree applied to credit risk modelling. Comput. Statist. Data Anal. 54 1197-1205. · Zbl 1464.62196
[28] Zhang, S., Shih, Y.-C. T. and Müller, P. (2007). A spatially-adjusted Bayesian additive regression tree model to merge two datasets. Bayesian Anal. 2 611-634. · Zbl 1331.62170
[29] Zhou, Q. and Liu, J. S. (2008). Extracting sequence features to predict protein-DNA binding: A comparative study. Nucleic Acids Research 36 4137-4148.