Predictive learning via rule ensembles. (English) Zbl 1149.62051

Summary: General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in the interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects.
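The model form described above — a linear combination of rules, each rule a conjunction of simple threshold statements on individual input variables — can be sketched as follows. This is a minimal illustration of the prediction structure only; the rule conditions, coefficients, and variable names ("age", "income") are hypothetical, not fitted values from the paper.

```python
# Sketch of a rule-ensemble predictor: each rule r_k(x) is a conjunction of
# simple tests on individual inputs, returning 1 if all tests hold, else 0.
# The prediction is F(x) = a_0 + sum_k a_k * r_k(x).
# All rules and weights below are illustrative, not estimated from data.

def make_rule(conditions):
    """Build a rule: a conjunction of (variable, op, threshold) tests."""
    def rule(x):
        for var, op, t in conditions:
            v = x[var]
            if op == "<=" and not v <= t:
                return 0.0
            if op == ">" and not v > t:
                return 0.0
        return 1.0
    return rule

# Two hypothetical rules on inputs "age" and "income".
rules = [
    make_rule([("age", ">", 40), ("income", "<=", 50_000)]),
    make_rule([("income", ">", 50_000)]),
]
coefficients = [1.5, -0.7]   # illustrative linear weights a_k
intercept = 0.2              # a_0

def predict(x):
    """F(x) = a_0 + sum_k a_k * r_k(x)."""
    return intercept + sum(a * r(x) for a, r in zip(coefficients, rules))

# Example: for age 55 and income 30,000 only the first rule fires,
# so the prediction is 0.2 + 1.5.
example = predict({"age": 55, "income": 30_000})
```

Because each rule is a simple conjunction, its contribution `a_k * r_k(x)` to any individual prediction can be read off directly, which is the interpretability advantage the summary emphasizes.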


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J99 Linear inference, regression
68T05 Learning and adaptive systems in artificial intelligence
65C60 Computational problems in statistics (MSC2010)


bootstrap; C4.5; LogicReg
Full Text: DOI arXiv


[1] Breiman, L. (1996). Bagging predictors. Machine Learning 24 123-140. · Zbl 0858.68080
[2] Breiman, L. (2001). Random forests. Machine Learning 45 5-32. · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[3] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees . Wadsworth, Belmont, CA. · Zbl 0541.62042
[4] Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning 3 261-283.
[5] Cohen, W. (1995). Fast effective rule induction. Machine Learning : Proceedings of the Twelfth International Conference 115-123. Morgan Kaufmann, Lake Tahoe, CA.
[6] Cohen, W. and Singer, Y. (1999). A simple, fast and efficient rule learner. In Proceedings of the Sixteenth National Conference on Artificial Intelligence ( AAAI-99 ) 335-342. AAAI Press.
[7] Donoho, D., Johnstone, I., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B 57 301-337. · Zbl 0827.62035
[8] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap . Chapman and Hall, New York. · Zbl 0835.62038
[9] Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. Machine Learning : Proceedings of the Thirteenth International Conference 148-156. Morgan Kaufmann, San Francisco.
[10] Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189-1232. · Zbl 1043.62034 · doi:10.1214/aos/1013203451
[11] Friedman, J. H. and Hall, P. (2007). On bagging and nonlinear estimation. J. Statist. Plann. Inference 137 669-683. · Zbl 1104.62047 · doi:10.1016/j.jspi.2006.06.002
[12] Friedman, J. H. and Popescu, B. E. (2003). Importance sampled learning ensembles. Technical report, Dept. Statistics, Stanford Univ.
[13] Friedman, J. H. and Popescu, B. E. (2004). Gradient directed regularization for linear regression and classification. Technical report, Dept. Statistics, Stanford Univ.
[14] Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. J. Environmental Economics and Management 5 81-102.
[15] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models . Chapman and Hall, London. · Zbl 0747.62061
[16] Hastie, T., Tibshirani, R. and Friedman, J. H. (2001). The Elements of Statistical Learning . Springer, New York. · Zbl 0973.62007
[17] Ho, T. K. and Kleinberg, E. M. (1996). Building projectable classifiers of arbitrary complexity. In Proceedings of the 13th International Conference on Pattern Recognition 880-885. Vienna, Austria.
[18] Hooker, G. (2004). Black box diagnostics and the problem of extrapolation: extending the functional ANOVA. Technical report, Dept. Statistics, Stanford Univ.
[19] Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73-101. · Zbl 0136.39805 · doi:10.1214/aoms/1177703732
[20] Jiang, T. and Owen, A. B. (2001). Quasi-regression for visualization and interpretation of black box functions. Technical report, Dept. Statistics, Stanford Univ. · Zbl 0993.65018 · doi:10.1006/jcom.2001.0588
[21] Kleinberg, E. M. (1996). An overtraining-resistant stochastic modelling method for pattern recognition. Ann. Statist. 24 2319-2349. · Zbl 0877.68102 · doi:10.1214/aos/1032181157
[22] Kleinberg, E. M. (2000). On the algorithmic implementation of stochastic discrimination. IEEE Trans. Pattern Anal. Machine Intelligence 22 473-490.
[23] Lavrač, N. and Džeroski, S. (1994). Inductive Logic Programming : Techniques and Applications . Ellis Horwood. · Zbl 0830.68027
[24] Mitchell, T. (1997). Machine Learning . McGraw-Hill, New York. · Zbl 0913.68167
[25] Muggleton, S. (1995). Inverse entailment and PROGOL. New Generation Computing 13 245-286.
[26] Owen, A. B. (2003). The dimension distribution and quadrature test functions. Statist. Sinica 13 1-17. · Zbl 1017.62060
[27] Pfahringer, B., Holmes, G. and Weng, C. (2004). Millions of random rules. In Proceedings of the 15th European Conference on Machine Learning ( ECML/PKDD 2004 ). Morgan Kaufmann, San Mateo.
[28] Quinlan, R. (1993). C4.5 : Programs for Machine Learning . Morgan Kaufmann, San Mateo. · Zbl 0900.68112
[29] Roosen, C. (1995). Visualization and exploration of high-dimensional functions using the functional ANOVA decomposition. Ph.D. thesis, Dept. Statistics, Stanford Univ.
[30] Rosset, S. and Inger, I. (2000). KDD-CUP 99: Knowledge discovery in a charitable organization’s donor database. SIGKDD Explorations 1 85-90.
[31] Rückert, U. and Kramer, S. (2006). A statistical approach to rule learning. In Proceedings of the 23rd International Conference on Machine Learning . Morgan Kaufmann, San Mateo.
[32] Ruczinski, I., Kooperberg, C. and LeBlanc, M. L. (2003). Logic regression. J. Comput. Graph. Statist. 12 475-511. · Zbl 1142.62386 · doi:10.1198/1061860032238
[33] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[34] Weiss, S. and Indurkhya, N. (2000). Lightweight rule induction. In Proceedings of the 17th International Conference on Machine Learning (P. Langley, ed.) 1135-1142. Morgan Kaufmann, San Mateo.