Consistent regression using data-dependent coverings. (English) Zbl 1472.62046

Summary: We introduce a procedure that generates an estimator of the regression function from a data-dependent quasi-covering of the feature space. A quasi-partition is derived from the quasi-covering, and the estimator predicts the conditional empirical expectation on each cell of the quasi-partition. We provide sufficient conditions ensuring the consistency of the estimator. Each element of the quasi-covering is labeled as either significant or insignificant. We avoid the cell-shrinkage condition commonly imposed in the literature on data-dependent partitioning estimators, which reduces the number of elements needed in the quasi-covering. An important feature of our estimator is its interpretability.
The proof of consistency rests on a control of the convergence rate of the empirical estimation of conditional expectations, which is of independent interest.
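The cell-wise prediction described above can be made concrete with the classical formula for data-dependent partitioning estimators (in the sense of Nobel and of Györfi et al.); this is only a minimal illustration, and the paper's treatment of quasi-partition cells may differ in detail. Writing $A_n(x)$ for the cell containing $x$,
\[
\hat m_n(x) \;=\; \frac{\sum_{i=1}^{n} Y_i \,\mathbf 1\{X_i \in A_n(x)\}}{\sum_{i=1}^{n} \mathbf 1\{X_i \in A_n(x)\}},
\]
with the convention $\hat m_n(x) = 0$ when no observation falls in $A_n(x)$.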

MSC:

62G05 Nonparametric estimation
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference

Software:

UCI-ml; FORS; C4.5; shap
