# zbMATH — the first resource for mathematics

Count regression trees. (English) Zbl 07205271
Summary: Count data frequently appear in many scientific studies. In this article, we propose a regression tree method called CORE for analyzing such data. At each node, besides a Poisson regression, a count regression such as hurdle, negative binomial, or zero-inflated regression which can accommodate over-dispersion and/or excess zeros is fitted. A likelihood-based procedure is suggested to select split variables and split sets. Node deviance is then used in the tree pruning process to avoid overfitting. CORE is able to eliminate variable selection bias. In the simulations and real data studies, we show that CORE has some advantages over the existing method, MOB.
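The summary mentions likelihood-based split selection and node deviance without giving details. As a rough illustration only, and not the CORE procedure itself, the sketch below computes the deviance of an intercept-only Poisson model in a node and picks the threshold on one numeric predictor that reduces the total deviance most. The function names `poisson_deviance` and `best_split` and the toy data are assumptions made for this example. An exhaustive search of this kind is known to favor variables with many candidate splits, which is the selection bias the summary says CORE eliminates.

```python
import math


def poisson_deviance(counts):
    """Deviance of an intercept-only Poisson model fitted to `counts`."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0  # all-zero node: the fitted model reproduces the data exactly
    total = 0.0
    for y in counts:
        # The term y * log(y / mean) is taken as 0 when y == 0.
        log_term = y * math.log(y / mean) if y > 0 else 0.0
        total += log_term - (y - mean)
    return 2.0 * total


def best_split(x, y):
    """Return (threshold, gain) for the split x <= threshold that most
    reduces the summed Poisson deviance of the two child nodes.
    Hypothetical helper: exhaustive search, not CORE's selection method."""
    parent = poisson_deviance(y)
    best = (None, 0.0)
    for t in sorted(set(x))[:-1]:  # the largest value would leave an empty right child
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        gain = parent - poisson_deviance(left) - poisson_deviance(right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Toy data (made up for illustration): counts jump once x exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [0, 1, 0, 5, 6, 7]
threshold, gain = best_split(x, y)
print(threshold, round(gain, 2))
```

A count regression with covariates in each node, or a hurdle, negative binomial, or zero-inflated model as the summary describes, would replace the intercept-only fit used here.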
##### MSC:
62G08 Nonparametric regression and quantile regression; 62J12 Generalized linear models (logistic models)
##### Software:
AER; AppliedPredictiveModeling; partykit; VGAMdata