zbMATH — the first resource for mathematics

Improving the precision of classification trees. (English) Zbl 1184.62109
Summary: Besides serving as prediction models, classification trees are useful for finding important predictor variables and identifying interesting subgroups in the data. These functions can be compromised by weak split selection algorithms that have variable selection biases or that fail to search beyond local main effects at each node of the tree. The resulting models may include many irrelevant variables or select too few of the important ones. Either eventuality can lead to erroneous conclusions. Four techniques to improve the precision of the models are proposed and their effectiveness compared with that of other algorithms, including tree ensembles, on real and simulated data sets.
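The variable selection bias mentioned in the summary can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' method: the response is pure noise, one predictor is binary (a single candidate split) and the other is continuous (up to n - 1 candidate splits), and the split is chosen by an exhaustive search maximizing the reduction in Gini impurity. All function names and parameter values (`gini`, `best_gain`, `selection_rate`, 200 trials, n = 50) are hypothetical choices made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gain(x, y):
    """Largest Gini impurity reduction over all splits of the form x <= c."""
    n = len(y)
    parent = gini(y)
    best = 0.0
    # Skip the largest value: splitting there would leave the right side empty.
    for c in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def selection_rate(trials=200, n=50, seed=0):
    """Fraction of trials in which the continuous noise variable is selected."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]         # response, independent of both predictors
        x_binary = [rng.randint(0, 1) for _ in range(n)]  # one candidate split
        x_cont = [rng.random() for _ in range(n)]         # up to n - 1 candidate splits
        if best_gain(x_cont, y) > best_gain(x_binary, y):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print(f"continuous noise variable selected in {selection_rate():.0%} of trials")
```

Since neither predictor carries any information about the response, an unbiased selection rule would pick each about half the time. A rate well above one half in this sketch reflects only the larger number of candidate splits available to the continuous variable.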

62H30 Classification and discrimination; cluster analysis (statistics)
05C90 Applications of graph theory
65C60 Computational problems in statistics
Software: Stata; rpart; C4.5; SAS