Classifier technology and the illusion of progress. (English) Zbl 1426.62188

Summary: A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

