Classification by pairwise coupling. (English) Zbl 0932.62071

Summary: We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons [R. A. Bradley and M. E. Terry, Biometrika 39, 324-345 (1952; Zbl 0047.12903)]. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated data sets. Classifiers used include linear discriminants, nearest neighbors, adaptive nonlinear methods and the support vector machine.
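The coupling step described in the summary can be illustrated with a minimal sketch. It assumes a Bradley-Terry-style model in which the pairwise probability of class i against class j is mu_ij = p_i / (p_i + p_j), and it uses a standard fixed-point iteration for models of that form. The function name `couple_pairwise`, its parameters, and the optional pair weights `n` are hypothetical choices made for this illustration and are not taken from the paper.

```python
def couple_pairwise(r, n=None, tol=1e-10, max_iter=1000):
    """Combine pairwise estimates into one class-probability vector.

    r[i][j] is an estimate of P(class i | class i or class j); the
    diagonal is ignored. n[i][j] is an optional weight for the pair
    (for example, the number of training points in classes i and j).
    Assumed model: mu_ij = p_i / (p_i + p_j).
    """
    k = len(r)
    if n is None:
        n = [[1.0] * k for _ in range(k)]
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(max_iter):
        new_p = []
        for i in range(k):
            others = [j for j in range(k) if j != i]
            # Weighted sum of the observed pairwise estimates for class i.
            num = sum(n[i][j] * r[i][j] for j in others)
            # Same sum with the model's pairwise probabilities mu_ij.
            den = sum(n[i][j] * p[i] / (p[i] + p[j]) for j in others)
            new_p.append(p[i] * num / den)
        total = sum(new_p)
        new_p = [v / total for v in new_p]  # renormalize to sum to 1
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p


# Usage: pairwise estimates generated from a known probability vector.
true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) for j in range(3)] for i in range(3)]
coupled = couple_pairwise(r)
predicted_class = max(range(3), key=lambda i: coupled[i])
```

When the pairwise estimates are exactly consistent with some probability vector, as in the usage example, the iteration should recover that vector. With estimates from real pairwise classifiers no exact solution generally exists, and the result is a compromise among the pairs.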


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J15 Paired and multiple comparisons; multiple testing
68T10 Pattern recognition, speech recognition




[1] Bishop, Y., Fienberg, S. and Holland, P. (1975). Discrete Multivariate Analysis. MIT Press. · Zbl 0332.62039
[2] Boser, B., Guyon, I. and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of COLT II, Philadelphia, PA.
[3] Bradley, R. and Terry, M. (1952). The rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39 324-345. · Zbl 0047.12903
[4] Deming, W. and Stephan, F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Statist. 11 427-444. · Zbl 0024.05502
[5] Friedman, J. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141. Friedman, J. (1996a). Another approach to polychotomous classification. Technical report, Stanford Univ. Friedman, J. (1996b). Bias, variance, 0-1 loss and the curse of dimensionality. Technical report, Stanford Univ. · Zbl 0765.62064
[6] Hastie, T. (1989). Discussion of "Flexible parsimonious smoothing and additive modelling" by Friedman and Silverman. Technometrics 31 3-39. · Zbl 0672.65119
[7] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London. · Zbl 0747.62061
[8] Hastie, T., Tibshirani, R. and Buja, A. (1994). Flexible discriminant analysis by optimal scoring. J. Amer. Statist. Assoc. 89 1255-1270. · Zbl 0812.62067
[9] Robinson, A. J. (1989). Dynamic error propagation networks. Ph.D. dissertation, Dept. Electrical Engineering, Cambridge Univ.
[10] Rosen, D., Burke, H. and Goodman, O. (1995). Local learning methods in high dimensions: beating the bias-variance dilemma via recalibration. In NIPS Workshop: Machines that Learn-Neural Networks for Computing.
[11] Vapnik, V. (1996). The Nature of Statistical Learning Theory. Springer, New York. · Zbl 0934.62009