Multiclass classification, information, divergence and surrogate risk. (English) Zbl 1408.62115

Summary: We provide a unifying view of statistical information measures, multiway Bayesian hypothesis testing, loss functions for multiclass classification problems and multidistribution \(f\)-divergences, establishing equivalence results among these objects and extending existing results for binary outcome spaces to more general ones. We consider a generalization of \(f\)-divergences to multiple distributions, and we provide a constructive equivalence between divergences, statistical information (in the sense of DeGroot) and losses for multiclass classification. A major application of our results is to multiclass classification problems in which we must infer both a discriminant function \(\gamma\), used to predict a label \(Y\) from a datum \(X\), and a data representation (or, in the setting of a hypothesis testing problem, an experimental design), represented as a quantizer \(\mathsf{q}\) from a family of possible quantizers \(\mathsf{Q}\). In this setting, we characterize the equivalence between loss functions, meaning that optimizing either of two equivalent losses yields an optimal discriminant \(\gamma\) and quantizer \(\mathsf{q}\); this complements and extends earlier results of X. Nguyen et al. [ibid. 37, No. 2, 876–904 (2009; Zbl 1162.62060)] to the multiclass case. Our results provide a more substantial basis than standard classification calibration results for comparing different losses: we describe the convex losses that are consistent for jointly choosing a data representation and minimizing the (weighted) probability of error in multiclass classification problems.
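
For orientation, the multidistribution \(f\)-divergence named in the summary is usually written in the following form. This is a sketch under assumed notation: the symbols \(P_1,\dots,P_k\), \(k\), \(\mu\) and \(p_i\) do not appear in the summary and are introduced here only for illustration. Let \(P_1,\dots,P_k\) be distributions with densities \(p_i = dP_i/d\mu\) with respect to a common dominating measure \(\mu\), and let \(f:\mathbb{R}_+^{k-1}\to\mathbb{R}\) be convex with \(f(1,\dots,1)=0\). Then
\[
D_f\bigl(P_1,\dots,P_{k-1}\,\|\,P_k\bigr) = \int f\!\left(\frac{p_1}{p_k},\dots,\frac{p_{k-1}}{p_k}\right) p_k \, d\mu .
\]
For \(k=2\) this reduces to the classical \(f\)-divergence \(D_f(P_1\,\|\,P_2)=\int f(p_1/p_2)\,p_2\,d\mu\); for instance, \(f(t)=t\log t\) gives the Kullback-Leibler divergence \(\int p_1\log(p_1/p_2)\,d\mu\).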


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62B10 Statistical aspects of information-theoretic topics
62G10 Nonparametric hypothesis testing
62K05 Optimal statistical designs
94A17 Measures of information, entropy


