Cohen’s kappa can always be increased and decreased by combining categories. (English) Zbl 1232.62161

Summary: The kappa coefficient is a popular descriptive statistic for summarizing the cross classification of two nominal variables with identical categories. It has been frequently observed in the literature that combining two categories increases the value of kappa. We prove the following existence theorem for kappa: For any nontrivial \(k\times k\) agreement table with \(k \in \mathbb N_{\geq 3}\) categories, there exist two categories such that, when combined, the kappa value of the collapsed \((k - 1)\times (k - 1)\) agreement table is higher than the original kappa value. In addition, there exist two categories such that, when combined, the kappa value of the collapsed table is smaller than the original kappa value.
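The statement can be checked numerically on a small example. The sketch below is not taken from the paper: it assumes the standard definition \(\kappa = (p_o - p_e)/(1 - p_e)\), where \(p_o\) is the observed proportion of agreement and \(p_e\) the proportion expected from the marginals, and it uses a made-up \(3\times 3\) table. The function names `cohen_kappa` and `merge_categories` are hypothetical helpers introduced for illustration only.

```python
from itertools import combinations


def cohen_kappa(table):
    """Cohen's kappa for a square agreement table given as a list of rows of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the main diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions of the two classifications.
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected under independence of the two classifications.
    p_e = sum(r * c for r, c in zip(row, col))
    return (p_o - p_e) / (1 - p_e)


def merge_categories(table, a, b):
    """Collapse categories a and b (a != b) into one, giving a (k-1) x (k-1) table."""
    k = len(table)
    keep = [i for i in range(k) if i != b]
    new_index = {old: pos for pos, old in enumerate(keep)}
    new_index[b] = new_index[a]  # fold category b into category a
    merged = [[0] * (k - 1) for _ in range(k - 1)]
    for i in range(k):
        for j in range(k):
            merged[new_index[i]][new_index[j]] += table[i][j]
    return merged


# Illustrative counts only (assumption, not data from the paper).
table = [
    [20, 5, 1],
    [4, 15, 6],
    [2, 3, 10],
]

original = cohen_kappa(table)
collapsed = {
    (a, b): cohen_kappa(merge_categories(table, a, b))
    for a, b in combinations(range(len(table)), 2)
}

print(f"original kappa: {original:.4f}")
for pair, value in collapsed.items():
    print(f"merge {pair}: kappa = {value:.4f}")
```

For this table, the largest of the three collapsed values should exceed the original kappa and the smallest should fall below it, which is what the theorem predicts for any nontrivial table with at least three categories.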


62P15 Applications of statistics to psychology
62P99 Applications of statistics

