Some paradoxical results for the quadratically weighted kappa. (English) Zbl 1284.62764

Summary: The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories \(n\) it is shown that if one of the raters uses the same base rates for categories 1 and \(n\), categories 2 and \(n - 1\), and so on, then the value of the quadratically weighted kappa does not depend on the value of the center cell of the agreement table. Since the center cell reflects the exact agreement of the two raters on the middle category, this result questions the applicability of the quadratically weighted kappa to agreement studies. If one wants to report a single index of agreement for an ordinal scale, it is recommended that the linearly weighted kappa be used instead of the quadratically weighted kappa.
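
The center-cell result can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names `weighted_kappa` and `example_table` and the 3×3 counts are hypothetical, chosen only so that the first rater (rows) has equal totals for categories 1 and 3. It computes weighted kappa as one minus the ratio of observed to chance-expected weighted disagreement, with disagreement weights \(|i - j|\) (linear) or \((i - j)^2\) (quadratic).

```python
def weighted_kappa(table, power):
    """Weighted kappa for a square table of counts (rows: rater 1, columns: rater 2).

    Disagreement weights are |i - j| ** power, so power=1 gives the linearly
    weighted kappa and power=2 the quadratically weighted kappa. Rescaling the
    weights by a constant does not change the ratio below.
    """
    n = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n)) for j in range(n)]

    # Observed weighted disagreement (proportion scale).
    observed = sum(
        abs(i - j) ** power * table[i][j] for i in range(n) for j in range(n)
    ) / total
    # Weighted disagreement expected by chance from the two raters' base rates.
    expected = sum(
        abs(i - j) ** power * row_totals[i] * col_totals[j]
        for i in range(n)
        for j in range(n)
    ) / total ** 2
    return 1.0 - observed / expected


def example_table(center):
    """Hypothetical 3x3 table whose row totals for categories 1 and 3 are equal (17 each)."""
    return [
        [10, 5, 2],
        [3, center, 4],
        [4, 6, 7],
    ]


if __name__ == "__main__":
    for center in (0, 10, 50):
        table = example_table(center)
        print(
            center,
            round(weighted_kappa(table, power=2), 4),  # quadratic weights
            round(weighted_kappa(table, power=1), 4),  # linear weights
        )
```

In this illustrative table the quadratically weighted value is the same for every center count, whereas the linearly weighted value changes as the raters agree more often on the middle category.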


62P15 Applications of statistics to psychology

