
Equivalences of weighted kappas for multiple raters. (English) Zbl 1365.62216

Summary: Cohen’s unweighted kappa and weighted kappa are popular descriptive statistics for measuring agreement between two raters on a categorical scale. With \(m \geq 3\) raters, there are several views in the literature on how agreement should be defined. We consider a family of weighted kappas for multiple raters based on the concept of \(g\)-agreement (\(g = 2, 3, \ldots, m\)), i.e. the situation in which agreement is taken to occur if \(g\) out of the \(m\) raters assign an object to the same category. Given \(m\) raters, this family contains \(m - 1\) weighted kappas, one for each type of \(g\)-agreement. We show that the \(m - 1\) weighted kappas coincide if the weighting scheme proposed by P. W. Mielke, K. J. Berry and J. E. Johnston [“The exact variance of weighted kappa with multiple raters”, Psychol. Rep. 101, No. 2, 655–660 (2007; doi:10.2466/pr0.101.2.655-660)] is used.
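The following Python sketch illustrates the notion of \(g\)-agreement with a simple, unweighted (identity-weight) variant: for each object it computes the fraction of \(g\)-rater subsets whose ratings coincide and corrects for chance using each rater's marginal category proportions. The function name and this particular chance correction are illustrative assumptions; the sketch does not implement the specific weighting scheme of Mielke, Berry and Johnston under which the paper proves the \(m - 1\) statistics coincide.

from itertools import combinations
import numpy as np

def g_agreement_kappa(ratings, g):
    """Illustrative unweighted g-agreement kappa (hypothetical helper).

    ratings : (n_objects, m_raters) array of integer category labels.
    g       : number of raters required to agree (2 <= g <= m).
    """
    ratings = np.asarray(ratings)
    n, m = ratings.shape
    categories = np.unique(ratings)
    subsets = list(combinations(range(m), g))

    # Observed g-agreement: per object, the fraction of g-rater subsets
    # whose ratings all coincide, averaged over objects.
    p_obs = np.mean([sum(len(set(obj[list(s)])) == 1 for s in subsets) / len(subsets)
                     for obj in ratings])

    # Marginal category proportions per rater (chance model: independent raters).
    marg = np.array([[np.mean(ratings[:, r] == c) for c in categories]
                     for r in range(m)])

    # Expected g-agreement: probability that the raters in a subset all pick
    # the same category by chance, averaged over all g-rater subsets.
    p_exp = np.mean([sum(np.prod(marg[list(s), k]) for k in range(len(categories)))
                     for s in subsets])

    return (p_obs - p_exp) / (1.0 - p_exp)

# Small example: 5 objects, 3 raters, 3 categories.
R = [[1, 1, 2],
     [2, 2, 2],
     [0, 1, 0],
     [2, 2, 1],
     [0, 0, 0]]
for g in (2, 3):
    print(f"kappa for {g}-agreement:", round(g_agreement_kappa(R, g), 3))

With this identity-weight sketch the statistics for different \(g\) generally differ; the paper's result is that they all coincide once the Mielke-Berry-Johnston weights are adopted.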

MSC:

62H17 Contingency tables
62H20 Measures of association (correlation, canonical correlation, etc.)

References:

[1] Abraira, V.; Pérez de Vargas, A., Generalization of the kappa coefficient for ordinal categorical data, multiple observers and incomplete designs, Qüestiió, 23, 561-571, (1999) · Zbl 1167.62472
[2] Banerjee, M.; Capozzoli, M.; McSweeney, L.; Sinha, D., Beyond kappa: A review of interrater agreement measures, Canadian journal of statistics, 27, 3-23, (1999) · Zbl 0929.62117
[3] Berry, K.J.; Mielke, P.W., A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters, Educational and psychological measurement, 48, 921-933, (1988)
[4] Berry, K.J.; Johnston, J.E.; Mielke, P.W., Weighted kappa for multiple raters, Perceptual and motor skills, 107, 837-848, (2008)
[5] Brennan, R.L.; Prediger, D.J., Coefficient kappa: some uses, misuses, and alternatives, Educational and psychological measurement, 41, 687-699, (1981)
[6] Brenner, H.; Kliebsch, U., Dependence of weighted kappa coefficients on the number of categories, Epidemiology, 7, 199-202, (1996)
[7] Cicchetti, D.; Allison, T., A new procedure for assessing reliability of scoring EEG sleep recordings, The American journal of EEG technology, 11, 101-109, (1971)
[8] Cicchetti, D.; Bronen, R.; Spencer, S.; Haut, S.; Berg, A.; Oliver, P.; Tyrer, P., Rating scales, scales of measurement, issues of reliability: resolving some critical issues for clinicians and researchers, The journal of nervous and mental disease, 194, 557-564, (2006)
[9] Cohen, J., A coefficient of agreement for nominal scales, Educational and psychological measurement, 20, 213-220, (1960)
[10] Cohen, J., Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit, Psychological bulletin, 70, 213-220, (1968)
[11] Conger, A.J., Integration and generalization of kappas for multiple raters, Psychological bulletin, 88, 322-328, (1980)
[12] Crewson, P.E., Fundamentals of clinical research for radiologists, reader agreement studies, American journal of roentgenology, 184, 1391-1397, (2005)
[13] Davies, M.; Fleiss, J.L., Measuring agreement for multinomial data, Biometrics, 38, 1047-1051, (1982) · Zbl 0501.62045
[14] Fleiss, J.L., Statistical methods for rates and proportions, (1981), Wiley New York · Zbl 0544.62002
[15] Fleiss, J.L.; Cohen, J., The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability, Educational and psychological measurement, 33, 613-619, (1973)
[16] Gower, J.C.; De Rooij, M., A comparison of the multidimensional scaling of triadic and dyadic distances, Journal of classification, 20, 115-136, (2003) · Zbl 1113.91334
[17] Graham, P.; Jackson, R., The analysis of ordinal agreement data: beyond weighted kappa, Journal of clinical epidemiology, 46, 1055-1062, (1993)
[18] Heiser, W.J.; Bennani, M., Triadic distance models: axiomatization and least squares representation, Journal of mathematical psychology, 41, 189-206, (1997) · Zbl 1072.91639
[19] A.P.J.M. Heuvelmans, P.F. Sanders, Beoordelaarsovereenstemming [Rater agreement], in: T.J.H.M. Eggen, P.F. Sanders (Eds.), Psychometrie in de Praktijk [Psychometrics in Practice], Arnhem: Cito Instituut voor Toetsontwikkeling, 1993, pp. 443-470.
[20] Holmquist, N.D.; McMahan, C.A.; Williams, E.O., Variability in classification of carcinoma in situ of the uterine cervix, Obstetrical & gynecological survey, 23, 580-585, (1967)
[21] Hsu, L.M.; Field, R., Interrater agreement measures: comments on \(\operatorname{kappa}_n\), Cohen’s kappa, Scott’s \(\pi\) and Aickin’s \(\alpha\), Understanding statistics, 2, 205-219, (2003)
[22] Hubert, L., Kappa revisited, Psychological bulletin, 84, 289-297, (1977)
[23] Janson, H.; Olsson, U., A measure of agreement for interval or nominal multivariate observations, Educational and psychological measurement, 61, 277-289, (2001)
[24] Kraemer, H.C., Ramifications of a population model for \(\kappa\) as a coefficient of reliability, Psychometrika, 44, 461-472, (1979) · Zbl 0425.62088
[25] Kraemer, H.C.; Periyakoil, V.S.; Noda, A., Tutorial in biostatistics: kappa coefficients in medical research, Statistics in medicine, 21, 2109-2129, (2004)
[26] Landis, J.R.; Koch, G.G., The measurement of observer agreement for categorical data, Biometrics, 33, 159-174, (1977) · Zbl 0351.62039
[27] Landis, J.R.; Koch, G.G., An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers, Biometrics, 33, 363-374, (1977) · Zbl 0357.62037
[28] Light, R.J., Measures of response agreement for qualitative data: some generalizations and alternatives, Psychological bulletin, 76, 365-377, (1971)
[29] Maclure, M.; Willett, W.C., Misinterpretation and misuse of the kappa statistic, American journal of epidemiology, 126, 161-169, (1987)
[30] Mielke, P.W.; Berry, K.J., A note on Cohen’s weighted kappa coefficient of agreement with linear weights, Statistical methodology, 6, 439-446, (2009)
[31] Mielke, P.W.; Berry, K.J.; Johnston, J.E., The exact variance of weighted kappa with multiple raters, Psychological reports, 101, 655-660, (2007)
[32] Mielke, P.W.; Berry, K.J.; Johnston, J.E., Resampling probability values for weighted kappa with multiple raters, Psychological reports, 102, 606-613, (2008)
[33] Nelson, J.C.; Pepe, M.S., Statistical description of interrater variability in ordinal ratings, Statistical methods in medical research, 9, 475-496, (2000) · Zbl 1121.62644
[34] R. Popping, Overeenstemmingsmaten Voor Nominale Data [Agreement Measures for Nominal Data], Ph.D. Thesis, Rijksuniversiteit Groningen, Groningen, 1983.
[35] Popping, R., Some views on agreement to be used in content analysis studies, Quality & quantity, 44, 1067-1078, (2010)
[36] Schouten, H.J.A., Nominal scale agreement among observers, Psychometrika, 51, 453-466, (1986)
[37] Schuster, C., A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales, Educational and psychological measurement, 64, 243-253, (2004)
[38] Schuster, C.; Smith, D.A., Dispersion-weighted kappa: an integrative framework for metric and nominal scale agreement coefficients, Psychometrika, 70, 135-146, (2005) · Zbl 1306.62498
[39] Vanbelle, S.; Albert, A., Agreement between two independent groups of raters, Psychometrika, 74, 477-491, (2009) · Zbl 1272.62135
[40] Vanbelle, S.; Albert, A., Agreement between an isolated rater and a group of raters, Statistica neerlandica, 63, 82-100, (2009)
[41] Vanbelle, S.; Albert, A., A note on the linearly weighted kappa coefficient for ordinal scales, Statistical methodology, 6, 157-163, (2009) · Zbl 1220.62172
[42] Von Eye, A.; Mun, E.Y., Analyzing rater agreement: manifest variable methods, (2006), Lawrence Erlbaum Associates
[43] Warrens, M.J., On the equivalence of Cohen’s kappa and the Hubert-Arabie adjusted Rand index, Journal of classification, 25, 177-183, (2008) · Zbl 1276.62043
[44] Warrens, M.J., On similarity coefficients for 2×2 tables and correction for chance, Psychometrika, 73, 487-502, (2008) · Zbl 1301.62125
[45] Warrens, M.J., On multi-way metricity, minimality and diagonal planes, Advances in data analysis and classification, 2, 109-119, (2008) · Zbl 1306.62037
[46] Warrens, M.J., \(k\)-adic similarity coefficients for binary (presence/absence) data, Journal of classification, 26, 227-245, (2009) · Zbl 1337.62142
[47] Warrens, M.J., Inequalities between kappa and kappa-like statistics for \(k \times k\) tables, Psychometrika, 75, 176-185, (2010) · Zbl 1272.62138
[48] Warrens, M.J., Cohen’s kappa can always be increased and decreased by combining categories, Statistical methodology, 7, 673-677, (2010) · Zbl 1232.62161
[49] Warrens, M.J., A formal proof of a paradox associated with Cohen’s kappa, Journal of classification, 27, 322-332, (2010) · Zbl 1337.62143
[50] Warrens, M.J., Inequalities between multi-rater kappas, Advances in data analysis and classification, 4, 271-286, (2010) · Zbl 1284.62338
[51] Warrens, M.J., \(n\)-way metrics, Journal of classification, 27, 173-190, (2010) · Zbl 1337.54019
[52] Warrens, M.J., A family of multi-rater kappas that can always be increased and decreased by combining categories, Statistical methodology, 9, 3, 330-340, (2012) · Zbl 1365.62214
[53] Warrens, M.J., Cohen’s linearly weighted kappa is a weighted average of 2×2 kappas, Psychometrika, 76, 471-486, (2011) · Zbl 1284.62763
[54] Warrens, M.J., Weighted kappa is higher than Cohen’s kappa for tridiagonal agreement tables, Statistical methodology, 8, 268-272, (2011) · Zbl 1213.62187
[55] M.J. Warrens, Cohen’s linearly weighted kappa is a weighted average, Advances in Data Analysis and Classification (2011) (in press). · Zbl 1248.62227
[56] M.J. Warrens, Some paradoxical results for the quadratically weighted kappa, Psychometrika (2011) (in press). · Zbl 1284.62764
[57] Zwick, R., Another look at interrater agreement, Psychological bulletin, 103, 374-378, (1988)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.