Bayesian testing of agreement criteria under order constraints. (English) Zbl 1359.62081
Summary: The most popular criterion for measuring overall agreement between two raters is Cohen’s kappa coefficient, which quantifies the agreement of two raters who classify subjects on a binary nominal scale. In this paper, we consider a unified Bayesian approach for testing hypotheses about kappa coefficients under order constraints, for ratings from more than two studies with binary responses. The model is implemented using Markov chain Monte Carlo (MCMC). The approach is illustrated with simulation studies and applied to a real data set.
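Illustration (not part of the original record): the summary refers to Cohen’s kappa for binary ratings and to testing an order constraint across several studies. The following is a minimal sketch of that idea under stated assumptions, not the authors' method. It assumes an independent Dirichlet prior on the four cell probabilities of each study's 2 × 2 table, draws directly from the conjugate Dirichlet posterior instead of running MCMC, and uses the encompassing-prior ratio (posterior probability of the ordering divided by its prior probability 1/K!) as the Bayes factor against the unconstrained model. The function names and the cell counts are hypothetical.

```python
import math
import random


def cohen_kappa(p11, p10, p01, p00):
    """Cohen's kappa from the cell proportions of a 2x2 table.

    p11: both raters positive, p10: only rater A positive,
    p01: only rater B positive, p00: both raters negative.
    """
    observed = p11 + p00                    # observed agreement
    a_pos = p11 + p10                       # rater A's positive margin
    b_pos = p11 + p01                       # rater B's positive margin
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    if expected == 1.0:                     # degenerate table: kappa undefined
        return 0.0
    return (observed - expected) / (1 - expected)


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def prob_ordered(tables, n_draws=20000, prior=1.0, seed=0):
    """Monte Carlo estimate of P(kappa_1 < ... < kappa_K | data).

    Assumption: independent Dirichlet(prior, ..., prior) priors on the
    cell probabilities of each study (a choice made for this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        kappas = [
            cohen_kappa(*sample_dirichlet([c + prior for c in table], rng))
            for table in tables
        ]
        if all(x < y for x, y in zip(kappas, kappas[1:])):
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    # Hypothetical counts (n11, n10, n01, n00) for three studies.
    tables = [(20, 15, 15, 20), (30, 10, 10, 30), (40, 5, 5, 40)]
    posterior = prob_ordered(tables)
    prior_prob = 1 / math.factorial(len(tables))  # exchangeable prior: 1/K!
    print("P(kappa_1 < kappa_2 < kappa_3 | data):", posterior)
    print("Bayes factor vs. unconstrained:", posterior / prior_prob)
```

The ratio printed last exceeds 1 when the data support the ordering more strongly than the prior does. A different prior or an MCMC sampler would be needed for models without a conjugate posterior.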
62F15 Bayesian inference
62F03 Parametric hypothesis testing
62F30 Parametric inference under constraints
62P10 Applications of statistics to biology and medical sciences; meta analysis
62P15 Applications of statistics to psychology
[1] Agresti, A., Testing marginal homogeneity for ordinal categorical variables, Biometrics, 39, 505-510, (1983)
[2] Altaye, M.; Donner, A.; Klar, N., Inference procedures for assessing agreement among multiple raters, Biometrics, 57, 2, 584-588, (2001) · Zbl 1209.62095
[3] Barlow, W.; Lai, M. Y.; Azen, S., A comparison of methods for calculating a stratified kappa, Statistics in Medicine, 10, 1465-1472, (1991)
[4] Basu, S.; Banerjee, M.; Sen, A., Bayesian inference for kappa from single and multiple studies, Biometrics, 56, 2, 577-582, (2000) · Zbl 1060.62507
[5] Bloch, D. A.; Kraemer, H. C., 2 × 2 kappa coefficients: measures of agreement or association, Biometrics, 45, 269-287, (1989) · Zbl 0715.62113
[6] Brennan, R. L.; Prediger, D. J., Coefficient kappa: some uses, misuses, and alternatives, Educational and Psychological Measurement, 41, 687-699, (1981)
[7] Cohen, J., A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20, 213-220, (1960)
[8] Conger, A. J., Integration and generalization of kappas for multiple raters, Psychological Bulletin, 88, 322-328, (1980)
[9] Davies, M.; Fleiss, J. L., Measuring agreement for multinomial data, Biometrics, 38, 1047-1051, (1982) · Zbl 0501.62045
[10] Dickey, J., The weighted likelihood ratio, linear hypotheses on normal location parameters, The Annals of Mathematical Statistics, 42, 204-223, (1971) · Zbl 0274.62020
[11] Dickey, J., Approximate posterior distributions, Journal of the American Statistical Association, 71, 680-689, (1976) · Zbl 0343.62004
[12] Dickey, J.; Lientz, B. P., The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain, The Annals of Mathematical Statistics, 41, 214-226, (1970) · Zbl 0188.50102
[13] Donner, A.; Eliasziw, M., A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance-testing and sample size estimation, Statistics in Medicine, 11, 1511-1519, (1992)
[14] Donner, A.; Eliasziw, M.; Klar, N., Testing the homogeneity of kappa statistics, Biometrics, 52, 176-183, (1996) · Zbl 0880.62110
[15] Fleiss, J. L., Measuring nominal scale agreement among many raters, Psychological Bulletin, 76, 378-382, (1971)
[16] Fleiss, J. L., Statistical methods for rates and proportions, (1981), Wiley New York · Zbl 0544.62002
[17] Gilks, W. R.; Wild, P., Adaptive rejection sampling for Gibbs sampling, Applied Statistics, 337-348, (1992) · Zbl 0825.62407
[18] Hale, C. A.; Fleiss, J. L., Interval estimation under two study designs for kappa with binary classifications, Biometrics, 49, 523-534, (1993)
[19] Hoijtink, H., Informative hypotheses: theory and practice for behavioral and social scientists, (2011), Chapman & Hall/CRC London, UK
[20] Hsu, L. M.; Field, R., Inter-rater agreement measures: comments on kappa, Cohen’s kappa, Scott’s \(\pi\) and Aickin’s \(\alpha\), Understanding Statistics, 2, 205-219, (2003)
[21] Jakobsson, U.; Westergren, A., Statistical methods for assessing agreement for ordinal data, Scandinavian Journal of Caring Sciences, 19, 427-431, (2005)
[22] Kass, R. E.; Raftery, A. E., Bayes factors, Journal of the American Statistical Association, 90, 773-795, (1995) · Zbl 0846.62028
[23] Klugkist, I.; Laudy, O.; Hoijtink, H., Inequality constrained analysis of variance: A Bayesian approach, Psychological Methods, 10, 477-493, (2005)
[24] Klugkist, I.; Laudy, O.; Hoijtink, H., Bayesian evaluation of inequality and equality constrained hypotheses for contingency tables, Psychological Methods, (2010)
[25] Kraemer, H. C., How many raters? Toward the most reliable diagnostic consensus, Statistics in Medicine, 11, 317-332, (1992)
[26] Kraemer, H. C., Measurement of reliability for categorical data in medical research, Statistical Methods in Medical Research, 1, 183-200, (1992)
[27] Kraemer, H. C.; Periyakoil, V. S.; Noda, A., Tutorial in biostatistics: kappa coefficients in medical research, Statistics in Medicine, 21, 2109-2129, (2004)
[28] Krippendorff, K., Reliability in content analysis: some common misconceptions and recommendations, Human Communication Research, 30, 411-433, (2004)
[29] Lee, J. J.; Tu, Z. N., A better confidence interval for kappa (\(\kappa\)) on measuring agreement between two raters with binary outcomes, Journal of Computational and Graphical Statistics, 3, 3, 301-321, (1994)
[30] Lipsitz, S. R.; Laird, N. M.; Brennan, T. A., Simple moment estimates of the \(\kappa\)-coefficient and its variance, Applied Statistics, 43, 2, 309-323, (1994) · Zbl 0825.62891
[31] Mulder, J., Bayes factors for testing order-constrained hypotheses on correlations, Journal of Mathematical Psychology, (2015)
[32] Mulder, J., Bayes factors for testing order-constrained hypotheses on correlations, Journal of Mathematical Psychology, 72, 104-115, (2016) · Zbl 1357.62122
[33] Mulder, J.; Hoijtink, H.; Klugkist, I., Equality and inequality constrained multivariate linear models: objective model selection using constrained posterior priors, Journal of Statistical Planning and Inference, 140, 887-906, (2010) · Zbl 1179.62041
[34] Mulder, J.; Klugkist, I.; Meeus, W.; van de Schoot, A.; Selfhout, M.; Hoijtink, H., Bayesian model selection of informative hypotheses for repeated measurements, Journal of Mathematical Psychology, 53, 530-546, (2009) · Zbl 1181.62026
[35] Nam, J. M., Homogeneity score test for the intraclass version of the kappa statistics and sample size determination in multiple or stratified studies, Biometrics, 59, 1027-1035, (2003) · Zbl 1274.62847
[36] Oh, M. S.; Shin, D. W., A unified Bayesian inference on treatment means with order constraints, Computational Statistics & Data Analysis, 55, 1, 924-934, (2011) · Zbl 1247.62092
[37] Popping, R., Some views on agreement to be used in content analysis studies, Quality & Quantity, 44, 1067-1078, (2010)
[38] Rifkin, M. D.; Zerhouni, E. A.; Constantine, M. D.; Gatsonis, C. A.; Quint, L. E.; Paushter, D. M.; Epstein, J. I.; Hamper, U.; Walsh, P. C.; McNeil, B. J., Comparison of magnetic resonance imaging and ultrasonography in staging early prostate cancer, The New England Journal of Medicine, 323, 10, 621-626, (1990)
[39] Rogel, A.; Boelle, P. Y.; Mary, J. Y., Global and partial agreement among several observers, Statistics in Medicine, 17, 489-501, (1998)
[40] Scott, W. A., Reliability of content analysis: the case of nominal scale coding, Public Opinion Quarterly, 19, 321-325, (1955)
[41] Shoukri, M. M., Measures of inter observer agreement, (2004), Chapman & Hall/CRC Boca Raton · Zbl 1039.62106
[42] Shoukri, M. M.; Martin, S. W.; Mian, I. U.H., Maximum likelihood estimation of the kappa coefficient from models of matched binary responses, Statistics in Medicine, 14, 83-99, (1995)
[43] Sturtz, S.; Ligges, U.; Gelman, A., R2WinBUGS: A package for running WinBUGS from R, Journal of Statistical Software, 12, 3, 1-16, (2005)
[44] Thompson, W. D.; Walter, S. D., A reappraisal of the kappa coefficient, Journal of Clinical Epidemiology, 41, 949-958, (1988)
[45] Vanbelle, S.; Albert, A., Agreement between two independent groups of raters, Psychometrika, 74, 477-491, (2009) · Zbl 1272.62135
[46] Vanbelle, S.; Albert, A., Agreement between an isolated rater and a group of raters, Statistica Neerlandica, 63, 82-100, (2009)
[47] Verdinelli, I.; Wasserman, L., Computing Bayes factors using a generalization of the Savage-Dickey density ratio, Journal of the American Statistical Association, 90, 614-618, (1995) · Zbl 0826.62022
[48] Visser, H.; de Nijs, T., The map comparison kit, Environmental Modeling and Software, 21, 346-358, (2006)
[49] Von Eye, A.; Mun, E. Y., Analyzing rater agreement, manifest variable methods, (2005), Lawrence Erlbaum Associates Mahwah, N.J., London
[50] Warrens, M. J., On similarity coefficients for 2 × 2 tables and correction for chance, Psychometrika, 73, 487-502, (2008) · Zbl 1301.62125
[51] Warrens, M. J., On the equivalence of Cohen’s kappa and the Hubert-Arabie adjusted Rand index, Journal of Classification, 25, 177-183, (2008) · Zbl 1276.62043
[52] Warrens, M. J., Inequalities between kappa and kappa-like statistics for k × k tables, Psychometrika, 75, 176-185, (2010) · Zbl 1272.62138
[53] Warrens, M. J., A formal proof of a paradox associated with Cohen’s kappa, Journal of Classification, 27, 322-332, (2010) · Zbl 1337.62143
[54] Warrens, M. J., Weighted kappa is higher than Cohen’s kappa for tridiagonal agreement tables, Statistical Methodology, 4, 271-286, (2011) · Zbl 1213.62187
[55] Williams, G. W., Comparing the joint agreement of several raters with one rater, Biometrics, 32, 619-627, (1976) · Zbl 0336.62086
[56] Zwick, R., Another look at interrater agreement, Psychological Bulletin, 103, 374-378, (1988)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.