
An item response theory model of matching test performance. (English) Zbl 1437.91370

Summary: In a matching test, a test taker is presented with a list of test items and a list of response alternatives and asked to match each response alternative with a test item. Each response alternative can be given as a response to at most one test item. As a result, the response a test taker offers to one test item depends on his or her responses to all of the other test items. This violates the “local independence” assumption underlying most existing item response theory (IRT) methods, such as the Rasch model. Here we develop a framework for extending dichotomous IRT models to account for test-taking behavior on matching tests. This model separates an individual’s knowledge of the correct responses to the items of a matching test from his or her responses to those items. In addition to developing the matching framework, we derive a number of its important properties, including its item response function and score distribution. Finally, we demonstrate through an empirical example that our matching test framework provides a good account of behavior on matching tests.
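To make the dependence structure concrete, the following is a minimal simulation sketch in the spirit of the summary, not the authors' exact model (their published Stan code is at osf.io/gp6s4 [41]). It assumes a two-stage process: knowledge of each item's correct match is governed by a Rasch model, and unknown items are then matched by guessing without replacement among the unused alternatives; the guessing scheme and the function names `rasch_know_prob` and `simulate_matching_test` are illustrative assumptions.

```python
import numpy as np

def rasch_know_prob(theta, b):
    """Rasch probability that a test taker of ability theta knows item i's match."""
    return 1.0 / (1.0 + np.exp(-(theta - np.asarray(b))))

def simulate_matching_test(theta, b, rng=None):
    """Simulate one matching test (assumed two-stage process, for illustration).

    Stage 1 (knowledge): each item's correct pairing is known with
    Rasch probability sigmoid(theta - b_i).
    Stage 2 (response): known items are matched correctly; the leftover
    alternatives are assigned uniformly at random to the unknown items
    (guessing without replacement), so responses to different items are
    dependent -- the local-independence violation the paper addresses.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(b)
    known = rng.random(n) < rasch_know_prob(theta, b)
    response = np.full(n, -1)                 # response[i] = chosen alternative
    response[known] = np.flatnonzero(known)   # correct match for known items
    unused = np.setdiff1d(np.arange(n), response[known])
    response[~known] = rng.permutation(unused)  # guess among leftovers
    return response == np.arange(n)           # dichotomous item-level correctness

# Example: a 10-item matching test, ability theta = 0.5
rng = np.random.default_rng(1)
b = np.linspace(-1.5, 1.5, 10)                # item difficulties
scores = [simulate_matching_test(0.5, b, rng).sum() for _ in range(5000)]
print("mean score:", np.mean(scores))         # exceeds the pure-Rasch expectation
                                              # because some guesses land correctly
```

Under this sketch, the mean score is higher than the sum of the Rasch knowledge probabilities, since guessing without replacement produces correct matches by chance; this is the kind of score-distribution property the paper derives formally.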

MSC:

91E10 Cognitive psychology
62F15 Bayesian inference

References:

[1] Aiken, L. R., Psychological testing and assessment (1997), Allyn & Bacon
[2] Anders, R.; Batchelder, W. H., Cultural consensus theory for the ordinal data case, Psychometrika, 80, 1, 151-181 (2015) · Zbl 1314.62267
[3] Anders, R.; Oravecz, Z.; Batchelder, W., Cultural consensus theory for continuous responses: A latent appraisal model for information pooling, Journal of Mathematical Psychology, 61, 1-13 (2014) · Zbl 1309.91108
[4] Andrich, D., A rating formulation for ordered response categories, Psychometrika, 43, 561-573 (1978) · Zbl 0438.62086
[5] Barton, M. A.; Lord, F. M., An upper asymptote for the three-parameter logistic item-response model. Technical report (1981), Educational Testing Service: Educational Testing Service Princeton, NJ
[6] Batchelder, W. H.; Romney, A. K., Test theory without an answer key, Psychometrika, 53, 1, 71-92 (1988) · Zbl 0718.62260
[7] Bean, T. W.; Searles, D.; Singer, H.; Cowen, S., Learning concepts from biology text through pictorial analogies and an analogical study guide, The Journal of Educational Research, 84, 4, 233-237 (1990)
[8] Béguin, A. A.; Glas, C. A., MCMC estimation and some model-fit analysis of multidimensional IRT models, Psychometrika, 66, 4, 541-561 (2001) · Zbl 1293.62234
[9] Benson, J., A redefinition of content validity, Educational and Psychological Measurement, 41, 3, 793-802 (1981)
[10] Benson, J.; Crocker, L., The effects of item format and reading ability on objective test performance: A question of validity, Educational and Psychological Measurement, 39, 2, 381-387 (1979)
[11] Birnbaum, A., Some latent trait models and their use in inferring an examinee’s ability, (Statistical theories of mental test scores (1968), Addison-Wesley)
[12] Bock, R. D., Estimating item parameters and latent ability when responses are scored in two or more nominal categories, Psychometrika, 37, 29-51 (1972) · Zbl 0233.62016
[13] Bradlow, E. T.; Wainer, H.; Wang, X., A Bayesian random effects model for testlets, Psychometrika, 64, 2, 153-168 (1999) · Zbl 1365.62451
[14] Craig, S. D.; Gholson, B.; Driscoll, D. M., Animated pedagogical agents in multimedia educational environments: Effects of agent properties, picture features, and redundancy, Journal of Educational Psychology (2002)
[15] Fox, J.-P., Bayesian item response modeling (2010), Springer · Zbl 1271.62012
[16] Gelman, A.; Carlin, J. B.; Stern, H. S.; Dunson, D. B.; Vehtari, A.; Rubin, D. B., Bayesian data analysis (2014), CRC Press: CRC Press Boca Raton, FL · Zbl 1279.62004
[17] Gronlund, N. E., Assessment of student achievement (1998), Allyn & Bacon
[18] Haynie, W. J., Effects of multiple-choice and matching tests on delayed retention in postsecondary metals technology, Journal of Industrial Teacher Education, 40, 2 (2003)
[19] Heywood, J., Assessment in higher education (1977), John Wiley & Sons
[20] Lee, M. D.; Steyvers, M.; De Young, M.; Miller, B., Inferring expertise in knowledge and prediction ranking tasks, Topics in Cognitive Science, 4, 1, 151-163 (2012)
[21] Lee, M. D.; Steyvers, M.; Miller, B., A cognitive model for aggregating people’s rankings, PLoS One, 9, 5, Article e96431 pp. (2014)
[22] Lord, F. M.; Wingersky, M. S., Comparison of IRT true-score and equipercentile observed-score “equatings”, Applied Psychological Measurement, 8, 4, 453-461 (1984)
[23] Masters, G. N., A Rasch model for partial credit scoring, Psychometrika, 47, 149-174 (1982) · Zbl 0493.62094
[24] McDonald, R. P., Test theory: A unified treatment (2013), Psychology Press: Psychology Press New York
[25] Miller, M. D.; Linn, R. L.; Gronlund, N. E., Measurement and assessment in teaching (2009), Merrill/Pearson
[26] Molenaar, I. W., Estimation of item parameters, (Fischer, G. H.; Molenaar, I. W., Rasch models: foundations, recent developments, and applications (1995), Springer-Verlag: Springer-Verlag New York), 39-51 · Zbl 0831.62093
[27] Moore, K. D., Classroom teaching skills (2001), McGraw-Hill: McGraw-Hill New York
[28] Moreno, R.; Mayer, R. E., Cognitive principles of multimedia learning: The role of modality and contiguity, Journal of Educational Psychology, 91, 358-368 (1999)
[29] Nitko, A. J.; Brookhart, S. M., Educational assessment of students (2010), Pearson
[30] Osgood, D. W.; McMorris, B. J.; Potenza, M. T., Analyzing multiple-item measures of crime and deviance I: Item response theory scaling, Journal of Quantitative Criminology, 18, 3, 267-296 (2002)
[31] Osterweil, D.; Mulford, P.; Syndulko, K.; Martin, M., Cognitive function in old and very old residents of a residential facility: Relationship to age, education and dementia, Journal of the American Geriatrics Society, 42, 7, 766-773 (1994)
[32] Popp, H. M., Visual discrimination of alphabet letters, The Reading Teacher (1964)
[33] Rasch, G., Probabilistic models for some intelligence and attainment tests (1960), Danmarks Paedagogiske Institut: Danmarks Paedagogiske Institut Copenhagen
[34] Reise, S. P.; Waller, N. G., How many IRT parameters does it take to model psychopathy items?, Psychological Methods, 8, 2, 164-184 (2003)
[35] Romney, A. K.; Batchelder, W. H.; Weller, S. C., Recent applications of cultural consensus theory, American Behavioral Scientist, 31, 2, 163-177 (1987)
[36] Samejima, F., Estimation of ability using a response pattern of graded scores, (Psychometrika monograph, no. 17 (1969))
[37] Shaha, S. H., Matching-tests: Reduced anxiety and increased test effectiveness, Educational and Psychological Measurement, 44, 4, 869-881 (1984)
[38] Sinharay, S.; Johnson, M. S.; Stern, H. S., Posterior predictive assessment of item response theory models, Applied Psychological Measurement, 30, 4, 298-321 (2006)
[39] Thissen, D.; Steinberg, L., A response model for multiple choice items, Psychometrika, 49, 501-519 (1984)
[40] Thissen, D.; Steinberg, L.; Wainer, H., Detection of differential item functioning using the parameters of item response models, (Holland, P. W.; Wainer, H., Differential item functioning (2012), Routledge: Routledge New York, NY)
[41] Zeigenfuse, M. D.; Steyvers, M., An Item Response Theory Model of Matching Test Performance: Stan Code. osf.io/gp6s4 (2020)