
Psychometrics: from practice to theory and back. 15 years of nonparametric multidimensional IRT, DIF/test equity, and skills diagnostic assessment. (English) Zbl 1298.62198

Summary: The paper surveys 15 years of progress in three psychometric research areas: latent dimensionality structure, test fairness, and skills diagnosis of educational tests. It is proposed that one effective model for selecting and carrying out research is to choose one’s research questions from practical challenges facing educational testing, then bring to bear sophisticated probability modeling and statistical analyses to solve these questions, and finally to make the effectiveness of the research answers in meeting the educational testing challenges the ultimate criterion for judging the value of the research. The problem-solving power and the joy of working with a dedicated, focused, and collegial group of colleagues are emphasized. Finally, it is suggested that the summative assessment testing paradigm that has driven test measurement research for over half a century is giving way to a new paradigm that in addition embraces skills-level formative assessment, opening up a plethora of challenging, exciting, and societally important research problems for psychometricians.
This article is based on the Presidential Address the author gave on June 23, 2002 at the 67th Annual Meeting of the Psychometric Society held in Chapel Hill, North Carolina.

MSC:

62P15 Applications of statistics to psychology
62-02 Research exposition (monographs, survey articles) pertaining to statistics

Software:

TESTGRAF; MSP5; DIMTEST
