
On Wald tests for differential item functioning detection. (English) Zbl 1427.62135

Summary: Wald-type tests are a common item response theory (IRT) based procedure for detecting differential item functioning (DIF). However, the empirical type I error rate of these tests departs from the nominal significance level. This paper discusses two reasons for the discrepancy and proposes a new procedure. The first reason concerns the equating coefficients used to convert the item parameters to a common scale: they are treated as known constants although they are in fact estimated. The second concerns the parameterization used to estimate the item parameters, which differs from the usual IRT parameterization. Because the item parameters in the usual IRT parameterization are obtained in a second step, the corresponding covariance matrix is approximated using the delta method. The proposal is to account for the estimation of the equating coefficients by treating them as random variables, and to use the untransformed (i.e., not reparameterized) item parameters in computing the test statistic. A simulation study compares the performance of the new proposal with the currently used procedure. The results show that the new proposal gives type I error rates closer to the significance level.
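
Illustration (not part of the paper): the following Python sketch shows the kind of conventional two-step Wald computation that the summary criticizes, for a single two-parameter logistic item. It rests on several assumptions that the record does not state: the item is estimated in a slope-intercept form `beta0 + beta1 * theta`, the usual IRT parameters are `a = beta1` and `b = -beta0 / beta1`, and the focal-group parameters are rescaled as `a / A` and `A * b + B`. All function names and numeric values are hypothetical. The equating coefficients `A` and `B` are deliberately treated as known constants here, which is the first source of error the summary identifies.

```python
import math


def mat_mul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(x):
    """Transpose a 2x2 matrix."""
    return [[x[j][i] for j in range(2)] for i in range(2)]


def inverse(x):
    """Invert a 2x2 matrix using the closed-form adjugate formula."""
    det = x[0][0] * x[1][1] - x[0][1] * x[1][0]
    return [[x[1][1] / det, -x[0][1] / det],
            [-x[1][0] / det, x[0][0] / det]]


def delta_method(beta0, beta1, cov_beta):
    """Map (beta0, beta1) to (a, b) and approximate the covariance of (a, b).

    Assumed reparameterization: a = beta1, b = -beta0 / beta1.
    """
    a, b = beta1, -beta0 / beta1
    # Jacobian of (a, b) with respect to (beta0, beta1).
    jac = [[0.0, 1.0],
           [-1.0 / beta1, beta0 / beta1 ** 2]]
    cov_ab = mat_mul(mat_mul(jac, cov_beta), transpose(jac))
    return (a, b), cov_ab


def rescale(par, cov, A, B):
    """Put (a, b) on the reference scale, treating A and B as fixed constants."""
    a, b = par
    jac = [[1.0 / A, 0.0],
           [0.0, A]]
    return (a / A, A * b + B), mat_mul(mat_mul(jac, cov), transpose(jac))


def wald_dif(par_ref, cov_ref, par_foc, cov_foc):
    """Wald statistic for equal (a, b) in two independent groups, with p-value.

    With two restrictions the reference distribution is chi-square with 2 df,
    whose survival function is exp(-w / 2).
    """
    d = [par_ref[0] - par_foc[0], par_ref[1] - par_foc[1]]
    total = [[cov_ref[i][j] + cov_foc[i][j] for j in range(2)]
             for i in range(2)]
    inv = inverse(total)
    w = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return w, math.exp(-w / 2.0)


if __name__ == "__main__":
    # Hypothetical estimates and covariance matrices, for illustration only.
    ref, cov_ref = delta_method(-0.4, 1.2, [[0.010, 0.002], [0.002, 0.015]])
    foc, cov_foc = delta_method(-0.1, 1.0, [[0.012, 0.001], [0.001, 0.018]])
    foc, cov_foc = rescale(foc, cov_foc, A=1.1, B=0.2)
    w, p = wald_dif(ref, cov_ref, foc, cov_foc)
    print(f"Wald statistic = {w:.3f}, p-value = {p:.3f}")
```

In this sketch, both the delta-method step and the fixed `A`, `B` enter the covariance matrix used in the statistic. The approach proposed in the summary would instead work with the untransformed item parameters and propagate the sampling variability of the equating coefficients.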

MSC:

62P15 Applications of statistics to psychology
62F03 Parametric hypothesis testing

Software:

R; difR; ltm; equateIRT
