Estimate-based goodness-of-fit test for large sparse multinomial distributions. (English) Zbl 1452.62395

Summary: Pearson's chi-squared statistic \((X^{2})\) does not in general follow a chi-square distribution when it is used for goodness-of-fit testing of a multinomial distribution based on sparse contingency table data. We explore the properties of the \(D^{2}\) statistic of D. Zelterman [J. Am. Stat. Assoc. 82, 624–629 (1987; Zbl 0641.62037)] and compare them with those of \(X^{2}\). We also compare the power of the goodness-of-fit tests based on \(D^{2}\), \(X^{2}\), and the statistic \(L_r\) proposed by A. Maydeu-Olivares and H. Joe [J. Am. Stat. Assoc. 100, No. 471, 1009–1020 (2005; Zbl 1117.62398)] when the given contingency table is very sparse. We show that the variance of \(D^{2}\) is not larger than the variance of \(X^{2}\) under null hypotheses in which all the cell probabilities are positive, that the distribution of \(D^{2}\) becomes more skewed as the multinomial distribution becomes more asymmetric and sparse, and that, for the \(L_r\) statistic, the power of the goodness-of-fit test depends on the models selected for testing. The results of a simulation experiment strongly support using both \(D^{2}\) and \(L_r\) for goodness-of-fit testing with large sparse contingency table data.
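
To illustrate the two full-information statistics compared in the summary, the following is a minimal sketch, not the authors' code. It assumes a simple null hypothesis with fully specified cell probabilities, and it assumes the commonly quoted form of Zelterman's statistic, \(D^{2} = \sum_i \{(n_i - Np_i)^{2} - n_i\}/(Np_i)\), which equals \(X^{2} - \sum_i n_i/(Np_i)\). The function names, the uniform null, and the Monte Carlo reference distribution are illustrative choices that do not come from the paper.

```python
import numpy as np


def pearson_x2(counts, probs):
    """Pearson's X^2 for observed counts against null cell probabilities."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() * np.asarray(probs, dtype=float)
    return float(np.sum((counts - expected) ** 2 / expected))


def zelterman_d2(counts, probs):
    """Zelterman's D^2 (assumed form): X^2 minus sum of n_i / (N p_i)."""
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() * np.asarray(probs, dtype=float)
    return float(np.sum(((counts - expected) ** 2 - counts) / expected))


def monte_carlo_pvalue(stat, counts, probs, n_sim=2000, seed=0):
    """Upper-tail Monte Carlo p-value under the fully specified null.

    The chi-square approximation is unreliable for sparse tables, so the
    reference distribution is simulated instead (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    total = int(np.sum(counts))
    observed = stat(counts, probs)
    sims = rng.multinomial(total, probs, size=n_sim)
    simulated = np.array([stat(row, probs) for row in sims])
    # Add-one correction keeps the estimated p-value strictly positive.
    return float((1 + np.sum(simulated >= observed)) / (n_sim + 1))


# Hypothetical sparse setting: 200 equiprobable cells, only 50 observations.
k = 200
probs = np.full(k, 1.0 / k)
counts = np.random.default_rng(1).multinomial(50, probs)

x2 = pearson_x2(counts, probs)
d2 = zelterman_d2(counts, probs)
p_x2 = monte_carlo_pvalue(pearson_x2, counts, probs)
p_d2 = monte_carlo_pvalue(zelterman_d2, counts, probs)
print(f"X^2 = {x2:.2f}, D^2 = {d2:.2f}, p(X^2) = {p_x2:.3f}, p(D^2) = {p_d2:.3f}")
```

Under a uniform null the two statistics differ only by the constant \(k\), so they give the same Monte Carlo test. They separate when the cell probabilities are unequal, which is the asymmetric case that the summary singles out.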

MSC:

62H17 Contingency tables
62G10 Nonparametric hypothesis testing
62-08 Computational methods for problems pertaining to statistics

References:

[1] Agresti, A., Categorical Data Analysis (1990), John Wiley & Sons, New York · Zbl 0716.62001
[2] Barnard, G. A., Discussion on paper by M.S. Bartlett, J. Roy. Statist. Soc. Ser. B, 25, 294 (1963)
[3] Bartlett, M. S., Contingency table interactions, Suppl. J. R. Statist. Soc., 2, 248-252 (1935)
[4] Bartholomew, D. J.; Leung, S. O., A goodness of fit test for sparse \(2^p\) contingency tables, British J. Math. Statist. Psych., 55, 1-15 (2002)
[5] Berry, K. J.; Mielke, P. W., Monte Carlo comparisons of the asymptotic Chi-square and likelihood-ratio tests with the nonasymptotic Chi-square test for sparse \(r \times c\) tables, Psychological Bulletin, 103, 2, 256-264 (1988)
[6] Besag, J.; Clifford, P., Generalized Monte Carlo significance tests, Biometrika, 76, 4, 633-642 (1989) · Zbl 0679.62033
[7] Bishop, Y. M.M.; Fienberg, S. E.; Holland, P. W., Discrete Multivariate Analysis: Theory and Practice (1975), MIT Press, Cambridge · Zbl 0332.62039
[8] Bunea, F.; Besag, J., MCMC in \(I \times J \times K\) contingency tables, Fields Inst. Commun., 26, 25-36 (2000) · Zbl 0965.65013
[9] Cochran, W. G., The \(\chi^2\) test of goodness-of-fit, Ann. Math. Statist., 23, 315-345 (1952) · Zbl 0047.13105
[10] Cressie, N.; Read, T. R.C., Multinomial goodness-of-fit tests, J. Roy. Statist. Soc. Ser. B, 46, 3, 440-464 (1984) · Zbl 0571.62017
[11] Diaconis, P.; Sturmfels, B., Algebraic algorithms for sampling from conditional distributions, Ann. Statist., 26, 363-398 (1998) · Zbl 0952.62088
[12] Fisher, R. A., The logic of inductive inference (with discussion), J. Roy. Statist. Soc., 98, 39-54 (1935) · JFM 61.1308.06
[13] Fienberg, S. E., The use of Chi-squared statistics for categorical data problems, J. Roy. Statist. Soc. Ser. B, 41, 54-64 (1979) · Zbl 0427.62013
[14] Forster, J. J.; McDonald, J. W.; Smith, P. W.F., Markov chain Monte Carlo exact inference for binomial and multinomial logistic regression models, Statist. Comput., 13, 160-177 (2003)
[15] Horn, S. D., Goodness-of-fit tests for discrete data: A review and an application to a health impairment scale, Biometrics, 33, 237-248 (1977) · Zbl 0344.62046
[16] Kim, S. H.; Choi, H.; Lee, S., Estimate-based goodness-of-fit test for large sparse multinomial distributions, Applied Mathematics Research Report 07-08, Department of Mathematical Sciences, KAIST, Daejeon, S. Korea (2007)
[17] Koehler, K. J., Goodness-of-fit tests for log-linear models in sparse contingency tables, J. Amer. Statist. Assoc., 81, 394, 483-493 (1986) · Zbl 0625.62033
[18] Koehler, K. J.; Larntz, K., An empirical investigation of goodness-of-fit statistics for sparse multinomials, J. Amer. Statist. Assoc., 75, 370, 336-344 (1980) · Zbl 0442.62025
[19] Kreuzer, M.; Robbiano, L., Computational Commutative Algebra I (2000), Springer, New York · Zbl 0956.13008
[20] Lancaster, H. O., The Chi-squared Distribution (1969), Wiley, New York · Zbl 0193.17802
[21] Maydeu-Olivares, A.; Joe, H., Limited- and full-information estimation and goodness-of-fit testing in \(2^n\) contingency tables: A unified framework, J. Amer. Statist. Assoc., 100, 471, 1009-1020 (2005) · Zbl 1117.62398
[22] Maydeu-Olivares, A.; Joe, H., Limited information goodness-of-fit testing in multidimensional contingency tables, Psychometrika, 71, 4, 713-732 (2006) · Zbl 1306.62477
[23] Mielke, P. W.; Berry, K. J., Non-asymptotic inferences based on the chi-square statistic for \(r\) by \(c\) contingency tables, J. Statist. Plann. Inference, 12, 41-45 (1985)
[24] Moore, D. S., Recent developments in chi-square tests for goodness-of-fit, Mimeo, Dept. of Statistics, Purdue University, Lafayette, IN (1976)
[25] Pearson, K., On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Philosophical Magazine Series 5, 50, 157-172 (1900) · JFM 31.0238.04
[26] Pistone, G.; Riccomagno, E.; Wynn, H. P., Algebraic Statistics: Computational Commutative Algebra in Statistics (2002), Chapman & Hall/CRC Press, Boca Raton, FL
[27] Rapallo, F., Algebraic Markov bases and MCMC for two-way contingency tables, Scand. J. Statist., 30, 385-397 (2003) · Zbl 1055.65018
[28] Read, T. R.C., Small-sample comparisons for the power divergence goodness-of-fit statistics, J. Amer. Statist. Assoc., 79, 388, 929-935 (1984)
[29] Simonoff, J. S., An improved goodness-of-fit statistic for sparse multinomials, J. Amer. Statist. Assoc., 80, 391, 671-677 (1985)
[30] Smith, P. W.F.; Forster, J. J.; McDonald, J. W., Monte Carlo exact tests for square contingency tables, J. Roy. Statist. Soc. Ser. A, 159, 2, 309-321 (1996)
[31] Watson, G. S., Some recent results in chi-square goodness-of-fit tests, Biometrics, 15, 440-468 (1959) · Zbl 0095.33503
[32] Zelterman, D., Goodness-of-fit tests for large sparse multinomial distributions, J. Amer. Statist. Assoc., 82, 398, 624-629 (1987) · Zbl 0641.62037