Hybrid schemes for exact conditional inference in discrete exponential families. (English) Zbl 1403.62106

Summary: Exact conditional goodness-of-fit tests for discrete exponential family models can be conducted via Monte Carlo estimation of \(p\) values by sampling from the conditional distribution of multiway contingency tables. The two most popular methods for such sampling are Markov chain Monte Carlo (MCMC) and sequential importance sampling (SIS). In this work we consider various ways to hybridize the two schemes and propose one standout strategy as a good general purpose method for conducting inference. The proposed method runs many parallel chains initialized at SIS samples across the fiber. When a Markov basis is unavailable, the proposed scheme uses a lattice basis with intermittent SIS proposals to guarantee irreducibility and asymptotic unbiasedness. The scheme alleviates many of the challenges faced by the MCMC and SIS schemes individually while largely retaining their strengths. It also provides diagnostics that guide and lend credibility to the procedure. Simulations demonstrate the viability of the approach.
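The idea in the summary can be illustrated on the simplest case, a two-way table under the independence model, where the conditional distribution given the margins is hypergeometric and the basic \(\pm 1\) moves on 2×2 minors connect the fiber. The following Python sketch is not the authors' implementation. It starts several chains at dispersed tables with the observed margins, runs a Metropolis chain from each start, and pools the draws into a Monte Carlo \(p\)-value for the chi-square statistic. The function names (`random_start`, `metropolis_step`, `estimate_p_value`) are hypothetical, and `random_start` is only a crude stand-in for an SIS draw: it fills cells sequentially, choosing uniformly within each feasible range, and computes no importance weights. All row and column totals are assumed positive.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic under independence (margins assumed positive)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def random_start(row_sums, col_sums, rng):
    """Crude stand-in for an SIS draw: fill cells one at a time, each chosen
    uniformly from the range that keeps the remaining margins feasible."""
    r = list(row_sums)
    table = [[0] * len(col_sums) for _ in r]
    for j, c_rem in enumerate(col_sums[:-1]):
        for i in range(len(r)):
            below = sum(r[i + 1:])
            lo, hi = max(0, c_rem - below), min(r[i], c_rem)
            x = rng.randint(lo, hi)
            table[i][j] = x
            r[i] -= x
            c_rem -= x
    for i in range(len(r)):
        table[i][-1] = r[i]  # the last column absorbs what is left of each row
    return table


def metropolis_step(table, rng):
    """One basic move: +1 at (i1,j1),(i2,j2) and -1 at (i1,j2),(i2,j1).
    The target is proportional to 1 / prod(x_ij!), the hypergeometric law."""
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    ratio = (table[i1][j2] * table[i2][j1]) / (
        (table[i1][j1] + 1) * (table[i2][j2] + 1)
    )
    if rng.random() < ratio:  # a ratio of 0 rejects moves that would go negative
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def estimate_p_value(observed, n_chains=4, n_steps=5000, burn_in=500, seed=0):
    """Pooled Monte Carlo p-value plus per-chain estimates as a rough diagnostic."""
    rng = random.Random(seed)
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    obs_stat = chi_square(observed)
    per_chain = []
    for _ in range(n_chains):
        table = random_start(rows, cols, rng)
        hits = 0
        for step in range(burn_in + n_steps):
            metropolis_step(table, rng)
            if step >= burn_in and chi_square(table) >= obs_stat - 1e-9:
                hits += 1
        per_chain.append(hits / n_steps)
    return sum(per_chain) / n_chains, per_chain


if __name__ == "__main__":
    p_hat, chains = estimate_p_value([[3, 1], [1, 3]])
    print("pooled p-value estimate:", p_hat)
    print("per-chain estimates:", chains)
```

Large disagreement among the per-chain estimates would suggest that the chains have not mixed across the fiber. For models without a known Markov basis, the summary describes using a lattice basis with intermittent SIS proposals instead; that variant is not shown in this sketch.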

MSC:

62H17 Contingency tables
62L12 Sequential estimation
