Diaconis, Persi; Sturmfels, Bernd Algebraic algorithms for sampling from conditional distributions. (English) Zbl 0952.62088 Ann. Stat. 26, No. 1, 363-397 (1998). This paper aims to describe the construction of new Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Section 2 introduces the necessary stochastic and statistical background. Section 3 contains the main contribution of the paper: it shows how to compute a Markov basis using tools from computational algebra. More precisely, to find a Markov basis is proved to be equivalent to finding a set of generators of an ideal in a polynomial ring, using Gröbner bases. To represent this ideal in a way suitable for computation is illustrated by MATHEMATICA and MAPLE programs. The next sections of the paper include detailed treatments of the proposed technique for some important special cases: contingency tables (Section 4), logistic regression (in Section 5), and the spectral analysis of permutation data (Section 6). Reviewer: Neculai Curteanu (Iaşi) Cited in 13 ReviewsCited in 196 Documents MSC: 62M99 Inference from stochastic processes 65C60 Computational problems in statistics (MSC2010) 13P10 Gröbner bases; other bases for ideals and modules (e.g., Janet and border bases) 62F25 Parametric tolerance and confidence regions 62H17 Contingency tables Keywords:conditional distribution; computational algebra methods; Markov chain algorithms for sampling; Markov basis; contingency tables; logistic regression; spectral analysis of permutation data Software:Maple; Mathematica PDF BibTeX XML Cite \textit{P. Diaconis} and \textit{B. Sturmfels}, Ann. Stat. 26, No. 1, 363--397 (1998; Zbl 0952.62088) Full Text: DOI Link References: [1] AGRESTI, A. 1990. Categorical Data Analy sis. Wiley, New York. Z. [2] AGRESTI, A. 1992. A survey of exact inference for contingency tables. Statist. Sci. 7 131 177. Z. · Zbl 0955.62587 [3] ALDOUS, D. 1987. On the Markov chain stimulation method for uniform combinatorial distributions and simulated annealing. Probab. Engrg. Infom. Sci. 1 33 46. Z. · Zbl 1133.60327 [4] ANDREWS, D. and HERZBERG, A. 1985. Data. Springer, New York. Z. · Zbl 0567.62002 [5] BAGLIVIO, J., OLIVIER, D. and PAGANO, M. 1988. Methods for the analysis of contingency tables with large and small cell counts. J. Amer. Statist. Assoc. 83 1006 1013. Z. JSTOR: [6] BAGLIVIO, J., OLIVIER, D. and PAGANO, M. 1992. Methods for exact goodness-of-fit tests. J. Amer. Statist. Assoc. 87 464 469.Z. JSTOR: · Zbl 0850.62392 [7] BAGLIVIO, J., OLIVIER, D. and PAGANO, M. 1993. Analy sis of discrete data: rerandomization methods and complexity. Technical report, Dept. Mathematics, Boston College. Z. [8] BAy ER, D. and STILLMAN, M. 1989. MACAULAY: a computer algebra sy stem for algebraic geometry. Available via anony mous ftp from zariski.harvard.edu. Z. [9] BELISLE, C., ROMEIJN, H. and SMITH, R. 1993. Hit and run algorithms for generating multivariate distributions. Math. Oper. Res. 18 255 266. Z. JSTOR: · Zbl 0771.60052 [10] BESAG, J. and CLIFFORD, P. 1989. Generalized Monte Carlo significance tests. Biometrika 76 633 642. Z. JSTOR: · Zbl 0679.62033 [11] BIRCH, B. W. 1963. Maximum likelihood in three-way contingency tables. J. Roy. Statist. Soc. Ser. B 25 220 233. Z. JSTOR: · Zbl 0121.14001 [12] BISHOP, Y., FINEBERG, S. and HOLLAND, P. 1975. Discrete Multivariate Analy sis. MIT Press. Z. [13] BJORNER, A., LAS VERGNAS, M., STURMFELS, B., WHITE, N. and ZIEGLER, G. 1993. Oriented \" Matroids. Cambridge Univ. Press. Z. · Zbl 0773.52001 [14] BOKOWSKI, J. and RICHTER-GEBERT, J. 1990. On the finding of final poly nomials. European J. Combin. 11 21 34. Z. · Zbl 0693.05021 [15] BOKOWSKI, J. and RICHTER-GEBERT, J. 1991. On the classification of non-realizable oriented matroids. II. Preprint, T. H. Darmstadt. Z. [16] BOKENHOLT, U. 1993. Applications of Thurstonian models to ranking data. Probability Models änd Statistical Analy sis for Ranking Data. Lecture Notes in Statist. 80 157 172. Springer, New York. Z. · Zbl 0764.62097 [17] BROWN, L. D. 1990. An ancillarity paradox which appears in multiple linear regression. Ann. Statist. 18 471 538. Z. · Zbl 0721.62011 [18] CHRISTENSEN, R. 1990. Log-Linear Models. Springer, New York. Z. · Zbl 0711.62050 [19] CHUNG, F., GRAHAM, R. and YAU, S. T. 1996. On sampling with Markov chains. Random Structures Algorithms 9 55 77. Z. · Zbl 0861.60079 [20] COHEN, A., KEMPERMAN, J. and SACKROWITZ, H. 1994. Unbiased testing in exponential family regression. Ann. Statist. 22 1931 1946. Z. · Zbl 0824.62011 [21] CONTI, P. and TRAVERSO, C. 1991. Buchberger algorithm and integer programming. Proceedings AAECC-9. Lecture Notes in Comp. Sci. 539 130 139. Springer, New York. Z. · Zbl 0771.13014 [22] COX, D. 1958. Some problems connected with statistical inference. Ann. Math. Statist. 29 357 372. Z. · Zbl 0088.11702 [23] COX, D. 1988. Some aspects of conditional and asy mptotic inference. Sankhy a Ser. A 50 314 337. · Zbl 0676.62031 [24] COX, D., LITTLE, J. and O’SHEA, D. 1992. Ideals, Varieties, and Algorithms. Springer, New York. Z. [25] CROON, M. 1989. Latent class models for the analysis of rankings. In New Developments in Z. Psy chological Choice Modeling G. De Solte, H. Feger and K. C. Klauer, eds. 99 121. North-Holland, Amsterdam. Z. [26] DARROCH, J., LAURITZEN, S. and SPEED T. 1980. Markov fields a log-linear interaction models for contingency tables. Ann. Statist. 8 522 539. Z. · Zbl 0444.62064 [27] DIACONIS, P. 1988. Group Representations in Probability and Statistics. IMS, Hay ward, CA. Z. · Zbl 0695.60012 [28] DIACONIS, P. 1989. A generalization of spectral analysis with application to ranked data. Ann. Statist. 17 949 979. Z. · Zbl 0688.62005 [29] DIACONIS, P. and EFRON, B. 1985. Testing for independence in a two-way table: new interpretations for the chi-square statistic. Ann. Statist. 13 845 905. Z. · Zbl 0593.62040 [30] DIACONIS, P. and EFRON, B. 1987. Probabilistic-geometric theorems arising from the analysis of contingency tables. In Contributions to the Theory and Application of Statistics: A Z. Volume in Honor of Herbert Solomon A. Gelfand, ed.. Academic Press, New York. Z. · Zbl 0709.60566 [31] DIACONIS, P., EISENBUD, D. and HOLMES, S. 1997. Speeding up algebraic random walks. Dept. Mathematics, Brandeis Univ. Preprint. Z. [32] DIACONIS, P., EISENBUD, D. and STURMFELS, B. 1996. Lattice walks and primary decompositions. Z. In Proceedings of the Rota Fest B. Sagan, ed.. To appear. Z. · Zbl 0962.05010 [33] DIACONIS, P. and FREEDMAN, D. 1987. A dozen deFinetti-sty le results in search of a theory. Ann. Inst. H. Poincare 23 397 423. Ź. · Zbl 0619.60039 [34] DIACONIS, P. and GANGOLLI, A. 1995. Rectangular array s with fixed margins. In Discrete Z. Probability and Algorithms D. Aldous, et al., eds.. 15 41. Springer, New York. Z. · Zbl 0839.05005 [35] DIACONIS, P., GRAHAM, R. and STURMFELS, B. 1996. Primitive partition identities. Combinatorics. Paul Erdos Is Eighty 2 173 192. Z. · Zbl 0852.05014 [36] DIACONIS, P., HOLMES, S. and NEALE, R. 1997. A nonreversible Markov chain sampling method. Technical report, Biometry, Cornell Univ. Z. [37] DIACONIS, P. and RABINOWITZ, A. 1997. Conditional inference for logistic regression. Technical report, Stanford Univ. Z. [38] DIACONIS, P. and SALOFF-COSTE, L. 1995a. Random walk on contingency tables with fixed row and column sums. Dept. Mathematics, Harvard Univ., Preprint. Z. · Zbl 1006.60095 [39] DIACONIS, P. and SALOFF-COSTE, L. 1995b. What do we know about the Metropolis algorithm. Technical report, Dept. Mathematics, Harvard Univ. Z. · Zbl 1006.60095 [40] DIACONIS, P. and SALOFF-COSTE, L. 1996a. Nash inequalities for finite Markov chains. J. Theoret. Probab. 9 459 510. Z. · Zbl 1006.60095 [41] DIACONIS, P. and SALOFF-COSTE, L. 1996b. Logarithmic Sobolev inequalities for finite Markov chains. Ann. Appl. Probab. 6 695 750. Z. · Zbl 1006.60095 [42] DIACONIS, P. and STROOCK, D. 1991. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab. 1 36 61. Z. · Zbl 0731.60061 [43] Dy ER, R., KANNAN, R. and MOUNT, J. 1995. Sampling contingency tables. Random Structures Algorithms. To appear. Z. [44] EFRON, B. and HINKLEY, D. 1978. Assessing the accuracy of the MLE: observed versus expected Z. Fisher information with discussion. Biometrika 65 457 487. Z. JSTOR: · Zbl 0401.62002 [45] FARRELL, R. 1971. The necessity that a conditional procedure be almost every where admissible. Z. Wahrsch. Verw. Gebiete 19 57 66. Z. Z. · Zbl 0203.51403 [46] FISHER, R. 1925. Statistical Methods for Research Workers, 1st ed. 14th ed. 1970. Oliver and Boy d, Edinburgh. Z. · JFM 51.0414.08 [47] FISHER, R. 1950. The significance of deviations from expectation in a Poisson series. Biometrics 6 17 24. Z. [48] FISHER, R., THORNTON, H. and MACKENZIE, N. 1922. The accuracy of the plating method of estimating the density of bacterial populations. Ann. Appl. Biology 9 325 359. Z. [49] FULTON, W. 1993. Introduction to Toric Varieties. Princeton Univ. Press. Z. · Zbl 0813.14039 [50] GANGOLLI, A. 1991. Convergence bounds for Markov chains and applications to sampling. Ph.D. thesis, Dept. Computer Science, Stanford Univ. [51] GLONEK, G. 1987. Some aspects of log linear models. Ph.D. thesis, School of Math. Sciences, Flinders Univ. South Australia. Z. [52] GOODMAN, L. 1970. The multivariate analysis of qualitative data: interactions among multiple classifications. J. Amer. Statist. Assoc. 65 226 256. Z. [53] GUO, S. and THOMPSON, E. 1992. Performing the exact test for Hardy Weinberg proportion for multiple alleles. Biometrics 48 361 372. Z. · Zbl 0825.62835 [54] HABERMAN, S. 1978. Analy sis of Qualitative Data 1, 2. Academic Press, Orlando, FL. Z. [55] HAMMERSLY, J. and HANDSCOMB, D. 1964. Monte Carlo Methods. Wiley, New York. Z. · Zbl 0121.35503 [56] HARRIS, J. 1992. Algebraic Geometry: A First Course. Springer, New York. Z. · Zbl 0779.14001 [57] HERNEK, D. 1997. Random generation and counting of rectangular array s with fixed margins. Dept. Mathematics, Preprint, UCLA. Z. [58] HOLMES, R. and JONES, L. 1996. On uniform generation of two-way tables with fixed margins and the conditional volume test of Diaconis and Efron. Ann. Statist. 24 64 68. Z. · Zbl 0888.62060 [59] HOLMES, S. 1995. Examples for Stein’s method. Preprint, Dept. Statistics, Stanford Univ. Z. [60] JENSEN, J. 1991. Uniform saddlepoint approximations and log-convex densities. J. Roy. Statist. Soc. Ser. B 157 172. Z. Z. JSTOR: · Zbl 0800.62081 [61] KIEFER, J. 1977. Conditional confidence statements and confidence estimators with discussion. J. Amer. Statist. Assoc. 72 789 827. Z. JSTOR: · Zbl 0375.62023 [62] KOLASSA, J. and TANNER, M. 1994. Approximate conditional inference in exponential families via the Gibbs sample. J. Amer. Statist. Assoc. 89 697 702. Z. JSTOR: · Zbl 0803.62013 [63] KOLASSA, J. and TANNER, M. 1996. Approximate Monte Carlo conditional inference. Dept. Statistics, Northwestern Univ. Preprint. Z. [64] KONG, F. 1993. Edgeworth expansions for conditional distributions in logistic regression models. Technical report, Dept. Statistics, Columbia Univ. Z. [65] KONG, F. and LEVIN, B. 1993. Edgeworth expansions for the sum of discrete random vectors and their applications in generalized linear models. Technical report, Dept. Statistics, Columbia Univ. Z. [66] LANGE, K. and LAZZERONI, L. 1997. Markov chains for Monte Carlo tests of genetic equilibrium in multidimensional contingency tables. Ann. Statist. To appear. Z. · Zbl 0871.62094 [67] LARNTZ, K. 1978. Small-sample comparison of exact levels for chi-squared goodness-of-fit statistics. J. Amer. Statist. Assoc. 73 253 263. Z. · Zbl 0414.62022 [68] LAURITZEN, S. 1996. Graphical Models. Oxford Univ. Press. Z. · Zbl 0907.62001 [69] LEHMANN, E. 1986. Testing Statistical Hy potheses, 2nd ed. Wiley, New York. Z. [70] LEVIN, B. 1992. On calculations involving the maximum cell frequency. Comm. Statist. Z. [71] LEVIN, B. 1992. Tests of odds ratio homogeneity with improved power in sparse fourfold tables. Comm. Statist. Theory Methods 21 1469 1500. Z. · Zbl 0775.62007 [72] MARDEN, J. 1995. Analy zing and Modeling Rank Data. Chapman and Hall, London. Z. · Zbl 0853.62006 [73] MAy R, E. and MEy ER, A. 1982. The complexity of the word problem for commutative semigroups and poly nomial ideals. Adv. in Math. 46 305 329. Z. · Zbl 0506.03007 [74] MCCULLOGH, P. 1985. On the asy mptotic distribution of Pearson’s statistic in linear exponential family models. International Statistical Review 53 61 67. Z. JSTOR: · Zbl 0575.62020 [75] MCCULLOUGH, P. 1986. The conditional distribution of goodness-to-fit statistics for discrete data. J. Amer. Statist. Assoc. 81 104 107. Z. [76] MEHTA, C. and PATEL, N. 1983. A network algorithm for performing Fisher’s exact test in r c contingency tables. J. Amer. Statist. Assoc. 78 427 434. Z. JSTOR: · Zbl 0545.62039 [77] NEy MAN, J. 1937. Outline of a theory of statistical estimation based on the classical theory of probability. Philos. Trans. 236 333 380. Z. · Zbl 0017.12403 [78] ODOROFF, C. 1970. A comparison of minimum logit chi-square estimation and maximum likelihood estimation in 2 2 2 and 3 2 2 contingency tables: tests for interaction. J. Amer. Statist. Assoc. 65 1617 1631. Z. · Zbl 0224.62028 [79] PROPP, J. and WILSON, D. 1986. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9 232 252. Z. · Zbl 0859.60067 [80] REID, N. 1995. The roles of conditioning in inference. Statist. Sci. 10 138 199. Z. Z. · Zbl 0955.62524 [81] SAVAGE, L. 1976. On rereading R. A. Fisher with discussion. Ann. Statist. 4 441 450. Z. · Zbl 0325.62008 [82] SCHRIJVER, A. 1986. Theory of Linear and Integer Programming. Wiley, New York. · Zbl 0665.90063 [83] SINCLAIR, A. 1993. Algorithms for Random Generation and Counting: A Markov Chain Approach. Birkhauser, Boston. \" Z. · Zbl 0780.68096 [84] SKOVGAARD, I. 1987. Saddlepoint expansions for conditional distributions. J. Appl. Probab. 24 875 887. Z. SNEE 1974. Graphical display of two-way contingency tables. Amer. Statist. 38 9 12. Z. JSTOR: · Zbl 0638.62018 [85] STANLEY, R. 1980. Decompositoin of rational convex poly topes. Ann. Discrete Math. 6 333 342. Z. · Zbl 0812.52012 [86] STEIN, C. 1986. Approximate Computation of Expectations. IMS, Hay ward, CA. Z. · Zbl 0721.60016 [87] STURMFELS, B. 1991. Grobner bases of toric varieties. Tohoko Math. J. 43 249 261. \" Z. · Zbl 0714.14034 [88] STURMFELS, B. 1992. Asy mptotic analysis of toric ideals. Mem. Fac. Sci. Ky ushu Univ. Ser. A 46 217 228. Z. · Zbl 0784.14024 [89] STURMFELS, B. 1996. Grobner Bases and Convex Poly topes. Amer. Math. Soc., Providence, RI. \" Z. [90] THOMAS, R. 1995. A geometric Buchberger algorithm for integer programming. Math. Oper. Res. 20 864 884. Z. JSTOR: · Zbl 0846.90079 [91] VIRAG, B. 1997. Random walks on finite convex sets of lattice points. Technical report, Dept. Statistics, Univ. California, Berkeley. Z. · Zbl 0918.60054 [92] WEISPFENNING, V. 1987. Admissible orders and linear forms. ACM SIGSAM Bulletin 21 16 18. Z. 2 · Zbl 0655.13017 [93] YARNOLD, J. 1970. The minimum expectation in X goodness-of-fit tests and the accuracy of approximations for the null distribution. J. Amer. Statist. Assoc. 65 864 886. Z. JSTOR: [94] YATES, F. 1984. Tests of significance for 2 2 contingency tables. J. Roy. Statist. Soc. Ser. A 147 426 463. JSTOR: · Zbl 0573.62050 [95] CORNELL UNIVERSITY BERKELEY, CALIFORNIA 94720 [96] ITHACA, NEW YORK 14853 E-MAIL: ims@math.cornell.edu This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.