
Slice sampling. (With discussions and rejoinder). (English) Zbl 1051.65007

The author describes a class of slice sampling methods that can be applied to a wide variety of distributions. Section 2 summarizes general-purpose Markov chain sampling methods such as Gibbs sampling, adaptive rejection sampling and adaptive rejection Metropolis sampling.
Section 3 presents the basic ideas of slice sampling and thoroughly discusses various predecessors more or less connected to it. The principal message of the paper is concentrated in Sections 4–7. First, single-variable slice sampling methods are described. The author then concentrates on multivariate slice sampling methods and reflective slice sampling. An example forms the final section.
I liked the paper, and I must say that although it appeared in the Annals of Statistics, the author really concentrates on the ideas rather than on the formal proofs typical for this journal. I am sure that everyone who wants to get an idea of what slice sampling is will be satisfied.
The paper is complemented by an interesting discussion prepared by Ming-Hui Chen, B. W. Schmeiser, O. B. Downs, A. Mira, G. O. Roberts, J. Skilling, D. J. C. MacKay and S. G. Walker.

MSC:

65C60 Computational problems in statistics (MSC2010)
65C05 Monte Carlo methods
62D05 Sampling theory, sample surveys
65C40 Numerical analysis or methods applied to Markov chains

Software:

BUGS

References:

[1] ADLER, S. L. (1981). Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions. Phys. Rev. D 23 2901-2904.
[2] BARONE, P. and FRIGESSI, A. (1990). Improving stochastic relaxation for Gaussian random fields. Probab. Engrg. Inform. Sci. 4 369-389. · Zbl 1134.60347 · doi:10.1017/S0269964800001674
[3] BESAG, J. and GREEN, P. J. (1993). Spatial statistics and Bayesian computation (with discussion). J. Roy. Statist. Soc. Ser. B 55 25-37, 53-102. JSTOR: · Zbl 0800.62572
[4] CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22. JSTOR: · doi:10.2307/1390766
[5] DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344. JSTOR: · Zbl 0913.62028 · doi:10.1111/1467-9868.00179
[6] DIACONIS, P., HOLMES, S. and NEAL, R. M. (2000). Analysis of a non-reversible Markov chain sampler. Ann. Appl. Probab. 10 726-752. · Zbl 1083.60516 · doi:10.1214/aoap/1019487508
[7] DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems (S. A. Solla, T. K. Leen and K.-R. Muller, eds.) 428-434. MIT Press, Cambridge, MA.
[8] DUANE, S., KENNEDY, A. D., PENDLETON, B. J. and ROWETH, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195 216-222.
[9] EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012. · doi:10.1103/PhysRevD.38.2009
[10] FREY, B. J. (1997). Continuous sigmoidal belief networks trained using slice sampling. In Advances in Neural Information Processing Systems (M. C. Mozer, M. I. Jordan and T. Petsche, eds.). MIT Press, Cambridge, MA.
[11] GELFAND, A. E. and SMITH, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398-409. JSTOR: · Zbl 0702.62020 · doi:10.2307/2289776
[12] GEYER, C. J. and THOMPSON, E. A. (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Amer. Statist. Assoc. 90 909-920. · Zbl 0850.62834 · doi:10.2307/2291325
[13] GILKS, W. R. (1992). Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 641-649. Oxford Univ. Press. · Zbl 0825.62407
[14] GILKS, W. R., BEST, N. G. and TAN, K. K. C. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling. Appl. Statist. 44 455-472. · Zbl 0893.62110 · doi:10.2307/2986138
[15] GILKS, W. R., NEAL, R. M., BEST, N. G. and TAN, K. K. C. (1997). Corrigendum: Adaptive rejection Metropolis sampling. Appl. Statist. 46 541-542. · Zbl 0893.62110
[16] GILKS, W. R. and WILD, P. (1992). Adaptive rejection sampling for Gibbs sampling. Appl. Statist. 41 337-348. · Zbl 0825.62407 · doi:10.2307/2347565
[17] GREEN, P. J. and HAN, X. (1992). Metropolis methods, Gaussian proposals and antithetic variables. Stochastic Models, Statistical Methods, and Algorithms in Image Analysis (P. Barone et al., eds.). Lecture Notes in Statist. 74 142-164. Springer, New York.
[18] GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053. JSTOR: · Zbl 1099.60508 · doi:10.1093/biomet/88.4.1035
[19] HASTINGS, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97-109. · Zbl 0219.65008 · doi:10.1093/biomet/57.1.97
[20] HIGDON, D. M. (1996). Auxiliary variable methods for Markov chain Monte Carlo with applications. ISDS discussion paper 96-17, Duke Univ.
[21] HOROWITZ, A. M. (1991). A generalized guided Monte Carlo algorithm. Phys. Lett. B 268 247-252.
[22] LUNN, D. J., THOMAS, A., BEST, N. and SPIEGELHALTER, D. (2000). WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility. Statist. Comput. 10 325-337.
[23] METROPOLIS, N., ROSENBLUTH, A. W., ROSENBLUTH, M. N., TELLER, A. H. and TELLER, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087-1092. · Zbl 0960.93506
[25] MIRA, A. (1998). Ordering, splicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
[26] MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12. · Zbl 1018.91030 · doi:10.1111/1467-9469.00267
[27] NEAL, R. M. (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. J. Comput. Phys. 111 194-203. · Zbl 0797.65115 · doi:10.1006/jcph.1994.1054
[28] NEAL, R. M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statist. 118. Springer, New York. · Zbl 0888.62021
[29] NEAL, R. M. (1998). Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In Learning in Graphical Models (M. I. Jordan, ed.) 205-228. Kluwer, Dordrecht. · Zbl 0920.60055
[30] NEAL, R. M. (2001). Annealed importance sampling. Statist. Comput. 11 125-139. · doi:10.1023/A:1008923215028
[31] ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660. JSTOR: · Zbl 0929.62098 · doi:10.1111/1467-9868.00198
[32] THOMAS, A., SPIEGELHALTER, D. J. and GILKS, W. R. (1992). BUGS: A program to perform Bayesian inference using Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 837-842. Oxford Univ. Press.
[33] TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
[35] and Schmeiser (1998). For single-variable slice sampling, the variation of slice sampling proposed by Neal operates analogously to Gibbs sampling in the sense that, to obtain the next point x1, y is generated from the conditional distribution [y|x0] given the current point x0, and then x1 is drawn from [x|y]. Both [y|x0] and [x|y] are uniform distributions. Since the closed form of the support of [x|y] is not available, sampling directly from [x|y] is not possible. A clever development is Neal's sophisticated (but relatively expensive) sampling procedure to generate x1 from the "slice" S = {x : y < f(x)}. In Chen and Schmeiser (1998), we proposed the random-direction interior-point (RDIP) sampler, a general sampler designed to be "black box" in the sense that the user need not tune the sampler to the problem. RDIP samples from the uniform distribution defined over the region U below the surface defined by f(x). Both slice sampling and RDIP require evaluations of f(x). Slice sampling, however, can be more expensive than RDIP because slice sampling requires evaluating f(x) more than once per iteration. The intention of RDIP's design is to use as much free information as possible. In the high-dimensional case, the hyperrectangle idea in slice sampling could be inefficient. For example, suppose f(x) is a bivariate normal density with high correlation. Then the hyperrectangle idea essentially mimics the Gibbs sampler, which suffers from slow convergence; see Chen and Schmeiser (1993) for a detailed discussion. Aligning the hyperrectangle (or ellipses) to the shape of f(x), along the lines of Kaufman and Smith (1998), seems like a good idea. As Neal mentions, the computational efficiency of our "black-box" sampler RDIP depends on the normalization constant. Our goal was to be automatic and reasonably efficient, rather than to tune the sampler to the problem. If, however,
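The two-step scheme described above can be sketched in Python. This is a minimal illustrative sketch, not the paper's implementation: the function name, the width parameter `w`, the step limit, and the standard-normal example are my own choices. The vertical draw y | x0 is exact, and the horizontal draw from the slice S = {x : y < f(x)} uses a stepping-out/shrinkage procedure of the kind Neal proposes, since the support of [x|y] has no closed form.

```python
import math
import random

def slice_sample_step(f, x0, w=1.0, max_steps=50):
    """One update of single-variable slice sampling.

    f : unnormalized density; x0 : current point; w : step-out width.
    Draws y ~ Uniform(0, f(x0)), then draws x1 uniformly from the
    slice S = {x : y < f(x)} via stepping out and shrinkage.
    """
    y = random.uniform(0.0, f(x0))          # vertical step: y | x0
    # Step out: place an interval of width w around x0 and grow each
    # end until it leaves the slice (or a step limit is reached).
    left = x0 - w * random.random()
    right = left + w
    steps = max_steps
    while steps > 0 and f(left) > y:
        left -= w
        steps -= 1
    steps = max_steps
    while steps > 0 and f(right) > y:
        right += w
        steps -= 1
    # Shrinkage: propose uniformly on [left, right]; rejected points
    # shrink the interval toward x0, so the loop terminates.
    while True:
        x1 = random.uniform(left, right)
        if f(x1) > y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

# Example: sample from a standard normal via its unnormalized density.
random.seed(1)
f = lambda x: math.exp(-0.5 * x * x)
xs = [0.0]
for _ in range(5000):
    xs.append(slice_sample_step(f, xs[-1]))
mean = sum(xs) / len(xs)   # should be close to 0
```

Note that each update evaluates f several times (stepping out plus shrinkage), which is exactly the per-iteration cost the discussants compare against RDIP.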
[36] Chen (1996). · Zbl 0953.53004
[37] CHEN, M.-H. and SCHMEISER, B. W. (1993). Performance of the Gibbs, hit-and-run, and Metropolis samplers. J. Comput. Graph. Statist. 2 251-272. JSTOR: · Zbl 04516295 · doi:10.2307/1390645
[38] CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22. JSTOR: · doi:10.2307/1390766
[39] NANDRAM, B. and CHEN, M.-H. (1996). Reparameterizing the generalized linear model to accelerate Gibbs sampler convergence. J. Statist. Comput. Simulation 54 129-144. · Zbl 0925.62309 · doi:10.1080/00949659608811724
[40] KAUFMAN, D. E. and SMITH, R. L. (1998). Direction choice for accelerated convergence in hit-and-run sampling. Oper. Res. 46 84-95. · Zbl 1009.62597 · doi:10.1287/opre.46.1.84
[41] DOWNS, O. B. (2001). High-temperature expansions for learning models of nonnegative data. In Advances in Neural Information Processing Systems 13 (T. K. Leen, T. G. Dietterich and V. Tresp, eds.) 465-471. MIT Press, Cambridge, MA.
[42] DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems 12 (S. A. Solla, T. K. Leen and K.-R. Muller, eds.) 428-434. MIT Press, Cambridge, MA.
[43] HINTON, G. E. and SEJNOWSKI, T. J. (1983). Optimal perceptual inference. In IEEE Conference on Computer Vision and Pattern Recognition 448-453. Washington.
[44] KAPPEN, H. J. and RODRIGUEZ, F. B. (1998). Efficient learning in Boltzmann machines using linear response theory. Neural Computation 10 1137-1156.
[45] LEE, D. D. and SEUNG, H. S. (1999). Learning the parts of objects by nonnegative matrix factorization. Nature 401 788-791. · Zbl 1369.68285
[46] MACKAY, D. J. C. (1998). Introduction to Monte Carlo methods. In Learning in Graphical Models (M. I. Jordan, ed.) 175-204. Kluwer, Dordrecht. · Zbl 0911.65004
[47] NEAL, R. M. (1997). Markov chain Monte Carlo methods based on ”slicing” the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.
[48] Rosenthal (1999). Generalizing just a little from the setting described in Section 4 of the paper, suppose that our target density can be written as the methodology of Roberts and Tweedie (2000). As an illustration of these results, it can be shown that, for the case where f0 is constant (i.e., in the single-variable slice sampler) and f1 is a real-valued log-concave function, 525 iterations suffice for convergence from all starting points x with f1(x) ≥ 0.01 sup_y f1(y). Similar results can be deduced for multidimensional log-concave distributions, but the bounds worsen as the dimension increases, reflecting a genuine curse of dimensionality in this problem (despite the fact that this is inherently a two-dimensional Gibbs sampler). To counteract this issue, Roberts and Rosenthal (2002) introduce the polar slice sampler in d dimensions, where f0(x) is chosen can be constructed in the spirit of Kendall and Møller (2000). Reversibility of the slice sampler allows easy simulation of these processes backwards in time to identify the starting point of the maximal and minimal chains. The beauty of the perfect slice sampling construction relies on the possibility of coupling the maximal and minimal chains even on a continuous state space. This is achieved because, thanks to monotonicity, the minimal horizontal slice (i.e., the one defined by the minimal chain) is always a superset of the maximal horizontal slice. If, when sampling over the minimal horizontal slice, a point is selected that belongs to the intersection of the minimal and maximal horizontal slices, instantaneous coupling happens. Examples of applications given in Mira, Møller and Roberts (2001) include the Ising model on a two-dimensional grid at the critical temperature and various other auto-models.
In Casella, Mengersen, Robert and Titterington (2002), a further application of the perfect slice sampler construction to mixtures of distributions is studied.
[49] to Fill (1998). A further improvement of the algorithm in Mira, Møller and Roberts (2001) allows the use of read-once random numbers, as introduced in Wilson (2000).
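The monotonicity that drives this coupling argument, namely that a lower auxiliary level always yields a larger horizontal slice, can be checked numerically. The following is a small illustrative sketch; the grid discretization, the density, and the level values are my own assumptions, not the construction of Mira, Møller and Roberts (2001).

```python
import math

def slice_set(f, y, grid):
    """Discretized horizontal slice {x in grid : y < f(x)}."""
    return {x for x in grid if f(x) > y}

# Unimodal density: slices at different levels are nested intervals.
f = lambda x: math.exp(-0.5 * x * x)
grid = [i / 100.0 for i in range(-500, 501)]

s_low = slice_set(f, 0.2, grid)    # lower level (minimal chain): larger slice
s_high = slice_set(f, 0.8, grid)   # higher level (maximal chain): smaller slice

# Nesting: the slice at the higher level is contained in the slice
# at the lower level, so a point drawn from the larger slice that
# happens to fall in the smaller one couples both chains at once.
assert s_high <= s_low
```

This nesting is exactly what makes instantaneous coupling of the maximal and minimal chains possible on a continuous state space.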
[50] CASELLA, G., MENGERSEN, K. L., ROBERT, C. P. and TITTERINGTON, D. M. (2002). Perfect samplers for mixtures of distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 777-790. JSTOR: · Zbl 1067.62028 · doi:10.1111/1467-9868.00360
[51] FILL, J. A. (1998). An interruptible algorithm for perfect sampling via Markov chains. Ann. Appl. Probab. 8 131-162. · Zbl 0939.60084 · doi:10.1214/aoap/1027961037
[52] GREEN, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711-732. JSTOR: · Zbl 0861.62023 · doi:10.1093/biomet/82.4.711
[53] GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053. JSTOR: · Zbl 1099.60508 · doi:10.1093/biomet/88.4.1035
[54] KENDALL, W. S. and MØLLER, J. (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Adv. in Appl. Probab. 32 844-865. · Zbl 1123.60309 · doi:10.1239/aap/1013540247
[55] MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. of Minnesota.
[56] MIRA, A. (2002). On Metropolis-Hastings algorithms with delayed rejection. Metron 59 231-241. · Zbl 0998.65502
[57] MIRA, A., MØLLER, J. and ROBERTS, G. O. (2001). Perfect slice samplers. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 593-606. JSTOR: · Zbl 0993.65015 · doi:10.1111/1467-9868.00301
[58] MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12. · Zbl 1018.91030 · doi:10.1111/1467-9469.00267
[59] PESKUN, P. H. (1973). Optimum Monte Carlo sampling using Markov chains. Biometrika 60 607-612. JSTOR: · Zbl 0271.62041 · doi:10.1093/biomet/60.3.607
[60] PROPP, J. and WILSON, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9 223-252. · Zbl 0859.60067 · doi:10.1002/(SICI)1098-2418(199608/09)9:1/2<223::AID-RSA14>3.0.CO;2-O
[61] ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660. JSTOR: · Zbl 0929.62098 · doi:10.1111/1467-9868.00198
[62] ROBERTS, G. O. and ROSENTHAL, J. S. (2001). Markov chains and deinitialising processes. Scand. J. Statist. 28 489-505. · Zbl 0985.60067 · doi:10.1111/1467-9469.00250
[63] ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280. · Zbl 1006.65004 · doi:10.1081/STM-120004467
[64] ROBERTS, G. O. and TWEEDIE, R. L. (2000). Rates of convergence of stochastically monotone and continuous time Markov models. J. Appl. Probab. 37 359-373. · Zbl 0979.60060 · doi:10.1239/jap/1014842542
[65] TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
[66] WILSON, D. B. (2000). How to couple from the past using a read-once source of randomness. Random Structures Algorithms 16 85-113. · Zbl 0952.60072 · doi:10.1002/(SICI)1098-2418(200001)16:1<85::AID-RSA6>3.0.CO;2-H
[67] which appear in Damien, Wakefield and Walker (1999). I took Neal's example in Section 8 and used a many-variable slice sampler on it. In fact, I took a latent variable for each data point, the idea criticized by Neal. The overall joint density is given by
DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344. JSTOR: · Zbl 0913.62028 · doi:10.1111/1467-9868.00179
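Walker's many-variable scheme, with one latent variable per data point, can be sketched generically: for a target proportional to a product of likelihood terms g_i(x), introduce u_i uniform on (0, g_i(x)) and alternate Gibbs updates on the u_i and x. Everything below is an illustrative assumption, not Walker's actual model: the function names, the Gaussian toy terms, and the rejection step that stands in for an exact draw from the joint slice.

```python
import math
import random

def auxiliary_gibbs_step(terms, x0, lo=-10.0, hi=10.0):
    """One Gibbs sweep with one latent variable per likelihood term.

    terms : list of functions g_i; target is f(x) ∝ prod_i g_i(x).
    Given x0, draw u_i ~ Uniform(0, g_i(x0)), then draw x1 uniformly
    from {x : g_i(x) > u_i for all i} by rejection on [lo, hi].
    """
    us = [random.uniform(0.0, g(x0)) for g in terms]
    while True:  # x0 always lies in the joint slice, so this terminates
        x1 = random.uniform(lo, hi)
        if all(g(x1) > u for g, u in zip(terms, us)):
            return x1

# Toy example: three Gaussian likelihood terms centred on "data" points;
# the resulting target is normal with mean = average of the data.
random.seed(0)
data = [-1.0, 0.5, 1.5]
terms = [(lambda d: (lambda x: math.exp(-0.5 * (x - d) ** 2)))(d) for d in data]
x = 0.0
draws = []
for _ in range(2000):
    x = auxiliary_gibbs_step(terms, x)
    draws.append(x)
mean = sum(draws) / len(draws)   # near the data mean, 1/3
```

With many data points the joint slice becomes a narrow intersection, which is the efficiency concern Neal raises about one-latent-variable-per-observation schemes.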
[68] ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Statist. Soc. Ser. B 61 643-660. JSTOR: · Zbl 0929.62098 · doi:10.1111/1467-9868.00198
[69] CARACCIOLO, S., PELISSETTO, A. and SOKAL, A. D. (1994). A general limitation on Monte Carlo algorithms of Metropolis type. Phys. Rev. Lett. 72 179-182. · Zbl 0973.65500 · doi:10.1103/PhysRevLett.72.179
[70] CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statistics 7 1-22. JSTOR: · doi:10.2307/1390766
[71] DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344. JSTOR: · Zbl 0913.62028 · doi:10.1111/1467-9868.00179
[72] EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012. · doi:10.1103/PhysRevD.38.2009
[73] GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053. JSTOR: · Zbl 1099.60508 · doi:10.1093/biomet/88.4.1035
[74] MIRA, A. (1998). Ordering, splicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
[75] NEAL, R. M. (1997). Markov chain Monte Carlo methods based on ”slicing” the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.
[76] ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280. · Zbl 1006.65004 · doi:10.1081/STM-120004467
[77] TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.