The ubiquitous Ewens sampling formula. (English) Zbl 1442.60010

Summary: Ewens’s sampling formula exemplifies the harmony of mathematical theory, statistical application, and scientific discovery. The formula not only contributes to the foundations of evolutionary molecular genetics, the neutral theory of biodiversity, Bayesian nonparametrics, combinatorial stochastic processes, and inductive inference but also emerges from fundamental concepts in probability theory, algebra, and number theory. With an emphasis on its far-reaching influence throughout statistics and probability, we highlight these and many other consequences of Ewens’s seminal discovery.

MSC:

60C05 Combinatorial probability
60G09 Exchangeability for stochastic processes
60G57 Random measures
62G05 Nonparametric estimation
62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution

References:

[1] Aldous, D. J. (1985). Exchangeability and related topics. In École d’été de Probabilités de Saint-Flour, XIII–1983. Lecture Notes in Math. 1117 1-198. Springer, Berlin. · Zbl 0562.60042
[2] Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2 1152-1174. · Zbl 0335.60034
[3] Arratia, R., Barbour, A. D. and Tavaré, S. (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2 519-535. · Zbl 0756.60006
[4] Arratia, R., Barbour, A. D. and Tavaré, S. (2000). Limits of logarithmic combinatorial structures. Ann. Probab. 28 1620-1644. · Zbl 1044.60003
[5] Arratia, R., Barbour, A. D. and Tavaré, S. (2003). Logarithmic Combinatorial Structures: A Probabilistic Approach. Eur. Math. Soc., Zürich. · Zbl 1040.60001
[6] Bacallado, S., Favaro, S. and Trippa, L. (2015). Bayesian nonparametric inference for shared species richness in multiple populations. J. Statist. Plann. Inference 166 14-23. · Zbl 1394.62031
[7] Bartholomew, D. J. (1973). Stochastic Models for Social Processes, 2nd ed. Wiley, London. · Zbl 0278.60058
[8] Bayes, T. (1764). An essay toward solving a problem in the doctrine of chances. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 53 370-418. · Zbl 1250.60007
[9] Bertoin, J. (2006). Random Fragmentation and Coagulation Processes. Cambridge Studies in Advanced Mathematics 102. Cambridge Univ. Press, Cambridge.
[10] Billingsley, P. (1972). On the distribution of large prime divisors. Period. Math. Hungar. 2 283-289. · Zbl 0242.10033
[11] Blackwell, D. and MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. Ann. Statist. 1 353-355. · Zbl 0276.62010
[12] Blei, D., Ng, A. and Jordan, M. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993-1022. · Zbl 1112.68379
[13] Borodin, A. and Corwin, I. (2014). Macdonald processes. Probab. Theory Related Fields 158 225-400. · Zbl 1291.82077
[14] Cauchy, A. (1815). Mémoire sur les fonctions qui ne peuvent obtenir que deux valeurs égales et de signes contraires par suite des transpositions opérées entre les variables qu’elles renferment. Journal de l’École Polytechnique 10 91-169.
[15] Cesari, O., Favaro, S. and Nipoti, B. (2014). Posterior analysis of rare variants in Gibbs-type species sampling models. J. Multivariate Anal. 131 79-98. · Zbl 1299.62115
[16] Champernowne, D. (1953). A model of income distribution. Econom. J. 63.
[17] Christiansen, F. B. (2008). Theories of Population Variation in Genes and Genomes. Princeton Univ. Press, Princeton, NJ. · Zbl 1309.92002
[18] Crane, H. (2011). A consistent Markov partition process generated from the paintbox process. J. Appl. Probab. 48 778-791. · Zbl 1235.60092
[19] Crane, H. (2013). Some algebraic identities for the \(α\)-permanent. Linear Algebra Appl. 439 3445-3459. · Zbl 1283.15026
[20] Crane, H. (2014). The cut-and-paste process. Ann. Probab. 42 1952-1979. · Zbl 1317.60034
[21] Crane, H. (2015a). Clustering from categorical data sequences. J. Amer. Statist. Assoc. 110 810-823. · Zbl 1373.62304
[22] Crane, H. (2015b). Generalized Ewens-Pitman model for Bayesian clustering. Biometrika 102 231-238. · Zbl 1345.62089
[23] de Finetti, B. (1937). La prévision: Ses lois logiques, ses sources subjectives. Ann. Inst. H. Poincaré 7 1-68. · Zbl 0017.07602
[24] Derrida, B. (1981). Random-energy model: An exactly solvable model of disordered systems. Phys. Rev. B (3) 24 2613-2626. · Zbl 1323.60134
[25] Derrida, B. (1997). From random walks to spin glasses. Phys. D 107 186-198. · Zbl 1029.82509
[26] Diaconis, P. and Ram, A. (2012). A probabilistic interpretation of the Macdonald polynomials. Ann. Probab. 40 1861-1896. · Zbl 1255.05194
[27] Donnelly, P. (1986). Partition structures, Pólya urns, the Ewens sampling formula, and the ages of alleles. Theoret. Population Biol. 30 271-288. · Zbl 0608.92005
[28] Donnelly, P. and Grimmett, G. (1993). On the asymptotic distribution of large prime factors. J. Lond. Math. Soc. (2) 47 395-404. · Zbl 0839.11039
[29] Efron, B. and Thisted, R. (1976). Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63 435-447. · Zbl 0344.62088
[30] Ethier, S. N. and Griffiths, R. C. (1993). The transition function of a Fleming-Viot process. Ann. Probab. 21 1571-1590. · Zbl 0778.60038
[31] Etienne, R. (2005). A new sampling formula for neutral biodiversity. Ecology Letters 8 253-260.
[32] Etienne, R. (2007). A neutral sampling formula for multiple samples and an “exact” test of neutrality. Ecology Letters 10 608-618.
[33] Etienne, R. and Alonso, D. (2005). A dispersal-limited sampling theory for species and alleles. Ecology Letters 8 1147-1156.
[34] Ewens, W. and Tavaré, S. (1998). The Ewens sampling formula. In Encyclopedia of Statistical Science (S. Kotz, C. B. Read and D. L. Banks, eds.) Wiley, New York.
[35] Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoret. Population Biology 3 87-112; erratum, ibid. 3 (1972), 240; erratum, ibid. 3 (1972), 376. · Zbl 0245.92009
[36] Favaro, S., Lijoi, A. and Prünster, I. (2013). Conditional formulae for Gibbs-type exchangeable random partitions. Ann. Appl. Probab. 23 1721-1754. · Zbl 1287.60046
[37] Feng, S. (2010). The Poisson-Dirichlet Distribution and Related Topics: Models and Asymptotic Behaviors. Springer, Heidelberg. · Zbl 1214.60001
[38] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209-230. · Zbl 0255.62037
[39] Fisher, R. (1922). On the dominance ratio. Proceedings of the Royal Society of Edinburgh 42 321-341.
[40] Fisher, R., Corbet, A. and Williams, C. (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. The Journal of Animal Ecology 12 42-58.
[41] Fleming, W. H. and Viot, M. (1979). Some measure-valued Markov processes in population genetics theory. Indiana Univ. Math. J. 28 817-843. · Zbl 0444.60064
[42] Gnedin, A. (2010). A species sampling model with finitely many types. Electron. Commun. Probab. 15 79-88. · Zbl 1202.60056
[43] Good, I. J. and Toulmin, G. H. (1956). The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43 45-63. · Zbl 0070.14403
[44] Griffiths, R. C. (1979). Exact sampling distributions from the infinite neutral alleles model. Adv. in Appl. Probab. 11 326-354. · Zbl 0406.92016
[45] Hartigan, J. A. (1990). Partition models. Comm. Statist. Theory Methods 19 2745-2756.
[46] Higgs, P. (1995). Frequency distributions in population genetics parallel those in statistical physics. Phys. Rev. E (3) 51 1-7.
[47] Hoppe, F. M. (1984). Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol. 20 91-94. · Zbl 0547.92009
[48] Hough, J. B., Krishnapur, M., Peres, Y. and Virág, B. (2006). Determinantal processes and independence. Probab. Surv. 3 206-229. · Zbl 1189.60101
[49] Hubbell, S. (2001). The Unified Neutral Theory of Biodiversity and Biogeography. Princeton Univ. Press, Princeton, NJ.
[50] Ishwaran, H. and James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96 161-173. · Zbl 1014.62006
[51] Ishwaran, H. and James, L. F. (2003). Generalized weighted Chinese restaurant processes for species sampling mixture models. Statist. Sinica 13 1211-1235. · Zbl 1086.62036
[52] James, L. F. (2013). Stick-breaking \(\text{PG}(α,ζ)\)-generalized Gamma processes. Available at arXiv:1308.6570v3.
[53] Johnson, W. (1932). Probability: The deductive and inductive problems. Mind 41 409-423. · Zbl 0005.25401
[54] Karlin, S. and McGregor, J. (1972). Addendum to a paper of W. Ewens. Theoret. Population Biology 3 113-116. · Zbl 0245.92010
[55] Kerov, S. (2005). Coherent random allocations, and the Ewens-Pitman formula. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325 127-145. · Zbl 1077.60007
[56] Kerov, S., Okounkov, A. and Olshanski, G. (1998). The boundary of the Young graph with Jack edge multiplicities. Int. Math. Res. Not. IMRN 4 173-199. · Zbl 0960.05107
[57] Kimura, M. (1968). Evolutionary rate at the molecular level. Nature 217 624-626.
[58] Kingman, J. F. C. (1977). The population structure associated with the Ewens sampling formula. Theoret. Population Biology 11 274-283. · Zbl 0421.92011
[59] Kingman, J. F. C. (1978a). Random partitions in population genetics. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 361 1-20. · Zbl 0393.92011
[60] Kingman, J. F. C. (1978b). The representation of partition structures. J. Lond. Math. Soc. (2) 18 374-380. · Zbl 0415.92009
[61] Kingman, J. F. C. (1980). Mathematics of Genetic Diversity. CBMS-NSF Regional Conference Series in Applied Mathematics 34. SIAM, Philadelphia, PA. · Zbl 0458.92009
[62] Kingman, J. F. C. (1982). The coalescent. Stochastic Process. Appl. 13 235-248. · Zbl 0491.60076
[63] Knuth, D. E. and Trabb Pardo, L. (1976/77). Analysis of a simple factorization algorithm. Theoret. Comput. Sci. 3 321-348. · Zbl 0362.10006
[64] Lijoi, A., Mena, R. H. and Prünster, I. (2007). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 94 769-786. · Zbl 1156.62374
[65] Macchi, O. (1975). The coincidence approach to stochastic point processes. Adv. in Appl. Probab. 7 83-122. · Zbl 0366.60081
[66] Macdonald, I. G. (1995). Symmetric Functions and Hall Polynomials, 2nd ed. Clarendon Press, New York. · Zbl 0824.05059
[67] McCullagh, P. and Yang, J. (2006). Stochastic classification models. In International Congress of Mathematicians. Vol. III 669-686. Eur. Math. Soc., Zürich. · Zbl 1112.62058
[68] McCullagh, P. and Yang, J. (2008). How many clusters? Bayesian Anal. 3 101-120. · Zbl 1330.62033
[69] Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 9 249-265.
[70] Olshanski, G. (2011). Random permutations and related topics. In The Oxford Handbook of Random Matrix Theory 510-533. Oxford Univ. Press, Oxford. · Zbl 1242.05006
[71] Perman, M., Pitman, J. and Yor, M. (1992). Size-biased sampling of Poisson point processes and excursions. Probab. Theory Related Fields 92 21-39. · Zbl 0741.60037
[72] Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145-158. · Zbl 0821.60047
[73] Pitman, J. (1996). Random discrete distributions invariant under size-biased permutation. Adv. in Appl. Probab. 28 525-539. · Zbl 0853.62018
[74] Pitman, J. (2003). Poisson-Kingman partitions. In Statistics and Science: A Festschrift for Terry Speed. Institute of Mathematical Statistics Lecture Notes–Monograph Series 40 1-34. IMS, Beachwood, OH.
[75] Pitman, J. (2006). Combinatorial Stochastic Processes. Lecture Notes in Math. 1875. Springer, Berlin. · Zbl 1103.60004
[76] Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25 855-900. · Zbl 0880.60076
[77] Sibuya, M. (2014). Prediction in Ewens-Pitman sampling formula and random samples from number partitions. Ann. Inst. Statist. Math. 66 833-864. · Zbl 1309.62026
[78] Simon, H. A. (1955). On a class of skew distribution functions. Biometrika 42 425-440. · Zbl 0066.11201
[79] Sloane, N. Online Encyclopedia of Integer Sequences. Published electronically at http://www.oeis.org/. · Zbl 1044.11108
[80] Spielman, R., McGinnis, R. and Ewens, W. (1993). Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human Genetics 52 506-516.
[81] Tavaré, S. and Ewens, W. (1997). The Multivariate Ewens Distribution. In Discrete Multivariate Distributions (N. L. Johnson, S. Kotz and N. Balakrishnan, eds.) Wiley, New York.
[82] Thisted, R. and Efron, B. (1987). Did Shakespeare write a newly-discovered poem? Biometrika 74 445-455. · Zbl 0635.62115
[83] Valiant, L. G. (1979). The complexity of computing the permanent. Theoret. Comput. Sci. 8 189-201. · Zbl 0415.68008
[84] Vere-Jones, D. (1988). A generalization of permanents and determinants. Linear Algebra Appl. 111 119-124. · Zbl 0665.15007
[85] Vershik, A. M. (1986). Asymptotic distribution of decompositions of natural numbers into prime divisors. Dokl. Akad. Nauk SSSR 289 269-272.
[86] Wakeley, J. (2008). Coalescent Theory: An Introduction. Roberts and Company Publishers, Greenwood Village, CO. · Zbl 1366.92001
[87] Watterson, G. (1978). The homozygosity test of neutrality. Genetics 88 405-417.
[88] Watterson, G. A. (1977). Heterosis or neutrality? Genetics 85 789-814.
[89] Wright, S. (1931). Evolution in Mendelian populations. Genetics 16 97-159.
[90] Yule, G. (1925). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Phil. Trans. Roy. Soc. London B 213 21-87.
[91] Zabell, S. (1988). Symmetry and its discontents. In Causation, Chance, and Credence 1 155-190. Kluwer Academic, Norwell.
[92] Zabell, S. (1992). Predicting the unpredictable. Synthese 90 205-232. · Zbl 0757.62006
[93] Zabell, S.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.