An asymptotic sampling formula for the coalescent with recombination.

*(English)* Zbl 1193.92077

Summary: The W. J. Ewens sampling formula (ESF) [Theor. Popul. Biol. 3, 87–112 (1972; Zbl 0245.92009)] is a one-parameter family of probability distributions with a number of intriguing combinatorial connections. This elegant closed-form formula first arose in biology as the stationary probability distribution of a sample configuration at one locus under the infinite-alleles model of mutations. Since its discovery, the ESF has been used in various biological applications, and has sparked several interesting mathematical generalizations. In the population genetics community, extending the underlying random-mating model to include recombination has received much attention in the past, but no general closed-form sampling formula is currently known even for the simplest extension, that is, a model with two loci.
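For readers who want a concrete handle on the one-locus ESF mentioned above, the following Python sketch evaluates its standard closed form, \(P(a_1,\dots,a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!}\), where \(a_j\) is the number of allele types seen exactly \(j\) times in a sample of size \(n = \sum_j j a_j\) and \(\theta\) is the population-scaled mutation rate. The function name `ewens_probability` and the count-vector encoding are illustrative choices of ours, not taken from the paper; exact rationals are used so that probabilities can be compared without rounding error.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(counts, theta):
    """Illustrative sketch of the one-locus Ewens sampling formula.

    counts[j - 1] = a_j, the number of allele types observed exactly j times,
    so the sample size is n = sum_j j * a_j. theta is the population-scaled
    mutation rate of the infinite-alleles model.
    """
    theta = Fraction(theta)
    n = sum(j * a for j, a in enumerate(counts, start=1))
    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i
    prob = Fraction(factorial(n)) / rising
    # Product over multiplicities j of (theta / j)^{a_j} / a_j!.
    for j, a in enumerate(counts, start=1):
        prob *= (theta / j) ** a / factorial(a)
    return prob
```

For instance, with \(\theta = 1\) and two sampled genes, `ewens_probability([2, 0], 1)` (two distinct alleles) and `ewens_probability([0, 1], 1)` (one allele seen twice) both return `Fraction(1, 2)`, consistent with the familiar identity-by-descent probability \(1/(1+\theta)\).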

We show that it is possible to obtain useful closed-form results in the case where the population-scaled recombination rate \(\rho\) is large but not necessarily infinite. Specifically, we consider an asymptotic expansion of the two-locus sampling formula in inverse powers of \(\rho\) and obtain closed-form expressions for the first few terms in the expansion. Our asymptotic sampling formula applies to arbitrary sample sizes and configurations.
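Schematically, an expansion of this kind can be written as below. The notation is our own and only illustrative: \(q(\mathbf{n})\) stands for the two-locus probability of a particular ordered sample with configuration \(\mathbf{n}\) (the ordered-sample convention is an assumption), and \(q^A(\mathbf{n}^A)\), \(q^B(\mathbf{n}^B)\) for the one-locus probabilities of its marginal configurations at loci A and B. The factorised leading term reflects the standard fact that as \(\rho \to \infty\) the two loci become unlinked and evolve independently; the explicit higher-order coefficients are derived in the paper and are not reproduced here.

```latex
q(\mathbf{n}) = q_0(\mathbf{n}) + \frac{q_1(\mathbf{n})}{\rho} + \frac{q_2(\mathbf{n})}{\rho^{2}} + O\!\left(\rho^{-3}\right),
\qquad
q_0(\mathbf{n}) = q^A(\mathbf{n}^A)\, q^B(\mathbf{n}^B).
```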

##### MSC:

92D15 | Problems related to evolution |

92D10 | Genetics and epigenetics |

60C05 | Combinatorial probability |

65C50 | Other computational problems in probability (MSC2010) |

##### Keywords:

Ewens sampling formula; coalescent theory; recombination; two-locus model; infinite-alleles model

##### References:

[1] | Arratia, R., Barbour, A. D. and Tavaré, S. (2003). Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society Publishing House, Switzerland. · Zbl 1040.60001 |

[2] | De Iorio, M. and Griffiths, R. C. (2004a). Importance sampling on coalescent histories. I. Adv. in Appl. Probab. 36 417-433. · Zbl 1045.62111 |

[3] | De Iorio, M. and Griffiths, R. C. (2004b). Importance sampling on coalescent histories. II. Adv. in Appl. Probab. 36 434-454. · Zbl 1045.62111 |

[4] | Ethier, S. N. and Griffiths, R. C. (1990). On the two-locus sampling distribution. J. Math. Biol. 29 131-159. · Zbl 0729.92012 |

[5] | Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3 87-112. · Zbl 0245.92009 |

[6] | Fearnhead, P. and Donnelly, P. (2001). Estimating recombination rates from population genetic data. Genetics 159 1299-1318. |

[7] | Golding, G. B. (1984). The sampling distribution of linkage disequilibrium. Genetics 108 257-274. |

[8] | Griffiths, R. C. (1981). Neutral two-locus multiple allele models with recombination. Theor. Popul. Biol. 19 169-186. · Zbl 0512.92012 |

[9] | Griffiths, R. C. (1991). The two-locus ancestral graph. In Selected Proceedings of the Sheffield Symposium on Applied Probability. IMS Lecture Notes-Monograph Series (I. V. Basawa and R. L. Taylor, eds.) 18 100-117. IMS, Hayward, CA. · Zbl 0781.92022 |

[10] | Griffiths, R. C., Jenkins, P. A. and Song, Y. S. (2008). Importance sampling and the two-locus model with subdivided population structure. Adv. in Appl. Probab. 40 473-500. · Zbl 1144.62092 |

[11] | Griffiths, R. C. and Lessard, S. (2005). Ewens’ sampling formula and related formulae: Combinatorial proofs, extensions to variable population size and applications to ages of alleles. Theor. Popul. Biol. 68 167-177. · Zbl 1085.92027 |

[12] | Griffiths, R. C. and Marjoram, P. (1996). Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. 3 479-502. |

[13] | Hoppe, F. (1984). Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol. 20 91-94. · Zbl 0547.92009 |

[14] | Hudson, R. R. (1985). The sampling distribution of linkage disequilibrium under an infinite allele model without selection. Genetics 109 611-631. |

[15] | Hudson, R. R. (2001). Two-locus sampling distributions and their application. Genetics 159 1805-1817. |

[16] | Jenkins, P. A. and Song, Y. S. (2009). Closed-form two-locus sampling distributions: Accuracy and universality. Genetics 183 1087-1103. |

[17] | Kingman, J. F. C. (1982a). The coalescent. Stochastic Process. Appl. 13 235-248. · Zbl 0491.60076 |

[18] | Kingman, J. F. C. (1982b). On the genealogy of large populations. J. Appl. Probab. 19 27-43. · Zbl 0516.92011 |

[19] | Kuhner, M. K., Yamato, J. and Felsenstein, J. (2000). Maximum likelihood estimation of recombination rates from population data. Genetics 156 1393-1401. |

[20] | McVean, G. A. T., Myers, S., Hunt, S., Deloukas, P., Bentley, D. R. and Donnelly, P. (2004). The fine-scale structure of recombination rate variation in the human genome. Science 304 581-584. |

[21] | Myers, S., Bottolo, L., Freeman, C., McVean, G. and Donnelly, P. (2005). A fine-scale map of recombination rates and hotspots across the human genome. Science 310 321-324. · Zbl 1073.65036 |

[22] | Nielsen, R. (2000). Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154 931-942. |

[23] | Pitman, J. (1992). The two-parameter generalization of Ewens’ random partition structure. Technical Report 345, Dept. Statistics, Univ. California, Berkeley. |

[24] | Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145-158. · Zbl 0821.60047 |

[25] | Slatkin, M. (1994). An exact test for neutrality based on the Ewens sampling distribution. Genet. Res. 64 71-74. |

[26] | Slatkin, M. (1996). A correction to an exact test based on the Ewens sampling distribution. Genet. Res. 68 259-260. |

[27] | Stephens, M. (2001). Inference under the coalescent. In Handbook of Statistical Genetics (D. Balding, M. Bishop and C. Cannings, eds.) 213-238. Wiley, Chichester, UK. |

[28] | Stephens, M. and Donnelly, P. (2000). Inference in molecular population genetics. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 605-655. · Zbl 0962.62107 |

[29] | Wang, Y. and Rannala, B. (2008). Bayesian inference of fine-scale recombination rates using population genomic data. Philos. Trans. R. Soc. B 363 3921-3930. |

[30] | Watterson, G. A. (1977). Heterosis or neutrality? Genetics 85 789-814. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.