Multi-locus data distinguishes between population growth and multiple merger coalescents. (English) Zbl 1398.92207

Summary: We introduce a low dimensional function of the site frequency spectrum that is tailor-made for distinguishing coalescent models with multiple mergers from Kingman coalescent models with population growth, and use this function to construct a hypothesis test between these model classes. The null and alternative sampling distributions of the statistic are intractable, but its low dimensionality renders them amenable to Monte Carlo estimation. We construct kernel density estimates of the sampling distributions based on simulated data, and show that the resulting hypothesis test dramatically improves on the statistical power of a current state-of-the-art method. A key reason for this improvement is the use of multi-locus data, in particular averaging observed site frequency spectra across unlinked loci to reduce sampling variance. We also demonstrate the robustness of our method to nuisance and tuning parameters. Finally, we show that the same kernel density estimates can be used to conduct parameter estimation, and argue that our method is readily generalisable for applications in model selection, parameter inference and experimental design.
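The workflow described in the summary (average site frequency spectra over unlinked loci, simulate a low-dimensional statistic under each model class, estimate both sampling distributions by kernel density estimation, then test with the estimated likelihood ratio) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the statistic is simply the averaged singleton fraction, `simulate_statistic` is a hypothetical toy stand-in for coalescent simulation, the singleton probabilities 0.35 and 0.50 are arbitrary, and the bandwidth is a normal-reference rule of thumb rather than the authors' choice.

```python
import math
import random


def average_sfs(spectra):
    """Average normalised site frequency spectra over unlinked loci."""
    informative = [sfs for sfs in spectra if sum(sfs) > 0]
    if not informative:
        raise ValueError("no locus has segregating sites")
    mean = [0.0] * len(informative[0])
    for sfs in informative:
        total = sum(sfs)
        for i, count in enumerate(sfs):
            mean[i] += count / total / len(informative)
    return mean


def simulate_statistic(rng, singleton_prob, n_loci=20, sites_per_locus=10):
    """Hypothetical stand-in for a coalescent simulator (assumption).

    Each segregating site is a singleton with probability singleton_prob,
    otherwise it falls into one lumped class. Returns the singleton
    fraction of the locus-averaged spectrum.
    """
    spectra = []
    for _ in range(n_loci):
        k = sum(rng.random() < singleton_prob for _ in range(sites_per_locus))
        spectra.append([k, sites_per_locus - k])
    return average_sfs(spectra)[0]


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (illustrative choice)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde(samples, bandwidth):
    """Return a one-dimensional Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def build_test(null_sims, alt_sims, alpha=0.05):
    """Return (ratio, critical): reject the null when ratio(x) > critical."""
    f0 = gaussian_kde(null_sims, rule_of_thumb_bandwidth(null_sims))
    f1 = gaussian_kde(alt_sims, rule_of_thumb_bandwidth(alt_sims))

    def ratio(x):
        # Guard against a vanishing null density far in the tail.
        return f1(x) / max(f0(x), 1e-300)

    # Empirical (1 - alpha) quantile of the ratio under the null simulations.
    null_ratios = sorted(ratio(x) for x in null_sims)
    index = min(len(null_ratios) - 1, int((1.0 - alpha) * len(null_ratios)))
    return ratio, null_ratios[index]


rng = random.Random(1)
# Arbitrary illustrative parameters: 0.35 under the null, 0.50 under the alternative.
null_sims = [simulate_statistic(rng, 0.35) for _ in range(300)]
alt_sims = [simulate_statistic(rng, 0.50) for _ in range(300)]
ratio, critical = build_test(null_sims, alt_sims)

# Estimate power on fresh draws from the alternative.
fresh = [simulate_statistic(rng, 0.50) for _ in range(200)]
power = sum(ratio(x) > critical for x in fresh) / len(fresh)
```

In this toy setting, raising `n_loci` narrows both simulated distributions, which is one way to see why averaging across loci might increase power; the sketch does not reproduce any numerical result from the paper.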

MSC:

92D25 Population dynamics (general)
92D10 Genetics and epigenetics
92D15 Problems related to evolution
62P10 Applications of statistics to biology and medical sciences; meta analysis
62M02 Markov processes: hypothesis testing
62F03 Parametric hypothesis testing
62G07 Density estimation

References:

[1] Achaz, G. (2008): “Testing for neutrality in samples with sequencing errors,” Genetics, 179, 1409-1424.
[2] Árnason, E. (2004): “Mitochondrial cytochrome b variation in the high-fecundity Atlantic cod: trans-Atlantic clines and shallow gene genealogy,” Genetics, 166, 1871-1885.
[3] Beaumont, M. A. (2010): “Approximate Bayesian computation in evolution and ecology,” Annu. Rev. Ecol. Evol. Syst., 41, 379-406.
[4] Beckenbach, A. T. (1994): “Mitochondrial haplotype frequencies in oysters: neutral alternatives to selection models,” In: Golding, B. (Ed.), Non-neutral evolution. New York: Chapman & Hall, pp. 188-198.
[5] Birkner, M. and J. Blath (2008): “Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model,” J. Math. Biol., 57, 435-465. · Zbl 1274.92039
[6] Birkner, M., J. Blath, M. Möhle, M. Steinrücken, and J. Tams (2009): “A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks,” ALEA Lat. Am. J. Probab. Math. Stat., 6, 25-61. · Zbl 1162.60342
[7] Birkner, M., J. Blath, and M. Steinrücken (2011): “Importance sampling for Lambda-coalescents in the infinitely many sites model,” Theor. Popul. Biol., 79, 155-173. · Zbl 1338.92073
[8] Birkner, M., J. Blath, and B. Eldon (2013a): “An ancestral recombination graph for diploid populations with skewed offspring distribution,” Genetics, 193, 255-290.
[9] Birkner, M., J. Blath, and B. Eldon (2013b): “Statistical properties of the site-frequency spectrum associated with Lambda-coalescents,” Genetics, 195, 1037-1053.
[10] Birkner, M., H. Liu, and A. Sturm (2017): “A note on coalescent results for diploid exchangeable population models,” Preprint, arXiv:1709.02563v2.
[11] Blath, J., M. C. Cronjäger, B. Eldon, and M. Hammer (2016): “The site-frequency spectrum associated with Ξ-coalescents,” Theor. Popul. Biol., 110, 36-50. · Zbl 1365.92072
[12] Depaulis, F. and M. Veuille (1998): “Neutrality tests based on the distribution of haplotypes under an infinite-site model,” Mol. Biol. Evol., 15, 1788.
[13] Diggle, P. J. and R. J. Gratton (1984): “Monte Carlo methods of inference for implicit statistical models,” J. R. Stat. Soc. B, 46, 193-227. · Zbl 0561.62035
[14] Donnelly, P. and T. G. Kurtz (1999): “Particle representations for measure-valued population models,” Ann. Probab., 27, 166-205. · Zbl 0956.60081
[15] Donnelly, P. and S. Tavaré (1995): “Coalescents and genealogical structure under neutrality,” Annu. Rev. Genet., 29, 401-421.
[16] Duong, T. and M. L. Hazelton (2003): “Plug-in bandwidth matrices for bivariate kernel density estimation,” J. Nonparametr. Stat., 15, 17-30. · Zbl 1019.62032
[17] Durrett, R. and J. Schweinsberg (2005): “A coalescent model for the effect of advantageous mutations on the genealogy of a population,” Stoch. Proc. Appl., 115, 1628-1657. · Zbl 1082.92031
[18] Eldon, B. (2011): “Estimation of parameters in large offspring number models and ratios of coalescence times,” Theor. Popul. Biol., 80, 16-28.
[19] Eldon, B. and J. Wakeley (2006): “Coalescent processes when the distribution of offspring number among individuals is highly skewed,” Genetics, 172, 2621-2633.
[20] Eldon, B. and J. Wakeley (2009): “Coalescence times and F_{ST} under a skewed offspring distribution among individuals in a population,” Genetics, 181, 615-629.
[21] Eldon, B., M. Birkner, J. Blath, and F. Freund (2015): “Can the site frequency spectrum distinguish exponential population growth from multiple-merger coalescents?” Genetics, 199, 841-856.
[22] Fay, J. C. and C.-I. Wu (2000): “Hitchhiking under positive Darwinian selection,” Genetics, 155, 1405-1413.
[23] Fu, Y. X. (1995): “Statistical properties of segregating sites,” Theor. Popul. Biol., 48, 172-197. · Zbl 0854.92014
[24] Fu, Y. X. and W. H. Li (1993): “Statistical tests of neutrality of mutations,” Genetics, 133, 693-709.
[25] Hedgecock, D. and A. I. Pudovkin (2011): “Sweepstakes reproductive success in highly fecund marine fish and shellfish: a review and commentary,” Bull. Mar. Sci., 87, 971-1002.
[26] Hein, J., M. H. Schierup, and C. Wiuf (2005): Gene genealogies, variation and evolution. Oxford, UK: Oxford University Press. · Zbl 1113.92048
[27] Hudson, R. R. (1983a): “Properties of a neutral allele model with intragenic recombination,” Theor. Popul. Biol., 23, 183-201. · Zbl 0505.62090
[28] Hudson, R. R. (1983b): “Testing the constant-rate neutral allele model with protein sequence data,” Evolution, 37, 203-217.
[29] Hudson, R. R. (1990): “Gene genealogies and the coalescent process,” In: Futuyma, D. J., Antonovics, J. (Eds.), Oxford surveys in evolutionary biology, Vol. 7. Oxford: Oxford University Press, pp. 1-44.
[30] Kingman, J. F. C. (1982a): “The coalescent,” Stoch. Proc. Appl., 13, 235-248. · Zbl 0491.60076
[31] Kingman, J. F. C. (1982b): “Exchangeability and the evolution of large populations,” In: Koch, G., Spizzichino, F., (Eds.), Exchangeability in probability and statistics. Amsterdam: North-Holland, pp. 97-112.
[32] Kingman, J. F. C. (1982c): “On the genealogy of large populations,” J. Appl. Probab., 19A, 27-43. · Zbl 0516.92011
[33] Koskela, J., P. Jenkins, and D. Spanò (2015): “Computational inference beyond Kingman’s coalescent,” J. Appl. Probab., 52, 519-537. · Zbl 1347.60120
[34] Koskela, J., P. Jenkins, and D. Spanò (2018): “Bayesian non-parametric inference for Λ-coalescents: posterior consistency and a parametric method,” Bernoulli, 24, 2122-2153. · Zbl 1419.62063
[35] Möhle, M. (1998): “Robustness results for the coalescent,” J. Appl. Probab., 35, 438-447. · Zbl 0913.60022
[36] Nordborg, M. (2001): “Coalescent theory,” In: Balding, D. J., Bishop, M. J., Cannings, C. (Eds.), Handbook of statistical genetics, chapter 25, 2nd edn. Chichester, UK: John Wiley & Sons, pp. 179-212.
[37] Pitman, J. (1999): “Coalescents with multiple collisions,” Ann. Probab., 27, 1870-1902. · Zbl 0963.60079
[38] Ramos-Onsins, S. E. and J. Rozas (2002): “Statistical properties of new neutrality tests against population growth,” Mol. Biol. Evol., 19, 2092-2100.
[39] Sagitov, S. (1999): “The general coalescent with asynchronous mergers of ancestral lines,” J. Appl. Probab., 36, 1116-1125. · Zbl 0962.92026
[40] Sargsyan, O. and J. Wakeley (2008): “A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms,” Theor. Popul. Biol., 74, 104-114. · Zbl 1210.92028
[41] Schweinsberg, J. (2003): “Coalescent processes obtained from supercritical Galton-Watson processes,” Stoch. Proc. Appl., 106, 107-139. · Zbl 1075.60571
[42] Scott, D. W. (1992): Multivariate density estimation: theory, practice and visualization. New York: John Wiley & Sons. · Zbl 0850.62006
[43] Steinrücken, M., M. Birkner, and J. Blath (2013): “Analysis of DNA sequence variation within marine species using beta-coalescents,” Theor. Popul. Biol., 87, 15-24. · Zbl 1296.92191
[44] Tajima, F. (1983): “Evolutionary relationship of DNA sequences in finite populations,” Genetics, 105, 437-460.
[45] Tajima, F. (1989): “The effect of change in population size on DNA polymorphism,” Genetics, 123, 597-601.
[46] Tellier, A. and C. Lemaire (2014): “Coalescence 2.0: a multiple branching of recent theoretical developments and their applications,” Mol. Ecol., 23, 2637-2652.
[47] Tørresen, O. K., B. Star, S. Jentoft, W. B. Reinar, H. Grove, J. R. Miller, B. P. Walenz, J. Knight, J. M. Ekholm, P. Peluso, R. B. Edvardsen, A. Tooming-Klunderud, M. Skage, S. Lien, K. S. Jakobsen, and A. J. Nederbragt (2017): “An improved genome assembly uncovers prolific tandem repeats in Atlantic cod,” BMC Genomics, 18, 95.
[48] Wakeley, J. (2007): Coalescent theory. Greenwood Village: Roberts & Co.
[49] Watterson, G. A. (1975): “On the number of segregating sites in genetical models without recombination,” Theor. Popul. Biol., 7, 256-276. · Zbl 0294.92011
[50] Zhu, S., J. H. Degnan, S. J. Goldstein, and B. Eldon (2015): “Hybrid-Lambda: simulation of multiple merger and Kingman gene genealogies in species networks and species trees,” BMC Bioinformatics, 16.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.