
Inference in population genetics using forward and backward, discrete and continuous time processes. (English) Zbl 1397.92449

Summary: A central aim of population genetics is the inference of the evolutionary history of a population. To this end, the underlying process can be represented by a model of the evolution of allele frequencies parametrized by, e.g., the population size, mutation rates, and selection coefficients. A large class of methods uses forward-in-time models, such as the discrete Wright-Fisher and Moran models and the continuous forward diffusion, to obtain distributions of population allele frequencies, conditional on an ancestral initial allele frequency distribution. Backward-in-time diffusion processes have rarely been used in the context of parameter inference. Here, we demonstrate how forward and backward diffusion processes can be combined to efficiently calculate the exact joint probability distribution of sample and population allele frequencies at all times in the past, for both discrete and continuous population genetics models. This procedure is analogous to the forward-backward algorithm of hidden Markov models. While the efficiency of discrete models is limited by the population size, for continuous models it suffices to expand the transition density in orthogonal polynomials up to the order of the sample size to infer marginal likelihoods of population genetic parameters. Additionally, conditional allele trajectories and marginal likelihoods of samples from single populations, or from multiple populations that split in the past, can be obtained. The described approaches allow for efficient maximum likelihood inference of population genetic parameters in a wide variety of demographic scenarios.
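
The analogy with the forward-backward algorithm can be illustrated for the discrete case with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that do not come from the summary: a haploid, biallelic Wright-Fisher model with symmetric mutation rate `mu`, a uniform initial allele frequency distribution, and a single binomial sample of `k` focal alleles out of `n` taken at the final generation. All function and parameter names are hypothetical. The forward pass propagates the population allele frequency distribution, the backward pass propagates the sample probability into the past, and their elementwise product gives the joint probability of the sample and the population allele count at each generation.

```python
from math import comb


def binom_pmf(n, k, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def wright_fisher_matrix(N, mu):
    """Transition matrix P[i][j] of a haploid Wright-Fisher model.

    Assumption: symmetric mutation at rate mu, no selection.
    """
    P = []
    for i in range(N + 1):
        x = i / N
        p = x * (1.0 - mu) + (1.0 - x) * mu  # allele frequency after mutation
        P.append([binom_pmf(N, j, p) for j in range(N + 1)])
    return P


def forward_backward(N, mu, T, n, k, init=None):
    """Return forward, backward, and joint tables for generations 0..T."""
    P = wright_fisher_matrix(N, mu)
    states = range(N + 1)
    if init is None:
        init = [1.0 / (N + 1)] * (N + 1)  # assumed uniform ancestral distribution

    # Forward pass: fwd[t][j] = Pr(population count j at generation t).
    fwd = [list(init)]
    for _ in range(T):
        prev = fwd[-1]
        fwd.append([sum(prev[i] * P[i][j] for i in states) for j in states])

    # Backward pass: bwd[t][i] = Pr(sample | population count i at generation t).
    bwd = [[binom_pmf(n, k, i / N) for i in states]]
    for _ in range(T):
        nxt = bwd[0]
        bwd.insert(0, [sum(P[i][j] * nxt[j] for j in states) for i in states])

    # Joint probability of the sample and the population count at each generation.
    joint = [[f * b for f, b in zip(fwd[t], bwd[t])] for t in range(T + 1)]
    return fwd, bwd, joint


fwd, bwd, joint = forward_backward(N=20, mu=0.01, T=10, n=5, k=2)
# Summing the joint distribution over population states gives the marginal
# likelihood of the sample; it should be the same at every generation.
likelihoods = [sum(row) for row in joint]
```

Dividing `joint[t]` by the marginal likelihood gives the distribution of the population allele count at generation `t` conditional on the sample. Each pass costs on the order of `T * N**2` operations in this sketch, which illustrates why the efficiency of discrete models is limited by the population size.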

MSC:

92D10 Genetics and epigenetics
92D15 Problems related to evolution
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
60J70 Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.)
