
AABC: approximate approximate Bayesian computation for inference in population-genetic models. (English) Zbl 1331.92011

Summary: Approximate Bayesian computation (ABC) methods perform inference on model-specific parameters of mechanistically motivated parametric models when evaluating likelihoods is difficult. Central to the success of ABC methods, which have been used frequently in biology, is computationally inexpensive simulation of data sets from the parametric model of interest. However, when simulating data sets from a model is so computationally expensive that the posterior distribution of parameters cannot be adequately sampled by ABC, inference is not straightforward. We present “approximate approximate Bayesian computation” (AABC), a class of computationally fast inference methods that extends ABC to models in which simulating data is expensive. In AABC, we first simulate a number of data sets small enough to be computationally feasible to simulate from the parametric model. Conditional on these data sets, we use a statistical model that approximates the correct parametric model and enables efficient simulation of a large number of data sets. We show that under mild assumptions, the posterior distribution obtained by AABC converges to the posterior distribution obtained by ABC, as the number of data sets simulated from the parametric model and the sample size of the observed data set increase. We demonstrate the performance of AABC on a population-genetic model of natural selection, as well as on a model of the admixture history of hybrid populations. This latter example illustrates how, in population genetics, AABC is of particular utility in scenarios that rely on conceptually straightforward but potentially slow forward-in-time simulations.
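The two-stage idea in the summary can be illustrated with a minimal sketch. This is not the authors' AABC algorithm: the toy Normal model, the uniform prior, the sample-mean summary statistic, the nearest-neighbour resampling rule, and every name used here (`simulate`, `summary`, `abc_rejection`, `surrogate_abc`) are assumptions made only for illustration. The first function is plain rejection ABC, which calls the simulator once per prior draw. The second spends a small budget on simulator calls and then generates many cheap pseudo data sets conditional on those stored simulations.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical stand-in for an expensive simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic used to compare data sets (here, the sample mean)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws, eps, rng):
    """Rejection ABC: keep prior draws whose simulated summary is within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)             # assumed Uniform(-5, 5) prior
        sim = simulate(theta, len(observed), rng)  # one simulator call per draw
        if abs(summary(sim) - s_obs) <= eps:
            accepted.append(theta)
    return accepted


def surrogate_abc(observed, n_expensive, n_draws, eps, k, rng):
    """Two-stage sketch: a few real simulations, then cheap resampling from nearby ones."""
    n = len(observed)
    s_obs = summary(observed)
    # Stage 1: spend the limited simulation budget once and store the results.
    bank = []
    for _ in range(n_expensive):
        theta = rng.uniform(-5.0, 5.0)
        bank.append((theta, simulate(theta, n, rng)))
    # Stage 2: for each new prior draw, build a pseudo data set by resampling
    # observations from the k stored simulations with the closest parameters.
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)
        nearest = sorted(bank, key=lambda pair: abs(pair[0] - theta))[:k]
        pool = [x for _, data in nearest for x in data]
        pseudo = rng.choices(pool, k=n)            # no simulator call here
        if abs(summary(pseudo) - s_obs) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)   # pretend this is the observed data set
exact = abc_rejection(observed, 3000, 0.2, rng)
approx = surrogate_abc(observed, 200, 3000, 0.2, 5, rng)
print(len(exact), len(approx))
```

In this sketch the surrogate version calls `simulate` 200 times instead of 3000, at the cost of an extra layer of approximation from reusing neighbouring simulations.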

MSC:

92B15 General biostatistics
92-04 Software, source code, etc. for problems pertaining to biology
92D10 Genetics and epigenetics
62F15 Bayesian inference

Software:

Frappe; AABC; STRUCTURE
