Fitting stochastic epidemic models to gene genealogies using linear noise approximation. (English) Zbl 07656964

Summary: Phylodynamics is a set of population genetics tools that aim to reconstruct the demographic history of a population from molecular sequences of individuals sampled from that population. One important task in phylodynamics is to estimate changes in (effective) population size. When applied to infectious disease sequences, such estimation of population size trajectories can provide information about changes in the number of infections. To model changes in the number of infected individuals, current phylodynamic methods use nonparametric approaches (e.g., Bayesian curve-fitting based on change-point models or Gaussian process priors), parametric approaches (e.g., based on differential equations), and stochastic modeling in conjunction with likelihood-free Bayesian methods. The first class of methods yields results that are hard to interpret epidemiologically. The second class of methods provides estimates of important epidemiological parameters, such as infection and removal/recovery rates, but ignores variation in the dynamics of infectious disease spread. The third class of methods is the most advantageous statistically but relies on computationally intensive particle filtering techniques that limit its applicability. We propose a Bayesian model that combines phylodynamic inference and stochastic epidemic models and achieves computational tractability by using a linear noise approximation (LNA), a technique that allows us to approximate probability densities of stochastic epidemic model trajectories. LNA opens the door for using modern Markov chain Monte Carlo tools to approximate the joint posterior distribution of the disease transmission parameters and of high-dimensional vectors describing unobserved changes in the stochastic epidemic model compartment sizes (e.g., numbers of infectious and susceptible individuals). In a simulation study we show that our method can successfully recover parameters of stochastic epidemic models. We apply our estimation technique to Ebola genealogies estimated using viral genetic data from the 2014 epidemic in Sierra Leone and Liberia.
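
To illustrate what a linear noise approximation computes, the following is a minimal sketch, not the authors' implementation. It assumes a closed SIR model with mass-action infection hazard beta*S*I and removal hazard gamma*I, and approximates the distribution of (S, I) at time t by a Gaussian whose mean follows the deterministic ODE and whose covariance follows a linear matrix ODE. The function names (`lna_rhs`, `lna_moments`), the parameterization, and the fixed-step Runge-Kutta integrator are all assumptions made for this example.

```python
def lna_rhs(state, beta, gamma):
    """Right-hand side of the LNA moment ODEs for an assumed SIR model.

    state = (S, I, v_ss, v_si, v_ii): means of S and I followed by the
    distinct entries of their symmetric 2x2 covariance matrix V.
    """
    s, i, v_ss, v_si, v_ii = state
    h_inf = beta * s * i   # infection hazard: S -> S - 1, I -> I + 1
    h_rem = gamma * i      # removal hazard:   I -> I - 1
    # Jacobian F of the drift (-h_inf, h_inf - h_rem) with respect to (S, I).
    f_ss, f_si = -beta * i, -beta * s
    f_is, f_ii = beta * i, beta * s - gamma
    # Covariance ODE dV/dt = F V + V F^T + D, where D is the diffusion
    # matrix [[h_inf, -h_inf], [-h_inf, h_inf + h_rem]].
    d_v_ss = 2.0 * (f_ss * v_ss + f_si * v_si) + h_inf
    d_v_si = f_ss * v_si + f_si * v_ii + f_is * v_ss + f_ii * v_si - h_inf
    d_v_ii = 2.0 * (f_is * v_si + f_ii * v_ii) + h_inf + h_rem
    return (-h_inf, h_inf - h_rem, d_v_ss, d_v_si, d_v_ii)


def lna_moments(s0, i0, beta, gamma, t_end, n_steps=1000):
    """Integrate the LNA mean and covariance from a known initial state.

    Uses classical fourth-order Runge-Kutta with a fixed step size and
    starts from zero covariance (the initial state is treated as exact).
    """
    state = (float(s0), float(i0), 0.0, 0.0, 0.0)
    dt = t_end / n_steps

    def shifted(base, slope, scale):
        # Return base + scale * slope, componentwise.
        return tuple(b + scale * k for b, k in zip(base, slope))

    for _ in range(n_steps):
        k1 = lna_rhs(state, beta, gamma)
        k2 = lna_rhs(shifted(state, k1, 0.5 * dt), beta, gamma)
        k3 = lna_rhs(shifted(state, k2, 0.5 * dt), beta, gamma)
        k4 = lna_rhs(shifted(state, k3, dt), beta, gamma)
        state = tuple(
            x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    s, i, v_ss, v_si, v_ii = lna_moments(990, 10, beta=0.0005, gamma=0.25, t_end=5.0)
    print("mean (S, I):", s, i)
    print("covariance entries (SS, SI, II):", v_ss, v_si, v_ii)
```

Because the approximating density is Gaussian, evaluating it for a candidate trajectory is cheap. The summary credits this property with making standard Markov chain Monte Carlo tools usable in place of particle filtering.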

MSC:

62Pxx Applications of statistics

Software:

BEAST; GMRFLib
