
Adaptive preferential sampling in phylodynamics with an application to SARS-CoV-2. (English) Zbl 07547631

Summary: Longitudinal molecular data of rapidly evolving viruses and pathogens provide information about disease spread and complement traditional surveillance approaches based on case count data. The coalescent is used to model the genealogy that represents the sample ancestral relationships. The basic assumption is that coalescent events occur at a rate inversely proportional to the effective population size \(N_e(t)\), a time-varying measure of genetic diversity. When the sampling process (collection of samples over time) depends on \(N_e(t)\), the coalescent and the sampling processes can be jointly modeled to improve estimation of \(N_e(t)\). Failing to do so can lead to bias due to model misspecification. However, the way that the sampling process depends on the effective population size may vary over time. We introduce an approach where the sampling process is modeled as an inhomogeneous Poisson process with rate equal to the product of \(N_e(t)\) and a time-varying coefficient, making minimal assumptions on their functional shapes via Markov random field priors. We provide efficient algorithms for inference, show the model performance vis-a-vis alternative methods in a simulation study, and apply our model to SARS-CoV-2 sequences from Los Angeles and Santa Clara counties. The methodology is implemented and available in the R package adapref. Supplementary files for this article are available online.
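The sampling model in the summary can be illustrated with a small simulation. The sketch below draws sampling times from an inhomogeneous Poisson process whose rate is the product of \(N_e(t)\) and a time-varying coefficient, here called beta(t). It uses thinning, a standard simulation technique for such processes; it is not the inference algorithm of the paper and does not use the adapref package. The function names, the two trajectories, and the bound rate_bound are all assumptions chosen for illustration.

```python
import math
import random


def simulate_sampling_times(n_e, beta, t_end, rate_bound, seed=0):
    """Draw event times on [0, t_end] from an inhomogeneous Poisson process
    with intensity beta(t) * n_e(t), using thinning.

    rate_bound must be an upper bound on beta(t) * n_e(t) over [0, t_end].
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        # Propose the next event from a homogeneous process with rate rate_bound.
        t += rng.expovariate(rate_bound)
        if t > t_end:
            break
        intensity = beta(t) * n_e(t)
        if intensity > rate_bound:
            raise ValueError("rate_bound is not an upper bound on the intensity")
        # Keep the proposal with probability intensity / rate_bound.
        if rng.random() < intensity / rate_bound:
            times.append(t)
    return times


# Hypothetical trajectories, chosen only for illustration.
def n_e(t):
    # Exponentially declining effective population size.
    return 50.0 * math.exp(-0.5 * t)


def beta(t):
    # Sampling coefficient that drops at t = 2.
    return 1.0 if t < 2.0 else 0.2


times = simulate_sampling_times(n_e, beta, t_end=4.0, rate_bound=50.0)
```

In this toy setup the sampling times cluster where both \(N_e(t)\) and beta(t) are large. A model that assumed a constant coefficient would attribute the drop in sampling after t = 2 entirely to a change in \(N_e(t)\), which is the kind of misspecification the summary describes.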

MSC:

62-XX Statistics
