Reconstructing transmission trees for communicable diseases using densely sampled genetic data. (English) Zbl 1358.62110

Summary: Whole genome sequencing of pathogens from multiple hosts in an epidemic offers the potential to investigate who infected whom with unparalleled resolution, potentially yielding important insights into disease dynamics and the impact of control measures. We considered disease outbreaks in a setting with dense genomic sampling, and formulated stochastic epidemic models to investigate person-to-person transmission, based on observed genomic and epidemiological data. We constructed models in which the genetic distance between sampled genotypes depends on the epidemiological relationship between the hosts. A data-augmented Markov chain Monte Carlo algorithm was used to sample over the transmission trees, providing a posterior probability for any given transmission route. We investigated the predictive performance of our methodology using simulated data, demonstrating high sensitivity and specificity, particularly for rapidly mutating pathogens with low transmissibility. We then analyzed data collected during an outbreak of methicillin-resistant Staphylococcus aureus in a hospital, identifying probable transmission routes and estimating epidemiological parameters. Our approach overcomes limitations of previous methods, providing a framework with the flexibility to allow for unobserved infection times, multiple independent introductions of the pathogen and within-host genetic diversity, as well as allowing forward simulation.
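The idea of sampling over transmission trees and reading off a posterior probability for each transmission route can be illustrated with a toy sketch. The code below is not the authors' model. It assumes a single introduction (host 0), a known order of infection so that host i can only have been infected by a host with a smaller index, a uniform prior over candidate infectors, and a Poisson-distributed genetic distance between infector and infectee with a hypothetical mean `mu`. The names `poisson_logpmf` and `sample_trees` are illustrative only.

```python
import math
import random
from collections import Counter


def poisson_logpmf(k, mu):
    """Log of the Poisson probability mass function at integer k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def sample_trees(dist, mu=1.0, n_iter=5000, seed=0):
    """Toy Metropolis-Hastings sampler over transmission trees.

    dist[i][j] is the observed genetic distance between hosts i and j.
    Hosts are assumed to be indexed in order of infection (an assumption
    of this sketch), so any assignment src[i] < i is a valid tree.
    Returns the fraction of iterations in which each route
    (infector, infectee) appeared, as a posterior probability estimate.
    """
    rng = random.Random(seed)
    n = len(dist)
    # Initial tree: every host other than the index case infected by host 0.
    src = [None] + [0] * (n - 1)
    counts = Counter()
    for _ in range(n_iter):
        # Propose a new infector for one randomly chosen host
        # (uniform over earlier hosts, hence a symmetric proposal).
        i = rng.randrange(1, n)
        new = rng.randrange(0, i)
        log_ratio = (poisson_logpmf(dist[i][new], mu)
                     - poisson_logpmf(dist[i][src[i]], mu))
        # Accept with probability min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            src[i] = new
        # Record every route present in the current tree.
        for j in range(1, n):
            counts[(src[j], j)] += 1
    return {edge: c / n_iter for edge, c in counts.items()}


# Three hosts: host 2 is genetically identical to host 1 but far from host 0.
dist = [[0, 1, 9],
        [1, 0, 0],
        [9, 0, 0]]
posterior = sample_trees(dist)
for (infector, infectee), p in sorted(posterior.items()):
    print(f"{infector} -> {infectee}: {p:.3f}")
```

In this toy setting the route 1 -> 2 receives most of the posterior mass because the two genotypes are identical. The approach summarised above additionally handles unobserved infection times, multiple introductions and within-host diversity, none of which this sketch attempts.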


62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
92C60 Medical epidemiology



