Zhou, Tianjian; Sengupta, Subhajit; Müller, Peter; Ji, Yuan
TreeClone: reconstruction of tumor subclone phylogeny based on mutation pairs using next generation sequencing data. (English) Zbl 1423.62155
Ann. Appl. Stat. 13, No. 2, 874-899 (2019).

Summary: We present TreeClone, a latent feature allocation model to reconstruct tumor subclones subject to phylogenetic evolution that mimics tumor evolution. Similar to most current methods, we consider data from next-generation sequencing of tumor DNA. Unlike most methods that use information in short reads mapped to single nucleotide variants (SNVs), we consider subclone phylogeny reconstruction using pairs of two proximal SNVs that can be mapped by the same short reads. As part of the Bayesian inference model, we construct a phylogenetic tree prior. The use of the tree structure in the prior greatly strengthens inference. Only subclones that can be explained by a phylogenetic tree are assigned non-negligible probabilities. The proposed Bayesian framework implies posterior distributions on the number of subclones, their genotypes, cellular proportions and the phylogenetic tree spanned by the inferred subclones. The proposed method is validated against different sets of simulated and real-world data using single and multiple tumor samples. An open source software package is available at http://www.compgenome.org/treeclone.

Cited in 3 Documents

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D20 Protein sequences, DNA sequences
62F15 Bayesian inference
62H12 Estimation in multivariate analysis

Keywords: latent feature model; mutation pair; NGS data; phylogenetic tree; subclone; tumor heterogeneity; Bayesian inference
Software: PhyloWGS; GATK; BWA; SciClone; PyClone; TreeClone; PairClone

References:
[1] Adams, R. P., Ghahramani, Z. and Jordan, M. I. (2010). Tree-structured stick breaking for hierarchical data. In Advances in Neural Information Processing Systems 19-27.
[2] Aparicio, S. and Caldas, C. (2013). The implications of clonal genome evolution for cancer medicine. N. Engl. J. Med. 368 842-851.
[3] Bafna, V., Gusfield, D., Lancia, G. and Yooseph, S. (2003). Haplotyping as perfect phylogeny: A direct approach. J. Comput. Biol. 10 323-340.
[4] Bonavia, R., Cavenee, W. K., Furnari, F. B. et al. (2011). Heterogeneity maintenance in glioblastoma: A social network. Cancer Res. 71 4055-4060.
[5] Brocks, D., Assenov, Y., Minner, S., Bogatyrova, O., Simon, R., Koop, C., Oakes, C., Zucknick, M., Lipka, D. B., Weischenfeldt, J. et al. (2014). Intratumor DNA methylation heterogeneity reflects clonal evolution in aggressive prostate cancer. Cell Rep. 8 798-806.
[6] Broderick, T., Kulis, B. and Jordan, M. (2013). MAD-Bayes: MAP-based asymptotic derivations from Bayes. In Proceedings of the 30th International Conference on Machine Learning 226-234.
[7] Carter, S. L., Cibulskis, K., Helman, E., McKenna, A., Shen, H., Zack, T., Laird, P. W., Onofrio, R. C., Winckler, W. (2012). Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30 413-421.
[8] Chipman, H. A., George, E. I. and McCulloch, R. E. (1998). Bayesian CART model search. J. Amer. Statist. Assoc. 93 935-948.
[9] Dagum, L. and Menon, R. (1998). OpenMP: An industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5 46-55.
[10] Denison, D. G. T., Mallick, B. K. and Smith, A. F. M. (1998). A Bayesian CART algorithm. Biometrika 85 363-377. · Zbl 1048.62502
[11] Deshwar, A. G., Vembu, S., Yung, C. K., Jang, G. H., Stein, L. and Morris, Q. (2015). PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol. 16 35.
[12] Fan, X., Zhou, W., Chong, Z., Nakhleh, L. and Chen, K. (2014). Towards accurate characterization of clonal heterogeneity based on structural variation. BMC Bioinform. 15 1.
[13] Fischer, A., Vázquez-García, I., Illingworth, C. J. R. and Mustonen, V. (2014). High-definition reconstruction of clonal composition in cancer. Cell Rep. 7 1740-1752.
[14] Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statist. Sci. 457-472. · Zbl 1386.65060
[15] Geyer, C. J. (1991). Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics, Proceedings of the 23rd Symposium on the Interface 156-163. Interface Foundation of North America, Fairfax Station, VA.
[16] Giordano, R. J., Broderick, T. and Jordan, M. I. (2015). Linear response methods for accurate covariance estimates from mean field variational Bayes. In Advances in Neural Information Processing Systems 1441-1449.
[17] Jiao, W., Vembu, S., Deshwar, A. G., Stein, L. and Morris, Q. (2014). Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinform. 15 35.
[18] Johnson, V. E. (2004). A Bayesian \(\chi^2\) test for goodness-of-fit. Ann. Statist. 32 2361-2384. · Zbl 1068.62028
[19] Lee, J., Müller, P., Gulukota, K. and Ji, Y. (2015). A Bayesian feature allocation model for tumor heterogeneity. Ann. Appl. Stat. 9 621-639. · Zbl 1397.62457
[20] Li, H. and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25 1754-1760.
[21] Marass, F., Mouliere, F., Yuan, K., Rosenfeld, N. and Markowetz, F. (2016). A phylogenetic latent feature model for clonal deconvolution. Ann. Appl. Stat. 10 2377-2404. · Zbl 1454.62360
[22] Marusyk, A., Almendro, V. and Polyak, K. (2012). Intra-tumour heterogeneity: A looking glass for cancer? Nat. Rev. Cancer 12 323-334.
[23] McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. et al. (2010). The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20 1297-1303.
[24] Miller, K. T., Griffiths, T. L. and Jordan, M. I. (2008). The phylogenetic Indian buffet process: A non-exchangeable nonparametric prior for latent features. In Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence 403-410.
[25] Miller, C. A., White, B. S., Dees, N. D., Griffith, M., Welch, J. S., Griffith, O. L., Vij, R., Tomasson, M. H., Graubert, T. A., Walter, M. J. et al. (2014). SciClone: Inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput. Biol. 10 e1003665.
[26] Nik-Zainal, S., Van Loo, P., Wedge, D. C., Alexandrov, L. B., Greenman, C. D., Lau, K. W., Raine, K., Jones, D., Marshall, J., Ramakrishna, M. et al. (2012). The life history of 21 breast cancers. Cell 149 994-1007.
[27] Nowell, P. C. (1976). The clonal evolution of tumor cell populations. Science 194 23-28.
[28] O’Hagan, A. (1995). Fractional Bayes factors for model comparison. J. Roy. Statist. Soc. Ser. B 57 99-138. · Zbl 0813.62026
[29] Roth, A., Khattra, J., Yap, D., Wan, A., Laks, E., Biele, J., Ha, G., Aparicio, S., Bouchard-Côté, A. and Shah, S. P. (2014). PyClone: Statistical inference of clonal population structure in cancer. Nat. Methods 11 396-398.
[30] Schwarz, R. F., Ng, C. K., Cooke, S. L., Newman, S., Temple, J., Piskorz, A. M., Gale, D., Sayal, K., Murtaza, M., Baldwin, P. J. et al. (2015). Spatial and temporal heterogeneity in high-grade serous ovarian cancer: A phylogenetic analysis. PLoS Med. 12 e1001789.
[31] Sengupta, S., Wang, J., Lee, J., Müller, P., Gulukota, K., Banerjee, A. and Ji, Y. (2015). BayClone: Bayesian nonparametric inference of tumor subclones using NGS data. In Proceedings of the Pacific Symposium on Biocomputing (PSB) 20 467-478.
[32] Sengupta, S., Gulukota, K., Zhu, Y., Ober, C., Naughton, K., Wentworth-Sheilds, W. and Ji, Y. (2016). Ultra-fast local-haplotype variant calling using paired-end DNA-sequencing data reveals somatic mosaicism in tumor and normal blood samples. Nucleic Acids Res. 44 e25.
[33] Van Loo, P., Nordgard, S. H., Lingjærde, O. C., Russnes, H. G., Rye, I. H., Sun, W., Weigman, V. J., Marynen, P., Zetterberg, A., Naume, B. et al. (2010). Allele-specific copy number analysis of tumors. Proc. Natl. Acad. Sci. USA 107 16910-16915.
[34] Xu, Y., Müller, P., Yuan, Y., Gulukota, K. and Ji, Y. (2015). MAD Bayes for tumor heterogeneity—feature allocation with exponential family sampling. J. Amer. Statist. Assoc. 110 503-514. · Zbl 1373.62556
[35] Zare, H., Wang, J., Hu, A., Weber, K., Smith, J., Nickerson, D., Song, C., Witten, D., Blau, C. A. and Noble, W. S. (2014). Inferring clonal composition from multiple sections of a breast cancer. PLoS Comput. Biol. 10 e1003703.
[36] Zhou, T., Müller, P., Sengupta, S. and Ji, Y. (2019). PairClone: A Bayesian subclone caller based on mutation pairs. J. R. Stat. Soc. Ser. C. Appl. Stat. 68 705-725. · Zbl 07948003
[37] Zhou, T., Sengupta, S., Müller, P. and Ji, Y. (2019). Supplement to “TreeClone: Reconstruction of Tumor Subclone Phylogeny Based on Mutation Pairs using Next Generation Sequencing Data.” DOI:10.1214/18-AOAS1224SUPP. · Zbl 1423.62155

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.