Hierarchical Bayesian analysis of somatic mutation data in cancer. (English) Zbl 1288.62157

Summary: Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim at discovering genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data and to shed light on the overall proportion of drivers among sequenced genes. Our methodology applies to different experimental designs used in practice, including one-stage, two-stage and candidate gene designs. Also, sample sizes are typically small relative to the rarity of individual mutations. Via a shrinkage method borrowing strength from the whole genome in assessing individual genes, we reinforce inference and address the selection effects induced by multistage designs.
Our simulation studies show that the posterior driver probabilities provide a nearly unbiased false discovery rate estimate. We apply our methods to pancreatic and breast cancer data, contrast our results with previous estimates and provide estimated proportions of drivers for these two types of cancer.
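
The summary does not give the form of the model, so the following is only a rough sketch of the two ideas it names: a posterior driver probability per gene, and a false discovery rate estimate derived from those probabilities. It assumes a simple two-group Poisson mixture (passenger genes mutate at one rate, driver genes at a higher rate), which is a stand-in chosen for illustration and not the authors' hierarchical model. All function names, parameters and numerical values are hypothetical.

```python
import math


def poisson_log_pmf(count, mean):
    """Log of the Poisson probability of observing `count` events with the given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def driver_posterior(count, exposure, passenger_rate, driver_rate, prior_driver):
    """Posterior probability that a gene is a driver under a two-group mixture.

    Assumption (illustrative only): mutation counts are Poisson with mean
    rate * exposure, where exposure stands for something like
    gene length times number of samples.
    """
    log_driver = math.log(prior_driver) + poisson_log_pmf(count, driver_rate * exposure)
    log_passenger = math.log(1.0 - prior_driver) + poisson_log_pmf(
        count, passenger_rate * exposure
    )
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = max(log_driver, log_passenger)
    w_driver = math.exp(log_driver - shift)
    w_passenger = math.exp(log_passenger - shift)
    return w_driver / (w_driver + w_passenger)


def estimated_fdr(posteriors, threshold):
    """Estimated FDR of the list of genes whose posterior is at least `threshold`.

    The estimate is the average posterior probability of being a passenger
    (1 - posterior) among the selected genes.
    """
    selected = [p for p in posteriors if p >= threshold]
    if not selected:
        return 0.0
    return sum(1.0 - p for p in selected) / len(selected)


# Hypothetical data: (mutation count, exposure) for four genes.
genes = [(0, 20.0), (1, 20.0), (4, 20.0), (9, 20.0)]
posteriors = [
    driver_posterior(c, e, passenger_rate=0.05, driver_rate=0.4, prior_driver=0.02)
    for c, e in genes
]
fdr = estimated_fdr(posteriors, threshold=0.5)
```

In this sketch the rates and the prior proportion of drivers are fixed by hand. A hierarchical treatment of the kind the summary describes would instead estimate such quantities from all genes jointly, which is where the borrowing of strength across the genome comes from.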

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
92C50 Medical applications (general)
65C60 Computational problems in statistics (MSC2010)
