Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics. (English) Zbl 1445.92105

Summary: The Dirichlet process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm [L. Wang and D. B. Dunson, “Fast Bayesian inference in Dirichlet process mixture models”, J. Comput. Graph. Stat. 20, No. 1, 196–216 (2011; doi.org/10.1198/jcgs.2010.07081)] was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named Sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from:

MSC:

92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Akbani, R., P. K. S. Ng, H. M. J. Werner, M. Shahmoradgoli, F. Zhang, Z. Ju, W. Liu, J.-Y. Yang, K. Yoshihara, J. Li, S. Ling, E. G. Seviour, P. T. Ram, J. D. Minna, L. Diao, P. Tong, J. V. Heymach, S. M. Hill, F. Dondelinger, N. Städler, L. A. Byers, F. Meric-Bernstam, J. N. Weinstein, B. M. Broom, R. G. W. Verhaak, H. Liang, S. Mukherjee, Y. Lu and G. B. Mills (2014): “A pan-cancer proteomic perspective on The Cancer Genome Atlas.” Nat. Commun., 5, 3887.; Akbani, R.; Ng, P. K. S.; Werner, H. M. J.; Shahmoradgoli, M.; Zhang, F.; Ju, Z.; Liu, W.; Yang, J.-Y.; Yoshihara, K.; Li, J.; Ling, S.; Seviour, E. G.; Ram, P. T.; Minna, J. D.; Diao, L.; Tong, P.; Heymach, J. V.; Hill, S. M.; Dondelinger, F.; Städler, N.; Byers, L. A.; Meric-Bernstam, F.; Weinstein, J. N.; Broom, B. M.; Verhaak, R. G. W.; Liang, H.; Mukherjee, S.; Lu, Y.; Mills, G. B., A pan-cancer proteomic perspective on The Cancer Genome Atlas, Nat. Commun, 5, 3887 (2014)
[2] Antoniak, C. E. (1974): “Mixtures of dirichlet processes with applications to Bayesian nonparametric problems.” Ann. Statist., 2, 1152-1174.; Antoniak, C. E., Mixtures of dirichlet processes with applications to Bayesian nonparametric problems, Ann. Statist, 2, 1152-1174 (1974) · Zbl 0335.60034
[3] Attias, H. (1999): “Inferring parameters and structure of latent variable models by variational bayes.” In: Proc. 15th Conf. on Uncertainty in Artificial Intelligence. San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., pp. 21-30.; Attias, H., Proc. 15th Conf. on Uncertainty in Artificial Intelligence, 21-30 (1999)
[4] Attias, H. (2000): “A variational Bayesian framework for graphical models.” In: Solla, S. A., Leen, T. K., Müller, K., editors, Advances in Neural Information Processing Systems 12. Denver, USA, MIT Press, pp. 209-215.; Attias, H.; Solla, S. A.; Leen, T. K.; Müller, K., Advances in Neural Information Processing Systems 12 (2000)
[5] Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” J. Roy. Stat. Soc. B Met., 57, 289-300.; Benjamini, Y.; Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. Roy. Stat. Soc. B Met, 57, 289-300 (1995) · Zbl 0809.62014
[6] Berger, A. C., A. Korkut, R. S. Kanchi, A. M. Hegde, W. Lenoir, W. Liu, Y. Liu, H. Fan, H. Shen, V. Ravikumar, A. Rao, A. Schultz, X. Li, P. Sumazin, C. Williams, P. Mestdagh, P. H. Gunaratne, C. Yau, R. Bowlby, A. G. Robertson, D. G. Tiezzi, C. Wang, A. D. Cherniack, A. K. Godwin, N. M. Kuderer, J. S. Rader, R. E. Zuna, A. K. Sood, A. J. Lazar, A. I. Ojesina, C. Adebamowo, S. N. Adebamowo, K. A. Baggerly, T.-W. Chen, H.-S. Chiu, S. Lefever, L. Liu, K. MacKenzie, S. Orsulic, J. Roszik, C. S. Shelley, Q. Song, C. P. Vellano, N. Wentzensen, Cancer Genome Atlas Research Network, J. N. Weinstein, G. B. Mills, D. A. Levine and R. Akbani (2018): “A comprehensive pan-cancer molecular study of gynecologic and breast cancers.” Cancer Cell, 33, 690-705.e9.; Berger, A. C.; Korkut, A.; Kanchi, R. S.; Hegde, A. M.; Lenoir, W.; Liu, W.; Liu, Y.; Fan, H.; Shen, H.; Ravikumar, V.; Rao, A.; Schultz, A.; Li, X.; Sumazin, P.; Williams, C.; Mestdagh, P.; Gunaratne, P. H.; Yau, C.; Bowlby, R.; Robertson, A. G.; Tiezzi, D. G.; Wang, C.; Cherniack, A. D.; Godwin, A. K.; Kuderer, N. M.; Rader, J. S.; Zuna, R. E.; Sood, A. K.; Lazar, A. J.; Ojesina, A. I.; Adebamowo, C.; Adebamowo, S. N.; Baggerly, K. A.; Chen, T.-W.; Chiu, H.-S.; Lefever, S.; Liu, L.; MacKenzie, K.; Orsulic, S.; Roszik, J.; Shelley, C. S.; Song, Q.; Vellano, C. P.; Wentzensen, N.; Cancer Genome Atlas Research Network; Weinstein, J. N.; Mills, G. B.; Levine, D. A.; Akbani, R., A comprehensive pan-cancer molecular study of gynecologic and breast cancers, Cancer Cell, 33, 690-705.e9 (2018)
[7] Blackwell, D. and J. B. MacQueen (1973): “Ferguson distributions via polya urn schemes.” Ann. Statist., 1, 353-355.; Blackwell, D.; MacQueen, J. B., Ferguson distributions via polya urn schemes, Ann. Statist, 1, 353-355 (1973) · Zbl 0276.62010
[8] Blei, D. M. and M. I. Jordan (2006): “Variational inference for Dirichlet process mixtures.” Bayesian Anal., 1, 121-143.; Blei, D. M.; Jordan, M. I., Variational inference for Dirichlet process mixtures, Bayesian Anal, 1, 121-143 (2006) · Zbl 1331.62259
[9] Blei, D. M., A. Kucukelbir and J. D. McAuliffe (2016): “Variational inference: a review for statisticians.” J. Am. Stat. Assoc., 112, 859-877.; Blei, D. M.; Kucukelbir, A.; McAuliffe, J. D., Variational inference: a review for statisticians, J. Am. Stat. Assoc, 112, 859-877 (2016)
[10] Chen, A. H., Y.-W. Tsau and C.-H. Lin (2010): “Novel methods to identify biologically relevant genes for leukemia and prostate cancer from gene expression profiles.” BMC Genomics, 11, 274.; Chen, A. H.; Tsau, Y.-W.; Lin, C.-H., Novel methods to identify biologically relevant genes for leukemia and prostate cancer from gene expression profiles, BMC Genomics, 11, 274 (2010)
[11] Constantinopoulos, C., M. K. Titsias and A. Likas (2006): “Bayesian feature and model selection for Gaussian mixture models.” IEEE Trans. Pattern Anal. Mach. Intell., 28, 1013-1018.; Constantinopoulos, C.; Titsias, M. K.; Likas, A., Bayesian feature and model selection for Gaussian mixture models, IEEE Trans. Pattern Anal. Mach. Intell, 28, 1013-1018 (2006)
[12] Cooke, E. J., R. S. Savage, P. D. W. Kirk, R. Darkins and D. L. Wild (2011): “Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements.” BMC Bioinformatics, 12, 399.; Cooke, E. J.; Savage, R. S.; Kirk, P. D. W.; Darkins, R.; Wild, D. L., Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements, BMC Bioinformatics, 12, 399 (2011)
[13] Darkins, R., E. J. Cooke, Z. Ghahramani, P. D. W. Kirk, D. L. Wild and R. S. Savage (2013): “Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm.” PLoS One, 8, e59795.; Darkins, R.; Cooke, E. J.; Ghahramani, Z.; Kirk, P. D. W.; Wild, D. L.; Savage, R. S., Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm, PLoS One, 8, e59795 (2013)
[14] Daumé III, H. (2007): “Fast search for Dirichlet process mixture models.” In: Meila, M., Shen, X., editors, AISTATS. San Juan, Puerto Rico, pp. 83-90.; Daumé III, H.; Meila, M.; Shen, X., AISTATS. San Juan, Puerto Rico, 83-90 (2007)
[15] Dudoit, S., J. Fridlyand and T. P. Speed (2002): “Comparison of discrimination methods for the classification of tumors using gene expression data.” J. Am. Stat. Assoc., 97, 77-87.; Dudoit, S.; Fridlyand, J.; Speed, T. P., Comparison of discrimination methods for the classification of tumors using gene expression data, J. Am. Stat. Assoc, 97, 77-87 (2002) · Zbl 1073.62576
[16] Escobar, M. D. (1994): “Estimating normal means with a dirichlet process prior.” J. Am. Stat. Assoc., 89, 268-277.; Escobar, M. D., Estimating normal means with a dirichlet process prior, J. Am. Stat. Assoc, 89, 268-277 (1994) · Zbl 0791.62039
[17] Escobar, M. D. and M. West (1995): “Bayesian density estimation and inference using mixtures.” J. Am. Stat. Assoc., 90, 577-588.; Escobar, M. D.; West, M., Bayesian density estimation and inference using mixtures, J. Am. Stat. Assoc, 90, 577-588 (1995) · Zbl 0826.62021
[18] Ferguson, T. S. (1973): “A Bayesian analysis of some nonparametric problems.” Ann. Statist., 1, 209-230.; Ferguson, T. S., A Bayesian analysis of some nonparametric problems, Ann. Statist, 1, 209-230 (1973) · Zbl 0255.62037
[19] Ferguson, T. S. (1974): “Prior distributions on spaces of probability measures.” Ann. Statist., 2, 615-629.; Ferguson, T. S., Prior distributions on spaces of probability measures, Ann. Statist, 2, 615-629 (1974) · Zbl 0286.62008
[20] Fop, M. and T. B. Murphy (2018): “Variable selection methods for model-based clustering.” Stat. Surv., 12, 1-48.; Fop, M.; Murphy, T. B., Variable selection methods for model-based clustering, Stat. Surv, 12, 1-48 (2018) · Zbl 06875306
[21] Fraley, C. and A. E. Raftery (2002): “Model-based clustering, discriminant analysis and density estimation.” J. Am. Stat. Assoc., 97, 611-631.; Fraley, C.; Raftery, A. E., Model-based clustering, discriminant analysis and density estimation, J. Am. Stat. Assoc, 97, 611-631 (2002) · Zbl 1073.62545
[22] Fraley, C., A. E. Raftery, T. B. Murphy and L. Scrucca (2012). mclust Version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation.; Fraley, C.; Raftery, A. E.; Murphy, T. B.; Scrucca, L., mclust Version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation (2012)
[23] Fritsch, A. and K. Ickstadt (2009): “Improved criteria for clustering based on the posterior similarity matrix.” Bayesian Anal., 4, 367-391.; Fritsch, A.; Ickstadt, K., Improved criteria for clustering based on the posterior similarity matrix, Bayesian Anal, 4, 367-391 (2009) · Zbl 1330.62249
[24] Golub, T. R., D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield and E. S. Lander (1999): “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.” Science, 286, 531-537.; Golub, T. R.; Slonim, D. K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J. P.; Coller, H.; Loh, M. L.; Downing, J. R.; Caligiuri, M. A.; Bloomfield, C. D.; Lander, E. S., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286, 531-537 (1999)
[25] Heller, K. and Z. Ghahramani (2005): “Bayesian hierarchical clustering.” In: Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany.; Heller, K.; Ghahramani, Z., Proceedings of the 22nd International Conference on Machine Learning (2005)
[26] Hoadley, K. A., C. Yau, D. M. Wolf, A. D. Cherniack, D. Tamborero, S. Ng, M. D. Leiserson, B. Niu, M. D. McLellan, V. Uzunangelov, J. Zhang, C. Kandoth, R. Akbani, H. Shen, L. Omberg, A. Chu, A. A. Margolin, L. J. Van’t Veer, N. Lopez-Bigas, P. W. Laird, B. J. Raphael, L. Ding, A. G. Robertson, L. A. Byers, G. B. Mills, J. N. Weinstein, C. Van Waes, Z. Chen, E. A. Collisson, Cancer Genome Atlas Research Network, C. C. Benz, C. M. Perou and J. M. Stuart (2014): “Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.” Cell, 158, 929-944.; Hoadley, K. A.; Yau, C.; Wolf, D. M.; Cherniack, A. D.; Tamborero, D.; Ng, S.; Leiserson, M. D.; Niu, B.; McLellan, M. D.; Uzunangelov, V.; Zhang, J.; Kandoth, C.; Akbani, R.; Shen, H.; Omberg, L.; Chu, A.; Margolin, A. A.; Van’t Veer, L. J.; Lopez-Bigas, N.; Laird, P. W.; Raphael, B. J.; Ding, L.; Robertson, A. G.; Byers, L. A.; Mills, G. B.; Weinstein, J. N.; Van Waes, C.; Chen, Z.; Collisson, E. A.; Cancer Genome Atlas Research Network; Benz, C. C.; Perou, C. M.; Stuart, J. M., Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin, Cell, 158, 929-944 (2014)
[27] Hoadley, K. A., C. Yau, T. Hinoue, D. M. Wolf, A. J. Lazar, E. Drill, R. Shen, A. M. Taylor, A. D. Cherniack, V. Thorsson, R. Akbani, R. Bowlby, C. K. Wong, M. Wiznerowicz, F. Sanchez-Vega, A. G. Robertson, B. G. Schneider, M. S. Lawrence, H. Noushmehr, T. M. Malta, Cancer Genome Atlas Network, J. M. Stuart, C. C. Benz and P. W. Laird (2018): “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell, 173, 291-304.; Hoadley, K. A.; Yau, C.; Hinoue, T.; Wolf, D. M.; Lazar, A. J.; Drill, E.; Shen, R.; Taylor, A. M.; Cherniack, A. D.; Thorsson, V.; Akbani, R.; Bowlby, R.; Wong, C. K.; Wiznerowicz, M.; Sanchez-Vega, F.; Robertson, A. G.; Schneider, B. G.; Lawrence, M. S.; Noushmehr, H.; Malta, T. M.; Cancer Genome Atlas Network; Stuart, J. M.; Benz, C. C.; Laird, P. W., Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer, Cell, 173, 291-304 (2018)
[28] Hoeting, J. A., D. Madigan, A. E. Raftery and C. T. Volinsky (1999): “Bayesian model averaging: a tutorial.” Statist. Sci., 14, 382-417.; Hoeting, J. A.; Madigan, D.; Raftery, A. E.; Volinsky, C. T., Bayesian model averaging: a tutorial, Statist. Sci, 14, 382-417 (1999) · Zbl 1059.62525
[29] Hubert, L. and P. Arabie (1985): “Comparing partitions.” Journal of Classification, 2, 193-218.; Hubert, L.; Arabie, P., Comparing partitions, Journal of Classification, 2, 193-218 (1985) · Zbl 0587.62128
[30] Jain, S. and R. M. Neal (2004): “A split-merge markov chain monte carlo procedure for the dirichlet process mixture model.” J. Comput. Graph. Stat., 13, 158-182.; Jain, S.; Neal, R. M., A split-merge markov chain monte carlo procedure for the dirichlet process mixture model, J. Comput. Graph. Stat, 13, 158-182 (2004)
[31] Jiang, K., B. Kulis and M. I. Jordan (2012): “Small-variance asymptotics for exponential family dirichlet process mixture models.” In: Advances in Neural Information Processing Systems 25. Lake Tahoe, Nevada.; Jiang, K.; Kulis, B.; Jordan, M. I., Small-variance asymptotics for exponential family dirichlet process mixture models, Adv. Neural Inf. Process. Syst., 25 (2012)
[32] Jiang, L., Y. Dong, N. Chen and T. Chen (2016): “DACE: a scalable DP-means algorithm for clustering extremely large sequence data.” Bioinformatics, 33, 834-842.; Jiang, L.; Dong, Y.; Chen, N.; Chen, T., DACE: a scalable DP-means algorithm for clustering extremely large sequence data, Bioinformatics, 33, 834-842 (2016)
[33] Kim, S., M. G. Tadesse and M. Vannucci (2006): “Variable selection in clustering via dirichlet process mixture models.” Biometrika, 93, 877-893.; Kim, S.; Tadesse, M. G.; Vannucci, M., Variable selection in clustering via dirichlet process mixture models, Biometrika, 93, 877-893 (2006) · Zbl 1436.62266
[34] Kuett, A., C. Rieger, D. Perathoner, T. Herold, M. Wagner, S. Sironi, K. Sotlar, H.-P. Horny, C. Deniffel, H. Drolle and M. Fiegl (2015): “Il-8 as mediator in the microenvironment-leukaemia network in acute myeloid leukaemia.” Sci. Rep., 5, 18411.; Kuett, A.; Rieger, C.; Perathoner, D.; Herold, T.; Wagner, M.; Sironi, S.; Sotlar, K.; Horny, H.-P.; Deniffel, C.; Drolle, H.; Fiegl, M., Il-8 as mediator in the microenvironment-leukaemia network in acute myeloid leukaemia, Sci. Rep, 5, 18411 (2015)
[35] Kulis, B. and M. I. Jordan (2012): “Revisiting k-means: new algorithms via Bayesian nonparametrics.” In: International Conference on Machine Learning.; Kulis, B.; Jordan, M. I., International Conference on Machine Learning (2012)
[36] Law, M. H. C., M. A. T. Figueiredo and A. K. Jain (2004): “Simultaneous feature selection and clustering using mixture models.” IEEE Trans. Pattern Anal. Mach. Intell., 26, 1154-1166.; Law, M. H. C.; Figueiredo, M. A. T.; Jain, A. K., Simultaneous feature selection and clustering using mixture models, IEEE Trans. Pattern Anal. Mach. Intell, 26, 1154-1166 (2004)
[37] Li, J., Y. Lu, R. Akbani, Z. Ju, P. L. Roebuck, W. Liu, J.-Y. Yang, B. M. Broom, R. G. Verhaak, D. W. Kane, C. Wakefield, J. N. Weinstein, G. B. Mills and H. Liang (2013): “TCPA: a resource for cancer functional proteomics data.” Nat. Methods, 10, 1046-1047.; Li, J.; Lu, Y.; Akbani, R.; Ju, Z.; Roebuck, P. L.; Liu, W.; Yang, J.-Y.; Broom, B. M.; Verhaak, R. G.; Kane, D. W.; Wakefield, C.; Weinstein, J. N.; Mills, G. B.; Liang, H., TCPA: a resource for cancer functional proteomics data, Nat. Methods, 10, 1046-1047 (2013)
[38] Liverani, S., D. I. Hastie, L. Azizi, M. Papathomas and S. Richardson (2015): “PReMiuM: An R package for profile regression mixture models using Dirichlet processes.” J. Stat. Softw., 64, 1.; Liverani, S.; Hastie, D. I.; Azizi, L.; Papathomas, M.; Richardson, S., PReMiuM: An R package for profile regression mixture models using Dirichlet processes, J. Stat. Softw, 64, 1 (2015)
[39] Lo, A. Y. (1984): “On a class of Bayesian nonparametric estimates: i. density estimates.” Ann. Statist., 12, 351-357.; Lo, A. Y., On a class of Bayesian nonparametric estimates: i. density estimates, Ann. Statist, 12, 351-357 (1984) · Zbl 0557.62036
[40] Lock, E. F. and D. B. Dunson (2013): “Bayesian consensus clustering.” Bioinformatics, 29, 2610-2616.; Lock, E. F.; Dunson, D. B., Bayesian consensus clustering, Bioinformatics, 29, 2610-2616 (2013)
[41] Madigan, D. and A. E. Raftery (1994): “Model selection and accounting for model uncertainty in graphical models using Occam’s window.” J. Am. Stat. Assoc., 89, 1535-1546.; Madigan, D.; Raftery, A. E., Model selection and accounting for model uncertainty in graphical models using Occam’s window, J. Am. Stat. Assoc, 89, 1535-1546 (1994) · Zbl 0814.62030
[42] Marbac, M. and M. Sedki (2017): “Variable selection for model-based clustering using the integrated complete-data likelihood.” Stat. Comput., 27, 1049-1063.; Marbac, M.; Sedki, M., Variable selection for model-based clustering using the integrated complete-data likelihood, Stat. Comput, 27, 1049-1063 (2017) · Zbl 1384.62199
[43] Marbac, M. and M. Sedki (2018): “VarSelLCM: an R/C++ package for variable selection in model-based clustering of mixed-data with missing values.” Bioinformatics, 35, 1255-1257.; Marbac, M.; Sedki, M., VarSelLCM: an R/C++ package for variable selection in model-based clustering of mixed-data with missing values, Bioinformatics, 35, 1255-1257 (2018)
[44] Maugis, C., G. Celeux and M.-L. Martin-Magniette (2009): “Variable selection for clustering with gaussian mixture models.” Biometrics, 65, 701-709.; Maugis, C.; Celeux, G.; Martin-Magniette, M.-L., Variable selection for clustering with gaussian mixture models, Biometrics, 65, 701-709 (2009) · Zbl 1172.62021
[45] Medvedovic, M., K. Y. Yeung and R. E. Bumgarner (2004): “Bayesian mixture model based clustering of replicated microarray data.” Bioinformatics, 20, 1222-1232.; Medvedovic, M.; Yeung, K. Y.; Bumgarner, R. E., Bayesian mixture model based clustering of replicated microarray data, Bioinformatics, 20, 1222-1232 (2004)
[46] Natsuka, S., S. Akira, Y. Nishio, S. Hashimoto, T. Sugita, H. Isshiki and T. Kishimoto (1992): “Macrophage differentiation-specific expression of NF-IL6, a transcription factor for interleukin-6.” Blood, 79, 460-466.; Natsuka, S.; Akira, S.; Nishio, Y.; Hashimoto, S.; Sugita, T.; Isshiki, H.; Kishimoto, T., Macrophage differentiation-specific expression of NF-IL6, a transcription factor for interleukin-6, Blood, 79, 460-466 (1992)
[47] Neal, R. M. (2000): “Markov chain sampling methods for dirichlet process mixture models.” J. Comput. Graph. Stat., 9, 249-265.; Neal, R. M., Markov chain sampling methods for dirichlet process mixture models, J. Comput. Graph. Stat, 9, 249-265 (2000)
[48] Cancer Genome Atlas Network (2012): “Comprehensive molecular portraits of human breast tumours.” Nature, 490, 61-70.; Cancer Genome Atlas Network, Comprehensive molecular portraits of human breast tumours, Nature, 490, 61-70 (2012)
[49] Parker, J. S., M. Mullins, M. C. Cheang, S. Leung, D. Voduc, T. Vickery, S. Davies, C. Fauron, X. He, Z. Hu, J. F. Quackenbush, I. J. Stijleman, J. Palazzo, J. S. Marron, A. B. Nobel, E. Mardis, T. O. Nielsen, M. J. Ellis, C. M. Perou and P. S. Bernard (2009): “Supervised risk predictor of breast cancer based on intrinsic subtypes.” J. Clin. Oncol., 27, 1160-1167.; Parker, J. S.; Mullins, M.; Cheang, M. C.; Leung, S.; Voduc, D.; Vickery, T.; Davies, S.; Fauron, C.; He, X.; Hu, Z.; Quackenbush, J. F.; Stijleman, I. J.; Palazzo, J.; Marron, J. S.; Nobel, A. B.; Mardis, E.; Nielsen, T. O.; Ellis, M. J.; Perou, C. M.; Bernard, P. S., Supervised risk predictor of breast cancer based on intrinsic subtypes, J. Clin. Oncol, 27, 1160-1167 (2009)
[50] Pekarsky, Y., C. Hallas and C. M. Croce (2001): “The role of TCL1 in human T-cell leukemia.” Oncogene, 20, 5638.; Pekarsky, Y.; Hallas, C.; Croce, C. M., The role of TCL1 in human T-cell leukemia, Oncogene, 20, 5638 (2001)
[51] Raftery, A. E. and N. Dean (2006): “Variable selection for model-based clustering.” J. Am. Stat. Assoc., 101, 168-178.; Raftery, A. E.; Dean, N., Variable selection for model-based clustering, J. Am. Stat. Assoc, 101, 168-178 (2006) · Zbl 1118.62339
[52] Rand, W. M. (1971): “Objective criteria for the evaluation of clustering methods.” J. Am. Stat. Assoc., 66, 846-850.; Rand, W. M., Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc, 66, 846-850 (1971)
[53] Rasmussen, C. E. (2000): “The infinite gaussian mixture model.” In: Advances in Neural Information Processing Systems 12, Denver, USA, volume 12, pp. 554-560.; Rasmussen, C. E., The infinite gaussian mixture model (2000)
[54] Raykov, Y. P., A. Boukouvalas and M. A. Little (2016a): “Simple approximate MAP inference for Dirichlet processes mixtures.” Electron. J. Statist., 10, 3548-3578.; Raykov, Y. P.; Boukouvalas, A.; Little, M. A., Simple approximate MAP inference for Dirichlet processes mixtures, Electron. J. Statist, 10, 3548-3578 (2016) · Zbl 1357.62227
[55] Raykov, Y. P., A. Boukouvalas, F. Baig and M. A. Little (2016b): “What to do when k-means clustering fails: a simple yet principled alternative algorithm.” PLoS One, 11, e0162259.; Raykov, Y. P.; Boukouvalas, A.; Baig, F.; Little, M. A., What to do when k-means clustering fails: a simple yet principled alternative algorithm, PLoS One, 11, e0162259 (2016)
[56] Russell, N., T. B. Murphy and A. E. Raftery (2015): “Bayesian model averaging in model-based clustering and density estimation.” arXiv preprint arXiv:1506.09035.; Russell, N.; Murphy, T. B.; Raftery, A. E., Bayesian model averaging in model-based clustering and density estimation, arXiv preprint arXiv:1506.09035 (2015)
[57] Savage, R. S., K. Heller, Y. Xu, Z. Ghahramani, W. M. Truman, M. Grant, K. J. Denby and D. L. Wild (2009): “R/BHC: fast Bayesian hierarchical clustering for microarray data.” BMC Bioinformatics, 10, 242.; Savage, R. S.; Heller, K.; Xu, Y.; Ghahramani, Z.; Truman, W. M.; Grant, M.; Denby, K. J.; Wild, D. L., R/BHC: fast Bayesian hierarchical clustering for microarray data, BMC Bioinformatics, 10, 242 (2009)
[58] Schwarz, G. (1978): “Estimating the dimension of a model.” Ann. Statist., 6, 461-464.; Schwarz, G., Estimating the dimension of a model, Ann. Statist, 6, 461-464 (1978) · Zbl 0379.62005
[59] Scrucca, L. and A. E. Raftery (2014): “clustvarsel: a package implementing variable selection for model-based clustering in R.” J. Stat. Softw., 84, 1-28.; Scrucca, L.; Raftery, A. E., clustvarsel: a package implementing variable selection for model-based clustering in R, J. Stat. Softw, 84, 1-28 (2014)
[60] Scrucca, L., M. Fop, T. B. Murphy and A. E. Raftery (2016): “mclust 5: clustering, classification and density estimation using Gaussian finite mixture models.” R J, 8, 205-233.; Scrucca, L.; Fop, M.; Murphy, T. B.; Raftery, A. E., mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, R J, 8, 205-233 (2016)
[61] Şenbabaoğlu, Y., S. O. Sümer, F. Sánchez-Vega, D. Bemis, G. Ciriello, N. Schultz and C. Sander (2016): “A multi-method approach for proteomic network inference in 11 human cancers.” PLoS Comput. Biol., 12, e1004765.; Şenbabaoğlu, Y.; Sümer, S. O.; Sánchez-Vega, F.; Bemis, D.; Ciriello, G.; Schultz, N.; Sander, C., A multi-method approach for proteomic network inference in 11 human cancers, PLoS Comput. Biol, 12, e1004765 (2016)
[62] Shochat, C., N. Tal, O. R. Bandapalli, C. Palmi, I. Ganmore, G. Te Kronnie, G. Cario, G. Cazzaniga, A. E. Kulozik, M. Stanulla, M. Schrappe, A. Biondi, G. Basso, D. Bercovich, M. U. Muckenthaler, S. Izraeli (2011): “Gain-of-function mutations in interleukin-7 receptor-α (IL7R) in childhood acute lymphoblastic leukemias.” J. Exp. Med., 208, 901-908.; Shochat, C.; Tal, N.; Bandapalli, O. R.; Palmi, C.; Ganmore, I.; Te Kronnie, G.; Cario, G.; Cazzaniga, G.; Kulozik, A. E.; Stanulla, M.; Schrappe, M.; Biondi, A.; Basso, G.; Bercovich, D.; Muckenthaler, M. U.; Izraeli, S., Gain-of-function mutations in interleukin-7 receptor-α (IL7R) in childhood acute lymphoblastic leukemias, J. Exp. Med, 208, 901-908 (2011)
[63] Städler, N., F. Dondelinger, S. M. Hill, R. Akbani, Y. Lu, G. B. Mills and S. Mukherjee (2017): “Molecular heterogeneity at the network level: high-dimensional testing, clustering and a TCGA case study.” Bioinformatics, 33, 2890-2896.; Städler, N.; Dondelinger, F.; Hill, S. M.; Akbani, R.; Lu, Y.; Mills, G. B.; Mukherjee, S., Molecular heterogeneity at the network level: high-dimensional testing, clustering and a TCGA case study, Bioinformatics, 33, 2890-2896 (2017)
[64] Tadesse, M. G., N. Sha and M. Vannucci (2005): “Bayesian variable selection in clustering high-dimensional data.” J. Am. Stat. Assoc., 100, 602-617.; Tadesse, M. G.; Sha, N.; Vannucci, M., Bayesian variable selection in clustering high-dimensional data, J. Am. Stat. Assoc, 100, 602-617 (2005) · Zbl 1117.62433
[65] Teh, Y. W., M. I. Jordan, M. J. Beal and D. M. Blei (2006): “Hierarchical dirichlet processes.” J. Am. Stat. Assoc., 101, 1566-1581.; Teh, Y. W.; Jordan, M. I.; Beal, M. J.; Blei, D. M., Hierarchical dirichlet processes, J. Am. Stat. Assoc, 101, 1566-1581 (2006) · Zbl 1171.62349
[66] Uhlen, M., C. Zhang, S. Lee, E. Sjöstedt, L. Fagerberg, G. Bidkhori, R. Benfeitas, M. Arif, Z. Liu, F. Edfors, K. Sanli, K. von Feilitzen, P. Oksvold, E. Lundberg, S. Hober, P. Nilsson, J. Mattsson, J. M. Schwenk, H. Brunnström, B. Glimelius, T. Sjöblom, P. H. Edqvist, D. Djureinovic, P. Micke, C. Lindskog, A. Mardinoglu and F. Ponten (2017): “A pathology atlas of the human cancer transcriptome.” Science, 357, eaan2507.; Uhlen, M.; Zhang, C.; Lee, S.; Sjöstedt, E.; Fagerberg, L.; Bidkhori, G.; Benfeitas, R.; Arif, M.; Liu, Z.; Edfors, F.; Sanli, K.; von Feilitzen, K.; Oksvold, P.; Lundberg, E.; Hober, S.; Nilsson, P.; Mattsson, J.; Schwenk, J. M.; Brunnström, H.; Glimelius, B.; Sjöblom, T.; Edqvist, P. H.; Djureinovic, D.; Micke, P.; Lindskog, C.; Mardinoglu, A.; Ponten, F., A pathology atlas of the human cancer transcriptome, Science, 357, eaan2507 (2017)
[67] Van der Velden, V., M. Brüggemann, P. Hoogeveen, M. de Bie, P. Hart, T. Raff, H. Pfeifer, S. Lüschen, T. Szczepański, E. Van Wering, M. Kneba and J. J. van Dongen (2004): “TCRB gene rearrangements in childhood and adult precursor-B-ALL: frequency, applicability as MRD-PCR target, and stability between diagnosis and relapse.” Leukemia, 18, 1971.; Van der Velden, V.; Brüggemann, M.; Hoogeveen, P.; de Bie, M.; Hart, P.; Raff, T.; Pfeifer, H.; Lüschen, S.; Szczepański, T.; Van Wering, E.; Kneba, M.; van Dongen, J. J., TCRB gene rearrangements in childhood and adult precursor-B-ALL: frequency, applicability as MRD-PCR target, and stability between diagnosis and relapse, Leukemia, 18, 1971 (2004)
[68] Wang, L. and D. B. Dunson (2011): “Fast Bayesian inference in dirichlet process mixture models.” J. Comput. Graph. Stat., 20, 196-216.; Wang, L.; Dunson, D. B., Fast Bayesian inference in dirichlet process mixture models, J. Comput. Graph. Stat, 20, 196-216 (2011)
[69] Weinstein, J. N., E. A. Collisson, G. B. Mills, K. R. M. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, J. M. Stuart, Cancer Genome Atlas Research Network (2013): “The cancer genome atlas pan-cancer analysis project.” Nat. Genet., 45, 1113-1120.; Weinstein, J. N.; Collisson, E. A.; Mills, G. B.; Shaw, K. R. M.; Ozenberger, B. A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J. M.; Cancer Genome Atlas Research Network, The cancer genome atlas pan-cancer analysis project, Nat. Genet, 45, 1113-1120 (2013)
[70] Welch, B. L. (1947): “The generalization of ‘student’s’ problem when several different population variances are involved.” Biometrika, 34, 28-35.; Welch, B. L., The generalization of ‘student’s’ problem when several different population variances are involved, Biometrika, 34, 28-35 (1947) · Zbl 0029.40802
[71] Witten, D. M. and R. Tibshirani (2010): “A framework for feature selection in clustering.” J. Am. Stat. Assoc., 105, 713-726.; Witten, D. M.; Tibshirani, R., A framework for feature selection in clustering, J. Am. Stat. Assoc, 105, 713-726 (2010) · Zbl 1392.62194
[72] Zhang, X., D. J. Nott, C. Yau and A. Jasra (2014): “A sequential algorithm for fast fitting of dirichlet process mixture models.” J. Comput. Graph. Stat., 23, 1143-1162.; Zhang, X.; Nott, D. J.; Yau, C.; Jasra, A., A sequential algorithm for fast fitting of dirichlet process mixture models, J. Comput. Graph. Stat, 23, 1143-1162 (2014)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.