BOPA: A Bayesian hierarchical model for outlier expression detection. (English) Zbl 1255.62065

Summary: In many cancer studies, a gene may be expressed in some but not all of the disease samples, reflecting the complexity of the underlying disease. The traditional t-test assumes a mean shift for the tumor samples compared to normal samples and is thus not structured to capture partial differential expressions. More powerful tests specially designed for this situation can find genes with heterogeneous expressions associated with possible subtypes of the cancer. This article proposes a Bayesian model for cancer outlier profile analysis (BOPA). We build on the Gamma-Gamma model introduced by M. A. Newton et al. (see, e.g., [Biostatistics 5, No. 2, 155–176 (2004; Zbl 1096.62124)]), by using a five-component mixture model to represent various differential expression patterns. The hierarchical mixture model explicitly accounts for outlier expressions, and inferences are based on samples from posterior distributions generated from the Markov chain Monte Carlo algorithm we have developed. We present simulation and real-life dataset analyses to demonstrate the proposed methodology.

MSC:

62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C50 Medical applications (general)

Citations:

Zbl 1096.62124

Software:

gaga; boa

References:

[1] Baldi, P.; Long, A. D., A Bayesian framework for the analysis of microarray expression data: regularized \(t\)-test and statistical inferences of gene changes, Bioinformatics, 17, 6, 509-519 (2001)
[2] Do, K. A.; Müller, P.; Tang, F., A Bayesian mixture model for differential gene expression, Journal of the Royal Statistical Society Series C—Applied Statistics, 54, 627-644 (2005) · Zbl 1490.62353
[3] Efron, B., Large-scale simultaneous hypothesis testing, Journal of the American Statistical Association, 99, 465, 96-104 (2004) · Zbl 1089.62502
[4] Gelman, A.; Rubin, D. B., Inference from iterative simulation using multiple sequences, Statistical Science, 7, 457-472 (1992) · Zbl 1386.65060
[5] Geweke, J., Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, (Bernardo, J. M.; Berger, J. O.; Dawid, A. P.; Smith, A. F. M., Bayesian Statistics, Vol. 4 (1992), Clarendon Press: Oxford)
[6] Gottardo, R.; Raftery, A. E., Markov chain Monte Carlo with mixtures of mutually singular distributions, Journal of Computational and Graphical Statistics, 17, 4, 949-975 (2008)
[7] Gottardo, R.; Raftery, A.; Yeung, K.; Bumgarner, R., Bayesian robust inference for differential gene expression in microarrays with multiple samples, Biometrics, 62, 1, 10-18 (2006) · Zbl 1099.62128
[8] Hedenfalk, I.; Duggan, D.; Chen, Y.; Radmacher, M.; Bittner, M.; Simon, R.; Meltzer, P.; Gusterson, B.; Esteller, M.; Kallioniemi, O.; Wilfond, B.; Borg, A.; Trent, J., Gene-expression profiles in hereditary breast cancer, New England Journal of Medicine, 344, 8, 539-548 (2001)
[9] Ibrahim, J. G.; Chen, M. H.; Gray, R. J., Bayesian models for gene expression with DNA microarray data, Journal of the American Statistical Association, 97, 457, 88-99 (2002) · Zbl 1073.62578
[10] Jain, S.; Neal, R. M., A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model, Journal of Computational and Graphical Statistics, 13, 1, 158-182 (2004)
[11] Jensen, S. T.; Erkan, I.; Arnardottir, E. S.; Small, D. S., Bayesian testing of many hypotheses \(\times\) many genes: a study of sleep apnea, Annals of Applied Statistics, 3, 3, 1080-1101 (2009) · Zbl 1196.62140
[12] Juin, P.; Hunt, A.; Littlewood, T.; Griffiths, B.; Swigart, L.; Korsmeyer, S.; Evan, G., C-Myc functionally cooperates with BAX to induce apoptosis, Molecular and Cellular Biology, 22, 17, 6158-6169 (2002)
[13] Kendziorski, C. M.; Newton, M. A.; Lan, H.; Gould, M. N., On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles, Statistics in Medicine, 22, 24, 3899-3914 (2003)
[14] Lewin, A.; Richardson, S.; Marshall, C.; Glazier, A.; Aitman, T., Bayesian modeling of differential gene expression, Biometrics, 62, 1, 1-9 (2006) · Zbl 1099.62131
[15] Lo, K.; Gottardo, R., Flexible empirical Bayes models for differential gene expression, Bioinformatics, 23, 3, 328-335 (2007)
[16] Newton, M. A.; Kendziorski, C. M.; Richmond, C. S.; Blattner, F. R.; Tsui, K. W., On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data, Journal of Computational Biology, 8, 1, 37-52 (2001)
[17] Newton, M. A.; Noueiry, A.; Sarkar, D.; Ahlquist, P., Detecting differential gene expression with a semiparametric hierarchical mixture method, Biostatistics, 5, 2, 155-176 (2004) · Zbl 1096.62124
[18] Pidgeon, G.; Barr, M.; Harmey, J.; Foley, D.; Bouchier-Hayes, D., Vascular endothelial growth factor (VEGF) upregulates BCL-2 and inhibits apoptosis in human and murine mammary adenocarcinoma cells, British Journal of Cancer, 85, 2, 273-278 (2001)
[19] Ronen, A.; Glickman, B., Human DNA repair genes, Environmental and Molecular Mutagenesis, 37, 3, 241-283 (2001)
[20] Rossell, D., GaGa: a parsimonious and flexible model for differential expression analysis, Annals of Applied Statistics, 3, 3, 1035-1051 (2009) · Zbl 1257.62111
[21] Singh, D.; Febbo, P.; Ross, K.; Jackson, D.; Manola, J.; Ladd, C.; Tamayo, P.; Renshaw, A.; D’Amico, A.; Richie, J.; Lander, E.; Loda, M.; Kantoff, P.; Golub, T.; Sellers, W., Gene expression correlates of clinical prostate cancer behavior, Cancer Cell, 1, 2, 203-209 (2002)
[22] Smith, B. J., boa: an R package for MCMC output convergence assessment and posterior inference, Journal of Statistical Software, 21, 11 (2007)
[23] Smyth, G. K., Linear models and empirical Bayes methods for assessing differential expression in microarray experiments, Statistical Applications in Genetics and Molecular Biology, 3, 1, 1027 (2004)
[24] Storey, J. D.; Tibshirani, R., Statistical significance for genomewide studies, Proceedings of the National Academy of Sciences of the United States of America, 100, 16, 9440-9445 (2003) · Zbl 1130.62385
[25] Tibshirani, R.; Hastie, T., Outlier sums for differential gene expression analysis, Biostatistics, 8, 1, 2-8 (2007) · Zbl 1121.62102
[26] Tomlins, S. A.; Rhodes, D. R.; Perner, S.; Dhanasekaran, S. M.; Mehra, R.; Sun, X. W.; Varambally, S.; Cao, X. H.; Tchinda, J.; Kuefer, R.; Lee, C.; Montie, J. E.; Shah, R. B.; Pienta, K. J.; Rubin, M. A.; Chinnaiyan, A. M., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer, Science, 310, 5748, 644-648 (2005)
[27] Wu, B. L., Cancer outlier differential gene expression detection, Biostatistics, 8, 3, 566-575 (2007) · Zbl 1121.62105