The importance of distinct modeling strategies for gene and gene-specific treatment effects in hierarchical models for microarray data. (English) Zbl 1454.62357

Summary: When analyzing microarray data, hierarchical models are often used to share information across genes when estimating means and variances or identifying differential expression. Many methods utilize some form of the two-level hierarchical model structure suggested by Kendziorski et al. [7], in which the first level describes the distribution of latent mean expression levels among genes and among differentially expressed treatments within a gene. The second level describes the conditional distribution, given a latent mean, of repeated observations for a single gene and treatment. Many of these models, including those used in Kendziorski et al.’s EBarrays package, assume that expression level changes due to treatment effects have the same distribution as expression level changes from gene to gene. We present empirical evidence that this assumption is often inadequate and propose three-level hierarchical models as extensions to the two-level log-normal-based EBarrays models to address this inadequacy. We demonstrate that use of our three-level models dramatically changes analysis results for a variety of microarray data sets and verify the validity and improved performance of our suggested method in a series of simulation studies. We also illustrate the importance of accounting for the uncertainty of gene-specific error variance estimates when using hierarchical models to identify differentially expressed genes.
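
The contrast drawn in the summary can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model and not the EBarrays API: the function names, parameter values, and the exact form of the three-level variant (a gene mean drawn first, then treatment means drawn around it with a separate variance) are assumptions made for illustration. Under the two-level sketch, the spread of treatment effects among differentially expressed genes is tied to the gene-to-gene spread; under the three-level sketch it is controlled by its own parameter.

```python
import random
import statistics


def simulate_genes(n_genes, n_reps, mu0, tau_gene, sigma, p_de, tau_trt=None, seed=0):
    """Simulate log-scale expression for two treatments per gene.

    tau_trt=None  -> two-level sketch: a DE gene's two treatment means are
                     independent draws from N(mu0, tau_gene^2).
    tau_trt=value -> three-level sketch (assumed form): a gene mean is drawn
                     from N(mu0, tau_gene^2), and a DE gene's treatment means
                     are drawn around it from N(gene_mean, tau_trt^2).
    """
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        is_de = rng.random() < p_de
        gene_mean = rng.gauss(mu0, tau_gene)
        if not is_de:
            # Equivalently expressed: both treatments share one latent mean.
            means = (gene_mean, gene_mean)
        elif tau_trt is None:
            # Two-level: second treatment mean is a fresh draw from the
            # same distribution that generates gene-to-gene variation.
            means = (gene_mean, rng.gauss(mu0, tau_gene))
        else:
            # Three-level: treatment means vary around the gene mean.
            means = (rng.gauss(gene_mean, tau_trt), rng.gauss(gene_mean, tau_trt))
        # Replicate observations given each latent treatment mean.
        obs = tuple([rng.gauss(m, sigma) for _ in range(n_reps)] for m in means)
        genes.append({"is_de": is_de, "means": means, "obs": obs})
    return genes


def de_effect_sd(genes):
    """Standard deviation of latent treatment differences among DE genes."""
    diffs = [g["means"][1] - g["means"][0] for g in genes if g["is_de"]]
    return statistics.stdev(diffs)


two_level = simulate_genes(4000, 3, mu0=7.0, tau_gene=1.0, sigma=0.3, p_de=0.5)
three_level = simulate_genes(4000, 3, mu0=7.0, tau_gene=1.0, sigma=0.3, p_de=0.5, tau_trt=0.2)
print(round(de_effect_sd(two_level), 2), round(de_effect_sd(three_level), 2))
```

In this sketch the two-level treatment-effect standard deviation is close to sqrt(2) times `tau_gene`, whereas the three-level value is close to sqrt(2) times `tau_trt`, so the two sources of variation can differ in scale.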


62P10 Applications of statistics to biology and medical sciences; meta analysis


EBarrays; gaga


[1] Baldi, P. and Long, A. D. (2001). A Bayesian framework for the analysis of microarray expression data: Regularized \(t\)-test and statistical inferences of gene changes. Bioinformatics 17 509-519.
[2] Binder, H., Kirsten, T., Loeffler, M. and Stadler, P. F. (2004). Sensitivity of microarray oligonucleotide probes: Variability and effect of base composition. The Journal of Physical Chemistry B 108 18003-18014.
[3] Cui, X., Hwang, J. T. G., Qiu, J., Blades, N. J. and Churchill, G. A. (2005). Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6 59-75. · Zbl 1069.62090
[4] Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U. and Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4 249-264. · Zbl 1141.62348
[5] Jensen, S. T., Erkan, I., Arnardottir, E. S. and Small, D. S. (2009). Bayesian testing of many hypotheses \(\times\) many genes: A study of sleep apnea. Ann. Appl. Stat. 3 1080-1101. · Zbl 1196.62140
[6] Keleş, S. (2007). Mixture modeling for genome-wide localization of transcription factors. Biometrics 63 10-21, 309. · Zbl 1206.62170
[7] Kendziorski, C. M., Newton, M., Lan, H. and Gould, M. N. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22 3899-3914.
[8] Lo, K. and Gottardo, R. (2007). Flexible empirical Bayes models for differential gene expression. Bioinformatics 23 328-335.
[9] Lönnstedt, I. and Speed, T. (2002). Replicated microarray data. Statist. Sinica 12 31-46. · Zbl 1004.62086
[10] Lund, S. P. and Nettleton, D. (2012). Supplement to “The importance of distinct modeling strategies for gene and gene-specific treatment effects in hierarchical models for microarray data.” · Zbl 1454.62357
[11] Nettleton, D., Hwang, J. T. G., Caldo, R. A. and Wise, R. P. (2006). Estimating the number of true null hypotheses from a histogram of \(p\)-values. J. Agric. Biol. Environ. Stat. 11 337-356.
[12] Newton, M. A., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5 155-176. · Zbl 1096.62124
[13] Rossell, D. (2009). Gaga: A parsimonious and flexible model for differential expression analysis. Ann. Appl. Stat. 3 1035-1051. · Zbl 1257.62111
[14] Selinger, D. W., Saxena, R. M., Cheung, K. J., Church, G. M. and Rosenow, C. (2003). Global RNA half-life analysis in Escherichia coli reveals positional patterns of transcript degradation. Genome Research 13 216-223.
[15] Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3 Art. 3, 29 pp. (electronic). · Zbl 1038.62110
[16] Somel, M., Creely, H., Franz, H., Mueller, U., Lachmann, M., Khaitovich, P. and Pääbo, S. (2008). Human and chimpanzee gene expression differences replicated in mice fed different diets. PLoS ONE 3 e1504.
[17] Wei, Z. and Li, H. (2007). A Markov random field model for network-based analysis of genomic data. Bioinformatics 23 1537-1544.
[18] Wei, Z. and Li, H. (2008). A hidden spatial-temporal Markov random field model for network-based analysis of time course gene expression data. Ann. Appl. Stat. 2 408-429. · Zbl 1137.62081
[19] Wright, G. W. and Simon, R. M. (2003). A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics 19 2448-2455.
[20] Wu, H., Yuan, M., Kaech, S. M. and Halloran, M. E. (2007). A statistical analysis of memory CD8 T cell differentiation: An application of a hierarchical state space model to a short time course microarray experiment. Ann. Appl. Stat. 1 442-458. · Zbl 1126.62110
[21] Yuan, M. (2006). Flexible temporal expression profile modelling using the Gaussian process. Comput. Statist. Data Anal. 51 1754-1764. · Zbl 1157.62544
[22] Yuan, M. and Kendziorski, C. (2006a). Hidden Markov models for microarray time course data in multiple biological conditions. J. Amer. Statist. Assoc. 101 1323-1332. · Zbl 1171.62359
[23] Yuan, M. and Kendziorski, C. (2006b). A unified approach for simultaneous gene clustering and differential expression identification. Biometrics 62 1089-1098. · Zbl 1114.62130