
An empirical Bayes change-point model for transcriptome time-course data. (English) Zbl 1475.62268

Summary: Time-course experiments are commonly conducted to capture temporal changes. It is generally of interest to detect whether any change occurs over time, which we define as the detection problem. If there is a change, it is informative to know when it occurs, which we define as the identification problem. It is often desirable to control the Type I error rate at a nominal level when applying a testing procedure to detect or identify these changes. Many analytic methods have been proposed, but most address either the detection problem or, more recently, the identification problem alone. Here, we propose to solve both problems within a unified multiple-testing framework built on an empirical Bayes change-point model. The model is flexible enough to account for complex temporal gene expression patterns. We show that our testing procedure is valid and asymptotically optimal, in the sense that it rejects the maximum number of null hypotheses while controlling the Bayesian false discovery rate (FDR) at a predefined nominal level. Simulation studies and an application to real transcriptome time-course data illustrate that the proposed model is a flexible and powerful method for capturing various temporal patterns in time-course data.
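The summary does not spell out how Bayesian FDR control is carried out. As a hedged illustration only, a commonly used rule works from each gene's posterior probability of the null (here, "no change over time"): sort these probabilities in increasing order and reject the largest set whose average posterior null probability stays at or below the nominal level α. The sketch below assumes the posterior null probabilities have already been computed by some fitted model; the function name `bayes_fdr_reject` and the example values are hypothetical, and this is the generic thresholding rule, not necessarily the exact procedure used in the paper.

```python
import numpy as np


def bayes_fdr_reject(post_null, alpha=0.05):
    """Return a boolean mask of rejected null hypotheses (hypothetical helper).

    post_null[i] is assumed to be the posterior probability that gene i
    shows no change over time. The rule rejects the k genes with the
    smallest post_null, where k is the largest count whose mean posterior
    null probability (the estimated Bayesian FDR) is <= alpha.
    """
    post_null = np.asarray(post_null, dtype=float)
    order = np.argsort(post_null)  # strongest evidence of change first
    sorted_p = post_null[order]
    # Mean posterior null probability of the first k genes, k = 1..n
    running_mean = np.cumsum(sorted_p) / np.arange(1, post_null.size + 1)
    # Sorted ascending, the running mean is nondecreasing, so the
    # admissible k values form a prefix; count them.
    k = int(np.sum(running_mean <= alpha))
    reject = np.zeros(post_null.size, dtype=bool)
    reject[order[:k]] = True
    return reject


# Illustrative input: running means of the sorted values are
# 0.01, 0.015, 0.0267, 0.095, 0.256, so the three smallest are rejected.
mask = bayes_fdr_reject([0.01, 0.30, 0.02, 0.90, 0.05], alpha=0.05)
print(mask.tolist())  # [True, False, True, False, True]
```

Taking the longest admissible prefix is what makes the rule reject as many hypotheses as possible for a given α, which mirrors the optimality notion stated in the summary.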

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62G10 Nonparametric hypothesis testing
92D20 Protein sequences, DNA sequences
