
Identifying atypically expressed chromosome regions using RNA-Seq data. (English) Zbl 1458.62261
Summary: The number of studies analyzing RNA-Seq data has grown rapidly in recent years, making this type of gene expression data a strong competitor to DNA microarrays. This paper proposes a Bayesian model to detect under- and overexpressed chromosome regions using RNA-Seq data. The methodology builds on recent work designed to detect overexpressed regions in microarray data. A hidden Markov model is developed by considering a mixture of Gaussian distributions with ordered means, so that the first and last mixture components accommodate the underexpressed and overexpressed genes, respectively. A hierarchical Markov dependence structure makes the model flexible enough to deal efficiently with the highly irregular spacing of the data. The analysis of four cancer data sets (breast, lung, ovarian and uterus) is presented. Results indicate that the proposed model is selective in determining the expression status, is robust with respect to prior specifications, and provides tools for a global or local search of under- and overexpressed chromosome regions.
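The idea of a hidden Markov model whose Gaussian mixture components have ordered means can be illustrated with a small sketch. This is not the paper's Bayesian model: it assumes three states (under, typical, over), fixed and entirely hypothetical means, standard deviations and transition probabilities, equally spaced genes, and the classical forward-backward recursion in place of posterior sampling. It ignores the irregular spacing that the hierarchical structure in the paper is meant to handle.

```python
import math
import random

# Hypothetical parameters (assumptions, not values from the paper).
MEANS = [-3.0, 0.0, 3.0]   # ordered means: under, typical, over
SDS = [1.0, 1.0, 1.0]
TRANS = [[0.90, 0.08, 0.02],
         [0.05, 0.90, 0.05],
         [0.02, 0.08, 0.90]]  # sticky chain: neighbouring genes tend to share a state
INIT = [0.1, 0.8, 0.1]


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate(n, rng):
    """Draw a hidden state path and Gaussian observations along a chromosome."""
    states, obs = [], []
    s = rng.choices(range(3), weights=INIT)[0]
    for _ in range(n):
        states.append(s)
        obs.append(rng.gauss(MEANS[s], SDS[s]))
        s = rng.choices(range(3), weights=TRANS[s])[0]
    return states, obs


def forward_backward(obs):
    """Posterior state probabilities per gene, using scaled recursions."""
    n, k = len(obs), len(MEANS)
    emis = [[normal_pdf(y, MEANS[s], SDS[s]) for s in range(k)] for y in obs]
    alpha = [[0.0] * k for _ in range(n)]
    scale = [0.0] * n
    alpha[0] = [INIT[s] * emis[0][s] for s in range(k)]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, n):
        for s in range(k):
            prev = sum(alpha[t - 1][r] * TRANS[r][s] for r in range(k))
            alpha[t][s] = emis[t][s] * prev
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in range(k):
            beta[t][s] = sum(
                TRANS[s][r] * emis[t + 1][r] * beta[t + 1][r] for r in range(k)
            ) / scale[t + 1]
    post = []
    for t in range(n):
        w = [alpha[t][s] * beta[t][s] for s in range(k)]
        total = sum(w)
        post.append([v / total for v in w])
    return post


rng = random.Random(1)
states, obs = simulate(300, rng)
post = forward_backward(obs)
# Call each gene by its most probable state (0 = under, 1 = typical, 2 = over).
calls = [max(range(3), key=lambda s: p[s]) for p in post]
accuracy = sum(c == s for c, s in zip(calls, states)) / len(states)
print(f"fraction of genes called correctly: {accuracy:.3f}")
```

Runs of consecutive genes called 0 or 2 correspond to candidate under- or overexpressed regions. In a fully Bayesian treatment the means, variances and transition probabilities would be given priors and estimated rather than fixed.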
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
92D20 Protein sequences, DNA sequences
[1] Albert, JH, Bayesian estimation of normal ogive item response curves using Gibbs sampling, J Educ Behav Stat, 17, 251-269 (1992)
[2] Anders, S.; Huber, W., Differential expression analysis for sequencing count data, Genome Biol, 11, R106 (2010)
[3] Berger, MF; Levin, JZ; Vijayendran, K.; Sivachenko, A.; Adiconis, X.; Maguire, J.; Johnson, LA; Robinson, J.; Verhaak, RG; Sougnez, C.; Onofrio, RC; Ziaugra, L.; Cibulskis, K.; Laine, E.; Barretina, J.; Winckler, W.; Fisher, DE; Getz, G.; Meyerson, M.; Jaffe, DB; Gabriel, SB; Lander, ES; Dummer, R.; Gnirke, A.; Nusbaum, C.; Garraway, LA, Integrative analysis of the melanoma transcriptome, Genome Res, 20, 413-427 (2010)
[4] Bivand, R.; Piras, G., Comparing implementations of estimation methods for spatial econometrics, J Stat Softw, 63, 18, 1-36 (2015)
[5] Broet, P.; Lewin, A.; Richardson, S.; Dalmasso, C.; Magdelenat, H., A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments, Bioinformatics, 20, 2562-2571 (2004)
[6] Bullard, JH; Purdom, E.; Hansen, KD; Dudoit, S., Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments, BMC Bioinform, 11, 94 (2010)
[7] Chu, Y.; Corey, DR, RNA sequencing: platform selection, experimental design and data interpretation, Nucl Acid Ther, 22, 4, 271-274 (2012)
[8] Conesa, A.; Madrigal, P.; Tarazona, S.; Gomez-Cabrero, D.; Cervera, A.; McPherson, A.; Szczesniak, MW; Gaffney, D.; Elo, LL; Zhang, X.; Mortazavi, A., A survey of best practices for RNA-Seq data analysis, Genome Biol, 17, 13 (2016)
[9] Dean, N.; Raftery, AE, Normal uniform mixture differential gene expression detection for cDNA microarrays, BMC Bioinform, 6, 1, 173-187 (2005)
[10] Dillies, MA; Rau, A.; Aubert, J.; Hennequet-Antier, C.; Jeanmougin, M.; Servant, N.; Keime, C.; Marot, G.; Castel, D.; Estelle, J.; Guernec, G.; Jagla, B.; Jouneau, L.; Laloe, D.; Le-Gall, C.; Schaeffer, B.; Le-Crom, S.; Guedj, M.; Jaffrezic, F., A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis, Brief Bioinform, 14, 6, 671-683 (2012)
[11] Do, KA; Muller, P.; Tang, F., A Bayesian mixture model for differential gene expression, J R Stat Soc Ser C, 54, 3, 627-644 (2005) · Zbl 05188702
[12] Frazee, AC; Sabunciyan, S.; Hansen, KD; Irizarry, RA; Leek, JT, Differential expression analysis of RNA-Seq data at single-base resolution, Biostatistics, 15, 3, 413-426 (2014)
[13] Gentleman, R.; Carey, V.; Bates, D.; Bolstad, B.; Dettling, M.; Dudoit, S.; Ellis, B.; Gautier, L.; Ge, Y.; Gentry, J.; Hornik, K.; Hothorn, T.; Huber, W.; Iacus, S.; Irizarry, R.; Leisch, F.; Li, C.; Maechler, M.; Rossini, A.; Sawitzki, G.; Smith, C.; Smyth, G.; Tierney, L.; Yang, J.; Zhang, J., Bioconductor: open software development for computational biology and bioinformatics, Genome Biol, 5, R80 (2004)
[14] Geweke, J., Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, in: Bernardo, JM; Berger, J.; Dawid, AP; Smith, AFM (eds.), Bayesian statistics, 169-193 (1992), Oxford: Oxford University Press
[15] Green, PJ, Reversible jump MCMC and Bayesian model determination, Biometrika, 82, 4, 711-732 (1995) · Zbl 0861.62023
[16] Han, Y.; Chen, J.; Zhao, X.; Liang, C.; Wang, Y.; Sun, L.; Jiang, Z.; Zhang, Z.; Yang, R.; Chen, J.; Li, Z.; Tang, A.; Li, Z.; Ye, J.; Guan, Z.; Gui, Y.; Cai, Z., MicroRNA expression signatures of bladder cancer revealed by deep sequencing, PLoS One, 6, 3, e18286 (2011)
[17] Hansen, KD; Irizarry, RA; Wu, Z., Removing technical variability in RNA-seq data using conditional quantile normalization, Biostatistics, 13, 2, 204-216 (2012) · Zbl 1437.62486
[18] Hebenstreit, D.; Fang, M.; Gu, M.; Charoensawan, V.; Van-Oudenaarden, A.; Teichmann, SA, RNA sequencing reveals two major classes of gene expression levels in metazoan cells, Mol Syst Biol, 7, 497 (2011)
[19] Lewin, A.; Bochkina, N.; Richardson, S., Fully Bayesian mixture model for differential gene expression: simulations and model checks, Stat Appl Genet Mol Biol, 6, 36 (2007) · Zbl 1276.92090
[20] Liu, JS, The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem, J Am Stat Assoc, 89, 958-966 (1994) · Zbl 0804.62033
[21] Lucas, JE; Kung, HN; Chi, JTA, Latent factor analysis to discover pathway associated putative segmental aneuploidies in human cancers, PLoS Comput Biol, 6, e1000920 (2010)
[22] Maher, CA; Kumar-Sinha, C.; Cao, X.; Kalyana-Sundaram, S.; Han, B.; Jing, X.; Sam, L.; Barrette, T.; Palanisamy, N.; Chinnaiyan, AM, Transcriptome sequencing to detect gene fusions in cancer, Nature, 458, 7234, 97-101 (2009)
[23] Mayrink, VD; Gonçalves, FB, A Bayesian hidden Markov mixture model to detect overexpressed chromosome regions, J R Stat Soc Ser C, 66, 2, 387-412 (2017)
[24] McCarthy, DJ; Chen, Y.; Smyth, GK, Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation, Nucl Acids Res, 40, 4288-4297 (2012)
[25] Moran, PAP, Notes on continuous stochastic phenomena, Biometrika, 37, 1, 17-23 (1950) · Zbl 0041.45702
[26] Nueda, MJ; Tarazona, S.; Conesa, A., Next maSigPro: updating maSigPro bioconductor package for RNA-Seq time series, Bioinformatics, 30, 18, 2598-2602 (2014)
[27] Oshlack, A.; Robinson, MD; Young, MD, From RNA-Seq reads to differential expression results, Genome Biol, 11, 12, 220 (2010)
[28] Papastamoulis, P.; Rattray, M., A Bayesian model selection approach for identifying differentially expressed transcripts from RNA sequencing data, J R Stat Soc Ser C, 67, 1, 3-23 (2018)
[29] Plummer, M.; Best, N.; Cowles, K.; Vines, K., CODA: convergence diagnosis and output analysis for MCMC, R News, 6, 1, 7-11 (2006)
[30] Pollack, JR; Sorlie, T.; Perou, CM; Rees, CA; Jeffrey, SS; Lonning, PE; Tibshirani, R.; Botstein, D.; Borresen-Dale, AL; Brown, PO, Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors, Proc Natl Acad Sci USA, 99, 12963-12968 (2002)
[31] R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. Accessed 10 Oct 2019
[32] Robinson, MD; Oshlack, A., A scaling normalization method for differential expression analysis of RNA-Seq data, Genome Biol, 11, R25 (2010)
[33] Robinson, MD; McCarthy, DJ; Smyth, GK, edgeR: a bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics, 26, 139-140 (2010)
[34] Soneson, C.; Delorenzi, M., A comparison of methods for differential expression analysis of RNA-Seq data, BMC Bioinform, 14, 91 (2013)
[35] Van-De-Wiel, MA; Leday, GGR; Pardo, L.; Rue, H.; Van-Der-Vaart, AW; Van-Wieringen, WN, Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors, Biostatistics, 14, 1, 113-128 (2013)
[36] Wagner, GP; Kin, K.; Lynch, VJ, A model based criterion for gene expression calls using RNA-Seq data, Theory Biosci, 132, 3, 159-164 (2013)
[37] Wang, Z.; Gerstein, M.; Snyder, M., RNA-Seq: a revolutionary tool for transcriptomics, Nat Rev Genet, 10, 1, 57-63 (2009)
[38] Zhang, H.; Xu, J.; Jiang, N.; Hu, X.; Luo, Z., PLNseq: a multivariate Poisson lognormal distribution for high-throughput matched RNA-sequencing read count data, Stat Med, 34, 1577-1589 (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.