Yanagi: transcript segment library construction for RNA-seq quantification. (English) Zbl 1443.92139
Schwartz, Russell (ed.) et al., 17th international workshop on algorithms in bioinformatics, WABI 2017, Boston, MA, USA, August 21–23, 2017. Proceedings. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik. LIPIcs – Leibniz Int. Proc. Inform. 88, Article 10, 14 p. (2017).
Summary: Analysis of differential alternative splicing from RNA-seq data is complicated by the fact that many RNA-seq reads map to multiple transcripts, and that annotated transcripts from a given gene are often a small subset of many possible complete transcripts for that gene. Here we describe Yanagi, a tool which segments a transcriptome into disjoint regions to create a segments library from a complete transcriptome annotation that preserves all of its consecutive regions of a given length \(L\) while distinguishing annotated alternative splicing events in the transcriptome. In this paper, we formalize this concept of transcriptome segmentation and propose an efficient algorithm for generating segment libraries based on a length parameter dependent on specific RNA-seq library construction. The resulting segment sequences can be used with pseudo-alignment tools to quantify expression at the segment level. We characterize the segment libraries for the reference transcriptomes of Drosophila melanogaster and Homo sapiens. Finally, we demonstrate the utility of quantification using a segment library based on an analysis of differential exon skipping in Drosophila melanogaster and Homo sapiens. The notion of transcript segmentation as introduced here and implemented in Yanagi will open the door for the application of lightweight, ultra-fast pseudo-alignment algorithms in a wide variety of analyses of transcription variation.
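The preservation property stated in the summary (every consecutive region of length \(L\) in the transcriptome also occurs in the segment library) can be illustrated with a small check. This is a minimal sketch on made-up toy data, not Yanagi's algorithm: the function names `lmers` and `preserves_lmers`, the exon sequences, and the hand-built segments are all assumptions introduced for illustration, and the sketch checks neither disjointness of segments nor how splicing events are distinguished.

```python
# Toy check of the L-mer preservation property (illustrative only).

def lmers(seq, L):
    """Return the set of all length-L substrings of seq."""
    return {seq[i:i + L] for i in range(len(seq) - L + 1)}

def preserves_lmers(transcripts, segments, L):
    """True if every L-mer of every transcript occurs in some segment."""
    covered = set()
    for seg in segments:
        covered |= lmers(seg, L)
    return all(lmers(t, L) <= covered for t in transcripts)

# Hypothetical gene with three exons; exon B is skipped in one isoform.
L = 4
A, B, C = "ACGTACGT", "TTTT", "GGCCGGCC"
transcripts = [A + B + C, A + C]

# Hand-built segments: each shared exon once, plus one segment per
# junction context, padded with L-1 flanking bases on either side.
flank_a, flank_c = A[-(L - 1):], C[:L - 1]
segments = [
    A,                      # shared upstream exon
    flank_a + B + flank_c,  # exon-inclusion context
    flank_a + flank_c,      # exon-skipping junction
    C,                      # shared downstream exon
]

print(preserves_lmers(transcripts, segments, L))
```

Dropping the exon-skipping segment from this toy library makes the check fail, because L-mers spanning the A-C junction no longer appear anywhere. That is the sense in which a segment library has to retain junction context for reads of length \(L\).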
For the entire collection see [Zbl 1372.68022].
92D20 Protein sequences, DNA sequences
92-08 Computational methods for problems pertaining to biology