De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm. (English) Zbl 1412.92105
Cowen, Lenore J. (ed.), Research in computational molecular biology. 23rd annual international conference, RECOMB 2019, Washington, DC, USA, May 5–8, 2019. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 11467, 227-242 (2019).
Summary: Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in terms of overall accuracy, scalability to large datasets, or both. Our tool is available at https://github.com/ksahlin/isONclust.
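Illustration: the summary names two ideas, a greedy pass over the reads and the use of quality values, without giving details. The Python sketch below shows one generic way such ideas can be combined; it is not the isONclust implementation. Every function name, the k-mer based similarity, and the parameters k, max_err and min_shared are assumptions made for illustration only. Reads are visited from lowest to highest mean error probability, only k-mers that are likely error-free are compared, and each read either joins the best-matching existing cluster or starts a new one.

```python
from typing import List, Optional, Set, Tuple

Read = Tuple[str, str, str]  # (name, sequence, Phred+33 quality string)


def phred_to_error(q_char: str) -> float:
    """Convert one Phred+33 quality character to an error probability."""
    return 10 ** (-(ord(q_char) - 33) / 10)


def mean_error(qual: str) -> float:
    """Average per-base error probability of a read."""
    return sum(phred_to_error(c) for c in qual) / len(qual)


def reliable_kmers(seq: str, qual: str, k: int, max_err: float) -> Set[str]:
    """Keep k-mers whose probability of being error-free is >= 1 - max_err."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        p_ok = 1.0
        for c in qual[i:i + k]:
            p_ok *= 1.0 - phred_to_error(c)
        if p_ok >= 1.0 - max_err:
            kmers.add(seq[i:i + k])
    return kmers


def greedy_cluster(reads: List[Read], k: int = 5, max_err: float = 0.2,
                   min_shared: float = 0.5) -> List[List[str]]:
    """Hypothetical greedy, quality-aware clustering (illustrative only)."""
    # Visit the most reliable reads first so that they seed the clusters.
    order = sorted(reads, key=lambda r: mean_error(r[2]))
    clusters: List[Tuple[Set[str], List[str]]] = []  # (seed k-mers, members)
    for name, seq, qual in order:
        kmers = reliable_kmers(seq, qual, k, max_err)
        best: Optional[int] = None
        best_frac = 0.0
        for idx, (rep_kmers, _) in enumerate(clusters):
            # Fraction of this read's reliable k-mers found in the seed read.
            frac = len(kmers & rep_kmers) / len(kmers) if kmers else 0.0
            if frac > best_frac:
                best, best_frac = idx, frac
        if best is not None and best_frac >= min_shared:
            clusters[best][1].append(name)   # greedy: assign and move on
        else:
            clusters.append((kmers, [name]))  # read becomes a new seed
    return [members for _, members in clusters]


if __name__ == "__main__":
    toy_reads = [
        ("r2", "ACGTACGTATCTAGCTAG", "I" * 9 + "#" + "I" * 8),  # one low-quality base
        ("r3", "TTTTGGGGCCCCAAAATTTT", "5" * 20),
        ("r1", "ACGTACGTAGCTAGCTAG", "I" * 18),
    ]
    print(greedy_cluster(toy_reads))
```

In this toy input, r2 differs from r1 at a single base with a very low quality value, so the k-mers covering that base are ignored when r2 is compared against the cluster seeded by r1. Each read is assigned once and never revisited, which is what makes the pass greedy.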
For the entire collection see [Zbl 1408.92004].
92C40 Biochemistry, molecular biology
68W05 Nonnumerical algorithms