Probabilistic models for error correction of nonuniform sequencing data. (English) Zbl 07229384
Elloumi, Mourad (ed.), Algorithms for next-generation sequencing data. Techniques, approaches, and applications. Cham: Springer (ISBN 978-3-319-59824-6/hbk; 978-3-319-59826-0/ebook). 131-145 (2017).
Summary: Sequencing error correction has become an important step in the analysis of next-generation sequencing (NGS) datasets in order to improve data quality for downstream applications. In this chapter, we discuss different formulations of sequencing read error correction that are based on probabilistic models able to handle datasets with nonuniform read coverage. Nonuniform coverage is common in several applications of NGS, including small RNA and messenger RNA sequencing, metagenomics, metatranscriptomics, and single-cell sequencing. Here, we review popular formulations based on the Hamming graph of \(k\)-mers found in sequencing reads and introduce a more complete formulation that can also handle insertion and deletion errors. In this chapter, we will introduce different approaches to correct sequencing errors with probabilistic models. One common formulation is based on models over Hamming graphs. A particular focus will be on a more general formulation using hidden Markov models that can also handle insertion and deletion (indel) errors. These methods are suitable for the correction of reads from experiments with nonuniform coverage, such as RNA-Seq, single-cell sequencing, or metagenomics, a topic of rising importance in the community.
For the entire collection see [Zbl 1383.68005].
68W32 Algorithms on strings
92D20 Protein sequences, DNA sequences
