Comparative assessment of alignment algorithms for NGS data: features, considerations, implementations, and future. (English) Zbl 1457.68339

Elloumi, Mourad (ed.), Algorithms for next-generation sequencing data. Techniques, approaches, and applications. Cham: Springer. 187-202 (2017).
Summary: Because massively parallel sequencing produces short reads, the algorithms developed to align those reads have been crucial to the widespread adoption of next-generation sequencing (NGS). Considerable progress has been made in developing alignment algorithms tailored to different purposes. Researchers can now apply sensitive and efficient alignment algorithms to a wide variety of applications, including genome-wide variation studies, quantitative RNA-seq expression analyses, the study of RNA secondary structure, microRNA discovery, identification of protein-binding sites by ChIP-sequencing, recognition of histone modification patterns in epigenetic studies, simultaneous alignment of multiple genomes for comparative genomics, and de novo assembly of genomes and transcriptomes. In clinical settings, alignment to reference genomes has enabled rapid pathogen discovery, identification of causative mutations in rare genetic diseases, detection of chromosomal abnormalities in tumor genomes, and many other advances that likewise depend on rapid, cost-effective genome-wide sequencing.
For the entire collection see [Zbl 1383.68005].
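The basic operation that these aligners accelerate is scoring a short read against a reference sequence. The following Python sketch is a hedged illustration that computes a Smith-Waterman local alignment score (Smith and Waterman, ref. [36]) with a linear gap penalty. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for the example. They are not parameters taken from the reviewed chapter or from any specific aligner. Index-based NGS aligners such as BWA and Bowtie (refs. [22], [23]) avoid running this quadratic-time dynamic program against an entire genome.

```python
def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, int, int]:
    """Return (best_score, end_in_read, end_in_ref) of the best local alignment.

    Illustrative sketch: linear gap penalty, assumed scoring values,
    score only (no traceback). End positions are 1-based DP indices.
    """
    cols = len(ref) + 1
    prev = [0] * cols              # DP row for read prefix of length i - 1
    best = (0, 0, 0)
    for i in range(1, len(read) + 1):
        curr = [0] * cols          # column 0 stays 0 (local alignment)
        for j in range(1, cols):
            # Align read[i-1] with ref[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Read base aligned to a gap (insertion in the read).
            up = prev[j] + gap
            # Reference base aligned to a gap (deletion from the read).
            left = curr[j - 1] + gap
            # Clamping at 0 lets a local alignment start anywhere.
            curr[j] = max(0, diag, up, left)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best
```

For example, `smith_waterman("ACGT", "TTACGTTT")` returns `(8, 4, 6)`. All four read bases match reference positions 3 to 6, which gives a score of 4 × 2. Keeping only two rows of the matrix reduces memory to O(len(ref)). Recovering the alignment itself would require the full matrix or a traceback structure.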


68W32 Algorithms on strings
92D10 Genetics and epigenetics
92D20 Protein sequences, DNA sequences


[1] Dalca, A.V., Brudno, M.: Genome variation discovery with high-throughput sequencing data. Brief. Bioinform. 11(1), 3-14 (2010)
[2] Engstrom, P.G., et al.: Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods. 10(12), 1185-1191 (2013)
[3] Zhong, C., Zhang, S.: Efficient alignment of RNA secondary structures using sparse dynamic programming. BMC Bioinformatics. 14, 269 (2013)
[4] Sun, Z., et al.: CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data. BMC Genomics. 15, 423 (2014)
[5] Johnson, D.S., et al.: Genome-wide mapping of in vivo protein-DNA interactions. Science. 316(5830), 1497-1502 (2007)
[6] Hong, C., et al.: Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data. BMC Bioinformatics. 14, 337 (2013)
[7] Kim, J., Ma, J.: PSAR-align: improving multiple sequence alignment using probabilistic sampling. Bioinformatics. 30(7), 1010-1012 (2014)
[8] Li, R., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20(2), 265-272 (2010)
[9] Naccache, S.N., et al.: A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 24(7), 1180-1192 (2014)
[10] Ng, B.G., et al.: Mosaicism of the UDP-galactose transporter SLC35A2 causes a congenital disorder of glycosylation. Am. J. Hum. Genet. 92(4), 632-636 (2013)
[11] Green, R.C., et al.: Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet. Med. 14(4), 405-410 (2012)
[12] Goh, V., et al.: Next-generation sequencing facilitates the diagnosis in a child with twinkle mutations causing cholestatic liver failure. J. Pediatr. Gastroenterol. Nutr. 54(2), 291-294 (2012)
[13] Schröder, J., et al.: Socrates: identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads. Bioinformatics. 30(8), 1064-1072 (2014)
[14] Rizzo, J.M., Buck, M.J.: Key principles and clinical applications of “next-generation” DNA sequencing. Cancer Prev. Res. (Phila.) 5(7), 887-900 (2012)
[15] Shang, J., et al.: Evaluation and comparison of multiple aligners for next-generation sequencing data analysis. Biomed. Res. Int. 2014, 16 (2014)
[16] Metzker, M.L.: Sequencing technologies—the next generation. Nat. Rev. Genet. 11(1), 31-46 (2010)
[17] Lander, E.S.: Initial impact of the sequencing of the human genome. Nature. 470(7333), 187-197 (2011)
[18] Li, H., Homer, N.: A survey of sequence alignment algorithms for next-generation sequencing. Brief. Bioinform. 11(5), 473-483 (2010)
[19] Li, R., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 25(15), 1966-1967 (2009)
[20] Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 437(7057), 376-380 (2005)
[21] David, M., et al.: SHRiMP2: Sensitive yet Practical Short Read Mapping. Bioinformatics. 27(7), 1011-1012 (2011)
[22] Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25(14), 1754-1760 (2009)
[23] Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), R25 (2009)
[24] Bentley, D.R., et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 456(7218), 53-59 (2008)
[25] Smith, A.D., Xuan, Z., Zhang, M.Q.: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics. 9, 128 (2008)
[26] Hoffmann, S., et al.: Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput. Biol. 5(9), e1000502 (2009)
[27] Ondov, B.D., et al.: Efficient mapping of applied biosystems SOLiD sequence data to a reference genome for functional genomic applications. Bioinformatics. 24(23), 2776-2777 (2008)
[28] Kim, D., et al.: TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14(4), R36 (2013)
[29] Rothberg, J.M., et al.: An integrated semiconductor device enabling non-optical genome sequencing. Nature. 475(7356), 348-352 (2011)
[30] Quail, M.A., et al.: A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 13, 341 (2012)
[31] Novocraft Technologies: Novoalign 30 June 2014. Available from: http://www.novocraft.com/main/index.php (2014). Accessed 20 September 2014
[32] Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9(4), 357-359 (2012)
[33] Otto, C., Stadler, P.F., Hoffmann, S.: Lacking alignments? The next-generation sequencing mapper segemehl revisited. Bioinformatics. 30(13), 1837-1843 (2014)
[34] Caboche, S., et al.: Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data. BMC Genomics. 15, 264 (2014)
[35] Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403-410 (1990)
[36] Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195-197 (1981)
[37] Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics. 18(3), 440-445 (2002)
[38] Ruffalo, M., LaFramboise, T., Koyutürk, M.: Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics. 27(20), 2790-2796 (2011)
[39] Cao, X., Cheng, L.S., Tung, A.K.H.: Indexing DNA sequences using q-Grams. DASFAA, Lecture Notes in Computer Science, vol. 3453: p. 13 (2005)
[40] Weese, D., et al.: RazerS—fast read mapping with sensitivity control. Genome Res. 19(9), 1646-1654 (2009)
[41] Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Symposium on Foundations of Computer Science, Redondo Beach, CA, USA, pp. 390-398 (2000)
[42] Liu, Y., Schmidt, B., Maskell, D.L.: CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform. Bioinformatics. 28(14), 1830-1837 (2012)
[43] Santana-Quintero, L., et al.: HIVE-hexagon: high-performance, parallelized sequence alignment for next-generation sequencing data analysis. PLoS One. 9(6), e99033 (2014)
[44] Li, H., Durbin, R.: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 26(5), 589-595 (2010)
[45] Lindner, R., Friedel, C.C.: A comprehensive evaluation of alignment algorithms in the context of RNA-Seq. PLoS One. 7(12), e52403 (2012)
[46] Wu, T.D., Nacu, S.: Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 26(7), 873-881 (2010)
[47] Wang, K., et al.: MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38(18), e178 (2010)
[48] Dobin, A., et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15-21 (2013)
[49] Kertesz, M., et al.: Genome-wide measurement of RNA secondary structure in yeast. Nature. 467(7311), 103-107 (2010)
[50] Underwood, J.G., et al.: FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Methods. 7(12), 995-1001 (2010)
[51] Lucks, J.B., et al.: Multiplexed RNA structure characterization with selective 2’-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl. Acad. Sci. U. S. A. 108(27), 11063-11068 (2011)
[52] Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18, 1245-1262 (1989) · Zbl 0692.68047
[53] Jiang, T., Wang, L., Zhang, K.: Alignment of trees-an alternative to tree edit. Theor. Comput. Sci. 143, 137-148 (1995) · Zbl 0873.68150
[54] Höchsmann, M., Töller, T., Giegerich, R., Kurtz, S.: Local similarity in RNA secondary structures. In: Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, Washington DC, pp. 159-168 (2003)
[55] Li, Y., et al.: Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis. Nucleic Acids Res. 40(10), 4298-4305 (2012)
[56] Krueger, F., Andrews, S.R.: Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 27(11), 1571-1572 (2011)
[57] Xi, Y., Li, W.: BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics. 10, 232 (2009)
[58] Coarfa, C., et al.: Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics. 11, 572 (2010)
[59] Lim, J.Q., et al.: BatMeth: improved mapper for bisulfite sequencing reads on DNA methylation. Genome Biol. 13(10), R82 (2012)
[60] Chen, P.Y., Cokus, S.J., Pellegrini, M.: BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics. 11, 203 (2010)
[61] Kunde-Ramamoorthy, G., et al.: Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res. 42(6), e43 (2014)
[62] Schatz, M.C., Langmead, B., Salzberg, S.L.: Cloud computing and the DNA data race. Nat. Biotechnol. 28(7), 691-693 (2010)
[63] Maji, R.K., et al.: PVT: an efficient computational procedure to speed up next-generation sequence analysis. BMC Bioinformatics. 15, 167 (2014)
[64] Onsongo, G., et al.: Implementation of cloud based next generation sequencing data analysis in a clinical laboratory. BMC Res. Notes. 7, 314 (2014)
[65] Reid, J.G., et al.: Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 15(1), 30 (2014)
[66] Oldach, L.: Edico Genome makes first sale of NGS processor. In: Bio-IT World, Cambridge Healthtech Institute (2014)
[67] Kalari, K.R., et al.: MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing. BMC Bioinformatics. 15(1), 224 (2014)
[68] Chin, C.-S., et al.: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods. 10(6), 563-569 (2013)
[69] English, A.C., et al.: Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PLoS One. 7(11), e47768 (2012)
[70] Branton, D., et al.: The potential and challenges of nanopore sequencing. Nat. Biotechnol. 26(10), 1146-1153 (2008)
[71] Laszlo, A.H., et al.: Decoding long nanopore sequencing reads of natural DNA. Nat. Biotechnol. 32(8), 829-833 (2014)
[72] Ummat, A.