×

Computing the Burrows-Wheeler transform of a string and its reverse in parallel. (English) Zbl 1284.68705

Summary: The contribution of this article is twofold. First, we provide new theoretical insights into the relationship between a string and its reverse: If the Burrows-Wheeler transform (BWT) of a string has been computed by sorting its suffixes, then the BWT, the suffix array, and the longest common prefix array of the reverse string can be derived from it without suffix sorting. Furthermore, we show that the longest common prefix arrays of a string and its reverse are permutations of each other. Second, we provide a parallel algorithm that, given the BWT of a string, computes the BWT of its reverse much faster than all known (parallel) suffix sorting algorithms. Some bioinformatics applications will benefit from this.

MSC:

68W32 Algorithms on strings
68P20 Information storage and retrieval of data
68P10 Searching and sorting

Software:

mkESA; BWA
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Beller, T.; Gog, S.; Ohlebusch, E.; Schnattinger, T., Computing the longest common prefix array based on the Burrows-Wheeler transform, (Proc. 18th International Symposium on String Processing and Information Retrieval. Proc. 18th International Symposium on String Processing and Information Retrieval, Lecture Notes in Computer Science, vol. 7024 (2011), Springer-Verlag: Springer-Verlag Berlin), 197-208
[2] Burrows, M.; Wheeler, D. J., A block-sorting lossless data compression algorithm (1994), Digital Systems Research Center, Research Report 124
[3] Culpepper, J. S.; Navarro, G.; Puglisi, S. J.; Turpin, A., Top-k ranked document search in general text databases, (Proc. 18th Annual European Symposium on Algorithms. Proc. 18th Annual European Symposium on Algorithms, Lecture Notes in Computer Science, vol. 6347 (2010), Springer-Verlag: Springer-Verlag Berlin), 194-205 · Zbl 1287.68035
[4] Ferragina, P.; Manzini, G., Opportunistic data structures with applications, (Proc. IEEE Symposium on Foundations of Computer Science (2000)), 390-398
[5] Flick, P.; Birney, E., Sense from sequence reads: Methods for alignment and assembly, Nature Methods, 6, 11 Suppl., S6-S12 (2009)
[6] Futamura, N.; Aluru, S.; Kurtz, S., Parallel suffix sorting, (Proc. 9th International Conference on Advanced Computing and Communications (2001), IEEE), 76-81
[7] Gog, S., Compressed suffix trees: design, construction, and applications (2011), University of Ulm: University of Ulm Germany, PhD thesis
[8] Gog, S.; Ohlebusch, E., Compressed suffix trees: Efficient computation and storage of LCP-values, Journal of Experimental Algorithmics, 18, 2.1:2.1-2.1:2.31 (2013) · Zbl 1322.68253
[9] Grossi, R.; Gupta, A.; Vitter, J. S., High-order entropy-compressed text indexes, (Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (2003)), 841-850 · Zbl 1092.68584
[10] Gusfield, D., Algorithms on Strings, Trees, and Sequences (1997), Cambridge University Press: Cambridge University Press New York · Zbl 0934.68103
[11] Homann, R.; Fleer, D.; Giegerich, R.; Rehmsmeier, M., mkESA: Enhanced suffix array construction tool, Bioinformatics, 25, 8, 1084-1085 (2009)
[12] Kärkkäinen, J.; Manzini, G.; Puglisi, S. J., Permuted longest-common-prefix array, (Proc. 20th Annual Symposium on Combinatorial Pattern Matching. Proc. 20th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 5577 (2009), Springer-Verlag: Springer-Verlag Berlin), 181-192 · Zbl 1247.68336
[13] Kasai, T.; Lee, G.; Arimura, H.; Arikawa, S.; Park, K., Linear-time longest-common-prefix computation in suffix arrays and its applications, (Proc. 12th Annual Symposium on Combinatorial Pattern Matching. Proc. 12th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 2089 (2001), Springer-Verlag: Springer-Verlag Berlin), 181-192 · Zbl 0990.68639
[14] Lam, T.-W.; Li, R.; Tam, A.; Wong, S.; Wu, E.; Yiu, S.-M., High throughput short read alignment via bi-directional BWT, (Proc. International Conference on Bioinformatics and Biomedicine (2009), IEEE Computer Society), 31-36
[15] Langmead, B.; Trapnell, C.; Pop, M.; Salzberg, S. L., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology, 10, R25 (2009)
[16] Li, H.; Durbin, R., Fast and accurate short read alignment with Burrows-Wheeler Transform, Bioinformatics, 25, 14, 1754-1760 (2009)
[17] Manzini, G., Two space saving tricks for linear time LCP array computation, (Proc. 9th Scandinavian Workshop on Algorithm Theory. Proc. 9th Scandinavian Workshop on Algorithm Theory, Lecture Notes in Computer Science, vol. 3111 (2004), Springer-Verlag: Springer-Verlag Berlin), 372-383 · Zbl 1095.68755
[18] Ohlebusch, E.; Beller, T.; Abouelhoda, M. I., Computing the Burrows-Wheeler transform of a string and its reverse, (Proc. 23rd Annual Symposium on Combinatorial Pattern Matching. Proc. 23rd Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 7354 (2012), Springer-Verlag: Springer-Verlag Berlin), 243-256 · Zbl 1358.68346
[19] Puglisi, S. J.; Smyth, W. F.; Turpin, A., A taxonomy of suffix array construction algorithms, ACM Computing Surveys, 39, 2, 1-31 (2007)
[20] Schnattinger, T.; Ohlebusch, E.; Gog, S., Bidirectional search in a string with wavelet trees, (Proc. 21st Annual Symposium on Combinatorial Pattern Matching. Proc. 21st Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 6129 (2010), Springer-Verlag: Springer-Verlag Berlin), 40-50 · Zbl 1286.68533
[21] Simpson, J. T.; Durbin, R., Efficient construction of an assembly string graph using the FM-index, Bioinformatics, 26, 12, i367-i373 (2010)
[22] Trapnell, C.; Salzberg, S. L., How to map billions of short reads onto genomes, Nature Biotechnology, 27, 5 (2009)
[23] Välimäki, N.; Ladra, S.; Mäkinen, V., Approximate all-pairs suffix/prefix overlaps, (Proc. 21st Annual Symposium on Combinatorial Pattern Matching. Proc. 21st Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 6129 (2010), Springer-Verlag: Springer-Verlag Berlin), 76-87 · Zbl 1286.68535
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.