×

Engineering a lightweight external memory suffix array construction algorithm. (English) Zbl 1377.68328

In this article, the authors show how to use delayed merging to reduce from \(O (n^2 / (M B))\) blocks to \(O (n^2 / (M B \log_\sigma n) + n / B)\) blocks the I/O complexity of Ferragina et al.’s scan-based external-memory suffix-array construction, where \(n\) is the length of the text, \(M\) is the size of RAM in words, \(B\) is the size of a block in words and \(\sigma\) is the alphabet size. They also show how to apply techniques developed since the publication of Ferragina et al.’s algorithm, to reduce the running time from \(O((n^2 / M) \log \sigma)\) to \(O ((n^2 / M) \log (\log (\sigma) / \log \log n)\).
Finally, they show experimentally that their algorithm is competitive with what was the state of the art when this article was submitted. While it was in press, however, several new algorithms were developed, and the authors et al.’s paper [“Engineering external memory induced suffix sorting”, in: Proceedings of the 19th workshop on algorithm engineering and experiments, Alenex’17. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 98–108 (2017; doi:10.1137/1.9781611974768.8)] is probably now a better reference for most purposes.

MSC:

68W32 Algorithms on strings

Software:

libdivsufsort
PDF BibTeX XML Cite
Full Text: DOI Link

References:

[1] Abouelhoda, MI; Kurtz, S; Ohlebusch, E, Replacing suffix trees with enhanced suffix arrays, J. Discrete Algorithms, 2, 53-86, (2004) · Zbl 1115.92303
[2] Barbay, J., Gagie, T., Navarro, G., Nekrich, Y.: Alphabet partitioning for compressed rank/select and applications. In: Proc. 21st Int. Symp. Algorithms and Computation (ISAAC), volume 6507 of Lect. Notes Computer Sci., pp. 315-326. Springer (2010) · Zbl 1310.68060
[3] Belazzougui, D., Navarro, G.: New lower and upper bounds for representing sequences. In: Proc. 20th Eur. Symp. Algorithms (ESA), volume 7501 of Lect. Notes Computer Sci., pp. 181-192. Springer (2012) · Zbl 1365.68260
[4] Beller, T., Zwerger, M., Gog, S., Ohlebusch, E.: Space-efficient construction of the burrows-wheeler transform. In: Proc. 20th Symp. String Processing and Inf. Retr. (SPIRE), volume 8214 of Lect. Notes Computer Sci., pp. 5-16. Springer (2013)
[5] Bingmann, T., Fischer, J., Osipov, V.: Inducing suffix and LCP arrays in external memory. In Proc. 15th Meet. Algorithm Eng. and Exp. (ALENEX), pp. 103-112. SIAM (2013) · Zbl 1365.68169
[6] Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, California (1994) · Zbl 0995.68032
[7] Crauser, A; Ferragina, P, A theoretical and experimental study on the construction of suffix arrays in external memory, Algorithmica, 32, 1-35, (2002) · Zbl 0995.68032
[8] Crochemore, M, String-matching on ordered alphabets, Theor. Comput. Sci., 92, 33-47, (1992) · Zbl 0747.68021
[9] da Louza, F.A., Telles, G.P., de Aguiar Ciferri, C.D.: External memory generalized suffix and LCP arrays construction. In: Proc. 24th Symp. Comb. Pattern Matching (CPM), volume 7922 of Lect. Notes Computer Sci., pp. 201-210. Springer (2013) · Zbl 1381.68072
[10] Dementiev, R., Kärkkäinen, J., Mehnert, J., Sanders, P.: Better external memory suffix array construction. ACM J. Exp. Algorithmics 12, Article 3.4 (2008) · Zbl 1365.68178
[11] Ferragina, P; Gagie, T; Manzini, G, Lightweight data indexing and compression in external memory, Algorithmica, 63, 707-730, (2012) · Zbl 1241.68062
[12] Ferragina, P; Manzini, G, Indexing compressed text, J. ACM, 52, 552-581, (2005) · Zbl 1323.68261
[13] Gonnet, GH; Baeza-Yates, RA; Snider, T; Frakes, WB (ed.); Baeza-Yates, R (ed.), New indices for text: pat trees and pat arrays, 66-82, (1992), Upper Saddle River
[14] Kärkkäinen, J, Fast BWT in small space by blockwise suffix sorting, Theor. Comput. Sci., 387, 249-257, (2007) · Zbl 1144.68021
[15] Kärkkäinen, J., Kempa, D.: LCP array construction in external memory. In: Proc. 13th Symp. Exp. Algorithmics (SEA), Lect. Notes Computer Sci. Springer, to appear (2014) · Zbl 1365.68183
[16] Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Proc. 12th Symp. Exp. Algorithmics (SEA), volume 7933 of Lect. Notes Computer Sci., pp. 139-150. Springer (2013)
[17] Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lempel-Ziv parsing in external memory. In: Proc. Data Compression Conf. (DCC), pp. 153-162. IEEE CS (2014)
[18] Kärkkäinen, J., Kempa, D., Puglisi, S.J.: String range matching. In: Proc. 25th Symp. Comb. Pattern Matching (CPM), Lect. Notes Computer Sci. Springer, to appear (2014) · Zbl 1407.68576
[19] Kärkkäinen, J., Puglisi, S.J.: Fixed-block compression boosting in FM-indexes. In: Proc. 18th Symp. String Processing and Inf. Retr. (SPIRE), volume 7024 of Lect. Notes Computer Sci., pp. 174-184. Springer (2011) · Zbl 0747.68021
[20] Kärkkäinen, J; Sanders, P; Burkhardt, S, Linear work suffix array construction, J. ACM, 53, 918-936, (2006) · Zbl 1326.68111
[21] Knuth, DE; Morris, JH; Pratt, VR, Fast pattern matching in strings, SIAM J. Comput., 6, 323-350, (1977) · Zbl 0372.68005
[22] Manber, U; Myers, GW, Suffix arrays: a new method for on-line string searches, SIAM J. Comput., 22, 935-948, (1993) · Zbl 0784.68027
[23] Mori, Y.: libdivsufsort, a C library for suffix array construction. http://code.google.com/p/libdivsufsort/ · Zbl 0784.68027
[24] Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1), Article 2 (2007) · Zbl 1321.68263
[25] Nong, G.: Practical linear-time O(1)-workspace suffix sorting for constant alphabets. ACM Trans. Inf. Syst. 31(3), Article 15 (2013) · Zbl 0995.68032
[26] Ohlebusch, E.: Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. Oldenbusch Verlag, Bremen (2013) · Zbl 1295.92011
[27] Okanohara, D., Sadakane, K.: A linear-time Burrows-Wheeler transform using induced sorting. In: Proc. 16th Symp. String Processing and Inf. Retr. (SPIRE), volume 5721 of Lect. Notes Computer Sci., pp. 90-101. Springer (2009)
[28] Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. 39(2), Article 4 (2007) · Zbl 1326.68111
[29] Tischler, G.: Faster average case low memory semi-external construction of the Burrows-Wheeler transform. In: Proc. 2nd Int. Conf. Algorithms for Big Data (ICABD), number 1146 in CEUR-WS Proceedings, pp. 61-68 (2014)
[30] Vitter, JS, Algorithms and data structures for external memory, Found. Trends Theor. Comput. Sci., 2, 305-474, (2006) · Zbl 1244.68007
[31] Williams, HE; Zobel, J, Compressing integers for fast file access, Comput. J., 42, 193-201, (1999)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.