Clifford, Raphaël Distributed suffix trees. (English) Zbl 1101.68502 J. Discrete Algorithms 3, No. 2-4, 176-197 (2005). Summary: We present a new variant of the suffix tree called a Distributed Suffix Tree (DST) which allows for much larger databases of strings to be handled efficiently. The method is based on a new linear time construction algorithm for subtrees of a suffix tree. The new data structure tackles the memory bottleneck problem by constructing these subtrees independently and in parallel. It is designed for distributed memory parallel computing environments (e.g., Beowulf clusters). The central advantage is that standard operations of biological importance on suffix trees are shown to be easily translatable to this new data structure. While none of these operations on the DST require inter-process communication, many have optimal expected parallel running times. Cited in 1 Document MSC: 68P10 Searching and sorting 68W10 Parallel algorithms in computer science Keywords:distributed computing; string matching; suffix trees; bioinformatics Software:REPuter; MUMMER PDFBibTeX XMLCite \textit{R. Clifford}, J. Discrete Algorithms 3, No. 2--4, 176--197 (2005; Zbl 1101.68502) Full Text: DOI References: [1] Andersson, A.; Larsson, N. J.; Swanson, K., Suffix trees on words, Algorithmica, 23, 3, 246-260 (1999) · Zbl 0921.68021 [2] Andersson, A.; Nilsson, S., Improved behaviour of tries by adaptive branching, Inform. Process. Lett., 46, 293-300 (1993) · Zbl 0776.68035 [3] Apostolico, A., The myriad virtues of subword trees, (Apostolico, A.; Galil, Z., Combinatorial Algorithms on Words. Combinatorial Algorithms on Words, NATO ASI Series, vol. F12 (1985), Springer-Verlag), 85-96 · Zbl 0572.68067 [4] Burkhardt, S.; Kärkkäinen, J., Fast lightweight suffix array construction and checking, (Baeza-Yates, R. A.; Chávez, E.; Crochemore, M., Proceedings of the 8th Annual Symposium on Combinatorial Pattern Matching. Proceedings of the 8th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 2676 (2003)), 55-69 · Zbl 1279.68065 [5] Chang, W. I.; Lawler, E. L., Sublinear expected time approximate string matching and biological applications, Algorithmica, 12, 327-344 (1994) · Zbl 0942.68575 [6] Clark, D.; Munro, J. I., Efficient suffix trees on secondary storage, (Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete algorithms (1996)), 383-391 · Zbl 0847.68030 [7] Delcher, A.; Kasif, S.; Fleischmann, R.; Peterson, J.; White, O.; Salzberg, S., Alignment of whole genomes, Nucleic Acids Research, 27, 11, 2369-2376 (1999) [8] Dorohonceanu, B.; Nevill-Manning, C., Accelerating protein classification using suffix trees, (Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB) (2000)), 126-133 [9] Ferragina, P.; Grossi, R., A fully-dynamic data structure for external substring search, (Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada (1995)), 693-702 · Zbl 0978.68513 [10] Ferragina, P.; Grossi, R., Fast string searching in secondary storage: Theoretical developments and experimental results, (Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms, Atlanta, Georgia (1996)), 373-382 · Zbl 0852.68017 [11] Ferragina, P.; Grossi, R., The string B-Tree: a new data structure for string search in external memory and its applications, J. ACM, 46, 2, 238-280 (1999) · Zbl 1065.68518 [12] Giegerich, R.; Kurtz, S., From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction, Algorithmica (1997) · Zbl 0895.68056 [13] Gusfield, D., Algorithms on Strings, Trees and Sequences. Computer Science and Computational Biology (1997), Cambridge University Press · Zbl 0934.68103 [14] Gusfield, D.; Landau, G. M.; Schieber, D., An efficient algorithm for the all pairs suffix-prefix problem, Inform. Process. Lett., 41, 181-185 (1992) · Zbl 0748.68021 [15] Hariharan, R., Optimal parallel suffix tree construction, (Proceedings of the 26th ACM Symposium on Theory of Computing (1994)), 290-299 · Zbl 1345.68302 [16] Kärkkäinen, J., Suffix cactus: a cross between suffix tree and suffix array, (Galil, Z.; Ukkonen, E., Proceedings of the 6th Annual Symposium on Combinatorial Pattern Matching. Proceedings of the 6th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Comput. Sci., vol. 937 (1995), Springer-Verlag), 191-204 [17] Kärkkäinen, J.; Ukkonen, E., Sparse suffix trees, (COCOON ’96, Hong Kong. COCOON ’96, Hong Kong, Lecture Notes in Comput. Sci., vol. 1090 (1996)), 219-230 [18] S. Kurtz, Reducing the space requirement of suffix trees, Report 98-03, Technical Report, Technische Fakultat, Universitat Bielefeld, 1998; S. Kurtz, Reducing the space requirement of suffix trees, Report 98-03, Technical Report, Technische Fakultat, Universitat Bielefeld, 1998 [19] Kurtz, S.; Schleiermacher, C., Reputer: Fast computation of maximal repeats in complete genomes, Bioinformatics, 15, 5, 426-427 (1999) [20] Manber, U.; Myers, G., Suffix arrays: a new method for on-line string searches, (Proceedings of the 1st Annual ACM-SIAM symposium on Discrete algorithms (1990), Society for Industrial and Applied Mathematics), 319-327 · Zbl 0800.68364 [21] McCreight, E. M., A space-economical suffix tree construction algorithm, J. ACM, 23, 262-272 (1976) · Zbl 0329.68042 [22] Ukkonen, E., On-line construction of suffix-trees, Algorithmica, 14, 249-260 (1995) · Zbl 0831.68027 [23] Waterman, M. S., Introduction to Computational Biology: Maps, Sequences and Genomes (1995), Chapman and Hall · Zbl 0831.92011 This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.