×

A new taxonomy of sublinear right-to-left scanning keyword pattern matching algorithms. (English) Zbl 1209.68170

Summary: A new taxonomy of sublinear (multiple) keyword pattern matching algorithms is presented. Based on an earlier taxonomy by the second and third authors, this new taxonomy includes not only suffix-based algorithms, but also factor- and factor-oracle-based algorithms. In particular, we show how suffix-based (Commentz-Walter like), factor- and factor-oracle-based sublinear keyword pattern matching algorithms can be seen as instantiations of a general sublinear algorithm skeleton. During processing, such algorithms shift or jump through the text in a forward or left-to-right direction, and read backward or right-to-left starting from positions in the text, i.e. they read suffixes of certain prefixes of the text. They use finite automata for efficient computation of string membership in a certain language. In addition, we show shift functions defined for the suffix-based algorithms to be reusable for factor- and factor-oracle-based algorithms. The taxonomy is based on deriving the algorithms from a common starting point by adding algorithm and problem details, to arrive at efficient or well-known algorithms. Such a presentation provides correctness arguments for the algorithms as well as clarity on how the algorithms are related to one another. In addition, it is helpful in the construction of a toolkit of the algorithms.

MSC:

68P10 Searching and sorting
68Q45 Formal languages and automata
68W05 Nonnumerical algorithms
68T10 Pattern recognition, speech recognition

Software:

SPARE Parts
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] Aho, A. V.: Algorithms for finding patterns in strings, Handbook of theoretical computer science, 255-300 (1990) · Zbl 0900.68249
[2] Aho, A. V.; Corasick, M. J.: Efficient string matching: an aid to bibliographic search, Communications of the ACM 18, 333-340 (1975) · Zbl 0301.68048
[3] Allauzen, C.; Crochemore, M.; Raffinot, M.: Efficient experimental string matching by weak factor recognition, Lncs 2089 (2001) · Zbl 0992.68501
[4] C. Allauzen, M. Raffinot, Oracle des facteurs d’un ensemble de mots, Tech. Rep. 99-11, Institut Gaspard-Monge, Université de Marne-la-Vallée, 1999. · Zbl 0964.68078
[5] Apostolico, A.; Galil, Z.: Pattern matching algorithms, (1997) · Zbl 0874.68006
[6] G. Barla-Szabo, A taxonomy of graph representations, Master’s Thesis, Department of Computer Science, University of Pretoria, November 2002.
[7] Boyer, R. S.; Moore, J. S.: A fast string searching algorithm, Communications of the ACM 20, No. 10, 62-72 (1977) · Zbl 1219.68165
[8] Cleophas, L.; Watson, B. W.; Zwaan, G.: Automaton-based sublinear keyword pattern matching, Lncs 3246 (2004) · Zbl 1111.68428
[9] L. Cleophas, B.W. Watson, G. Zwaan, A new taxonomy of sublinear keyword pattern matching algorithms, Tech. Rep. 04/07, Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, March 2004. · Zbl 1209.68170
[10] L. Cleophas, G. Zwaan, B.W. Watson, Constructing factor oracles, Tech. Rep. 04/01, Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, January 2004. · Zbl 1135.68453
[11] Cleophas, L.; Zwaan, G.; Watson, B. W.: Constructing factor oracles, Journal of automata, languages and combinatorics 10, No. 5/6, 627-640 (2005) · Zbl 1135.68453
[12] L.G.W.A. Cleophas, Towards SPARE time: a new taxonomy and toolkit of keyword pattern matching algorithms, Master’s Thesis, Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, August 2003.
[13] L.G.W.A. Cleophas, Tree algorithms: two taxonomies and a toolkit, Ph.D. Thesis, Department of Mathematics and Computer Science, Eindhoven University of Technology, April 2008.
[14] Commentz-Walter, B.: A string matching algorithm fast on the average, Proceedings of the 6th international colloquium on automata, languages and programming (1979) · Zbl 0407.68092
[15] B. Commentz-Walter, A string matching algorithm fast on the average, Tech. Rep. TR 79.09.007, IBM Germany, Heidelberg Scientific Center, 1979. · Zbl 0407.68092
[16] Crochemore, M.; Czumaj, A.; Ga\ogonek sieniec, L.; Jarominek, S.; Lecroq, T.; Plandowski, W.; Rytter, W.: Speeding up two string matching algorithms, Algorithmica 12, No. 4/5, 247-267 (1994) · Zbl 0942.68574
[17] Crochemore, M.; Czumaj, A.; Ga\ogonek sieniec, L.; Lecroq, T.; Plandowski, W.; Rytter, W.: Fast practical multi-pattern matching, Information processing letters 71, No. 3–4, 107-113 (1999) · Zbl 0999.68246
[18] Crochemore, M.; Hancart, C.: Automata for matching patterns, Handbook of formal languages 2 (1997)
[19] Crochemore, M.; Hancart, C.; Lecroq, T.: Algorithms on strings, (2007) · Zbl 1137.68060
[20] Crochemore, M.; Rytter, W.: Jewels of stringology – text algorithms, (2003) · Zbl 1078.68151
[21] Dijkstra, E. W.: A discipline of programming, (1976) · Zbl 0368.68005
[22] Dijkstra, E. W.; Scholten, C. S.: Predicate calculus and program semantics, (1990) · Zbl 0698.68011
[23] Fan, J. -J.; Su, K. -Y.: An efficient algorithm for matching multiple patterns, IEEE transactions knowledge and data engineering 5, 339-351 (1993)
[24] Fan, J. -J.; Su, K. -Y.: An efficient algorithm for matching multiple patterns, Computer algorithms: string pattern matching strategies, 91-104 (1994)
[25] Horspool, R. N.: Practical fast searching in strings, Software–practice experience 10, No. 6, 501-506 (1980)
[26] H.B.M. Jonkers, Abstraction, specification and implementation techniques, with an application to garbage collection, Tech. Rep. 166, Mathematisch Centrum, Amsterdam, 1983. · Zbl 0513.68008
[27] Knuth, D. E.; Morris, J. H.; Pratt, V. R.: Fast pattern matching in strings, SIAM journal of computing 6, No. 2, 323-350 (1977) · Zbl 0372.68005
[28] A. Mancheron, C. Moan, Combinatorial characterization of the language recognized by factor and suffix oracles, in: Proceedings of the Prague Stringology Conference 2004, Department of Computer Science and Engineering, Czech Technical University, Prague, 2004. · Zbl 1105.68024
[29] , Merriam webster’s collegiate dictionary (1993)
[30] Navarro, G.; Raffinot, M.: Fast and flexible string matching by combining bit-parallelism and suffix automata, ACM journal of experimental algorithmics 5, No. 4 (2000) · Zbl 1071.68563
[31] Navarro, G.; Raffinot, M.: Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences, (2002) · Zbl 0992.92029
[32] Smyth, W.: Computing patterns in strings, (2003)
[33] Sunday, D. M.: A very fast substring search algorithm, Communications of the ACM 33, No. 8, 132-142 (1990)
[34] J.P.H.W. van den Eijnde, Program derivation in acyclic graphs and related problems, Tech. Rep. 92/04, Faculty of Computing Science, Technische Universiteit Eindhoven, 1992.
[35] B.W. Watson, Taxonomies and toolkits of regular language algorithms, Ph.D. Thesis, Faculty of Computing Science, Technische Universiteit Eindhoven, September 1995. · Zbl 0832.68064
[36] B.W. Watson, A new family of Commentz-Walter-style multiple-keyword pattern matching algorithms, in: Proceedings of the Prague Stringology Club Workshop 2000, Department of Computer Science and Engineering, Czech Technical University, Prague, 2000.
[37] Watson, B. W.; Cleophas, L.: SPARE parts: a C++ toolkit for string pattern recognition, Software–practice experience 34, No. 7, 697-710 (2004)
[38] Watson, B. W.; Zwaan, G.: A taxonomy of sublinear multiple keyword pattern matching algorithms, Science of computer programming 27, No. 2, 85-118 (1996) · Zbl 0858.68026
[39] S. Wu, U. Manber, A fast algorithm for multi-pattern searching, Tech. Rep. TR-94-17, Department of Computer Science, University of Arizona, Tucson, AZ, 1994. · Zbl 0807.68037
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.