## Window-accumulated subsequence matching problem is linear.(English)Zbl 0998.68042

Summary: Given two strings, text $$t$$ of length $$n$$, and pattern $$p= p_1\dots p_k$$ of length $$k$$, and given a natural number $$w$$, the subsequence matching problem consists in finding the number of size $$w$$ windows of text $$t$$ which contain pattern $$p$$ as a subsequence, i.e. the letters $$p_1,\dots, p_k$$ occur in the window, in the same order as in $$p$$, but not necessarily consecutively (they may be interleaved with other letters). Subsequence matching is used for finding frequent patterns and association rules in databases. We generalize the Knuth-Morris-Pratt pattern matching algorithm; we define a non-conventional kind of RAM, the MP-RAMs which model more closely the microprocessor operations; we design an $$O(n)$$ on-line algorithm for solving the subsequence matching problem on MP-RAMs.

### MSC:

 68P10 Searching and sorting 68P15 Database theory

### Keywords:

subsequence matching
Full Text:

### References:

 [1] Aho, A., Algorithms for finding patterns in strings, (), 255-300 · Zbl 0900.68249 [2] Aho, A.; Hopcroft, J.; Ullman, J., The design and analysis of computer algorithms, (1974), Addison-Wesley London [3] Boasson, L.; Cegielski, P.; Guessarian, I.; Matiyasevich, Y., (), 327-336 [4] Baeza-Yates, R.; Gonnet, G., A new approach to text searching, Comm. ACM, 3, 74-82, (1992) [5] Baeza-Yates, R.; Navarro, G., (), 1-23 [6] Ben-Amram, A.; Galil, Z., On the power of the shift instruction, Inform and comput., 117, 19-36, (1995) · Zbl 0828.68074 [7] Cook, S., Linear time simulation of deterministic two-way pushdown automata, (), 75-80 [8] Crochemore, M., (), 44-58 [9] Das, G.; Fleischer, R.; Ga̧sienic, L.; Gunopoulos, D.; Kärkkäinen, J., (), 12-27 [10] Galil, Z., String matching in real time, J. ACM, 28, 134-149, (1981) · Zbl 0454.68009 [11] D.E. Knuth, Big omicron and big omega and big theta, SIGACT News (1976) 18-24. [12] Knuth, D.; Morris, J.; Pratt, V., Fast pattern matching in strings, SIAM J. comput., 6, 2, 323-350, (1977) · Zbl 0372.68005 [13] Kucherov, G.; Rusinovitch, M., Matching a set of strings with variable length don’t cares, Theoret. comput. sci., 178, 129-154, (1997) · Zbl 0901.68037 [14] Manber, U.; Baeza-Yates, R., An algorithm for string matching with a sequence of don’t cares, Inform. process. lett., 37, 133-136, (1991) · Zbl 0713.68026 [15] Mannila, H., (), 41-55 [16] H. Mannila, H. Toivonen, A. Verkamo, Discovering frequent episodes in sequences, Proc. 1995 KDD Conf., 1995, pp. 210-215. [17] Y. Matiyasevich, Real-time recognition of the inclusion relation, Zapiski Nauchnykh Leningradskovo Otdeleniya Mat. Inst. Steklova Akad. Nauk SSSR 20 (1971) 104-114. (Translated into English, J. Soviet Math. 1 (1973) 64-70). [18] V. Pratt, M. Rabin, L. Stockmeyer, A charaterization of the power of vector machines, Proc. SToC 74, pp. 122-134. [19] Slissenko, A., (), 493-496 [20] Trahan, J.; Loui, M.; Ramachandran, V., Multiplication, division and shift instructions in parallel random access machines, Theoret. comput. sci., 100, 1-44, (1992) · Zbl 0780.68035 [21] Ukkonen, E., On-line construction of suffix-trees, Algorithmica, 14, 249-260, (1995) · Zbl 0831.68027 [22] Wu, S.; Manber, U., Fast text searching, Comm. ACM, 3, 83-91, (1992)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.