×

Online recognition of dictionary with one gap. (English) Zbl 1496.68382

Summary: We formalize and examine the online Dictionary Recognition with One Gap problem (DROG) which is the following. Preprocess a dictionary \(D\) of \(d\) patterns each containing a special gap symbol that matches any string, so that given a text arriving online a character at a time, all patterns from \(D\) which are suffixes of the text that has arrived so far and have not been reported yet, are reported before the next character arrives. The gap symbols are associated with bounds determining possible lengths of matching strings. Online DROG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap.
Following the work on the closely related online Dictionary Matching with One Gap problem (DMOG), we provide algorithms whose time cost depends linearly on \(\delta(G_D)\), where \(G_D\) is a bipartite graph that captures the structure of \(D\) and \(\delta(G_D)\) is the degeneracy of this graph. These algorithms are of practical interest since although \(\delta(G_D)\) can be as large as \(\sqrt{d}\), and even larger if \(G_D\) is a multi-graph, it is typically a small constant in practice.

MSC:

68W32 Algorithms on strings
68R10 Graph theory (including graph drawing) in computer science
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Abboud, A.; Williams, V. V., Popular conjectures imply strong lower bounds for dynamic problems, (Proceedings of the 55th IEEE Annual Symposium on Foundations of Computer Science (FOCS) (2014)), 434-443
[2] Aho, A. V.; Corasick, M. J., Efficient string matching: an aid to bibliographic search, Commun. ACM, 18, 6, 333-340 (1975) · Zbl 0301.68048
[3] Amir, A.; Farach, M.; Idury, R. M.; La Poutré, J. A.; Schäffer, A. A., Improved dynamic dictionary matching, Inf. Comput., 119, 2, 258-282 (1995) · Zbl 0832.68033
[4] Amir, A.; Keselman, D.; Landau, G. M.; Lewenstein, M.; Lewenstein, N.; Rodeh, M., Text indexing and dictionary matching with one error, J. Algorithms, 37, 2, 309-325 (2000) · Zbl 0966.68062
[5] Amir, A.; Kopelowitz, T.; Levy, A.; Pettie, S.; Porat, E.; Shalom, B. R., Mind the gap: essentially optimal algorithms for online dictionary matching with one gap, (27th International Symposium on Algorithms and Computation (ISAAC) (2016)), 12:1-12:12 · Zbl 1398.68207
[6] Amir, A.; Kopelowitz, T.; Levy, A.; Pettie, S.; Porat, E.; Shalom, B. R., Mind the gap! Online dictionary matching with one gap, Algorithmica, 81, 6, 2123-2157 (2019) · Zbl 1412.68075
[7] Amir, A.; Levy, A.; Porat, E.; Shalom, B. R., Dictionary matching with one gap, (Proceedings of the 25th Annual Symposium on Combinatorial Pattern Matching (CPM) (2014)), 11-20 · Zbl 1390.68781
[8] Amir, A.; Levy, A.; Porat, E.; Shalom, B. R., Dictionary matching with a few gaps, Theor. Comput. Sci., 589, 34-46 (2015) · Zbl 1319.68106
[9] Bentley, J. L., Decomposable searching problems, Inf. Process. Lett., 8, 5, 244-251 (1979) · Zbl 0404.68067
[10] Bille, P.; Gørtz, I. L.; Vildhøj, H. W.; Wind, D. K., String matching with variable length gaps, Theor. Comput. Sci., 443, 25-34 (2012) · Zbl 1247.68333
[11] Bille, P.; Thorup, M., Regular expression matching with multi-strings and intervals, (Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2010)), 1297-1308 · Zbl 1288.68299
[12] Bjørklund, A.; Pagh, R.; Williams, V. V.; Zwick, U., Listing triangles, (Proceedings 41st International Colloquium on Automata, Languages, and Programming (ICALP (I)) (2014)), 223-234 · Zbl 1410.05191
[13] Brodal, G. S.; Gasieniec, L., Approximate dictionary queries, (Proceeding of the 7th Annual Symposium on Combinatorial Pattern Matching (CPM) (1996)), 65-74
[14] Chiba, N.; Nishizeki, T., Arboricity and subgraph listing algorithms, SIAM J. Comput., 14, 1, 210-223 (1985) · Zbl 0572.68053
[15] Cole, R.; Gottlieb, L.; Lewenstein, M., Dictionary matching and indexing with errors and don’t cares, (Proceedings of the 36th Annual ACM Symposium on Theory on Computing (STOC) (2004)), 91-100 · Zbl 1192.68818
[16] Fredriksson, K.; Grabowski, S., Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance, Inf. Retr., 11, 4, 335-357 (2008)
[17] Grønlund, A.; Pettie, S., Threesomes, degenerates, and love triangles, J. ACM, 65, 4 (2018) · Zbl 1422.68112
[18] Haapasalo, T.; Silvasti, P.; Sippu, S.; Soisalon-Soininen, E., Online dictionary matching with variable-length gaps, (Symposium on Experimental Algorithms (SEA) (2011)), 76-87
[19] Hofmann, K.; Bucher, P.; Falquet, L.; Bairoch, A., The PROSITE database, its status in 1999, Nucleic Acids Res., 27, 1, 215-219 (1999)
[20] Hon, W.; Lam, T.; Shah, R.; Thankachan, S. V.; Ting, H.; Yang, Y., Dictionary matching with uneven gaps, (Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM) (2015)), 247-260 · Zbl 1383.68105
[21] Hon, W.; Lam, T.; Shah, R.; Thankachan, S. V.; Ting, H.; Yang, Y., Dictionary matching with a bounded gap in pattern or in text, Algorithmica, 80, 2, 698-713 (2018) · Zbl 1391.68129
[22] Itai, A.; Rodeh, M., Finding a minimum circuit in a graph, SIAM J. Comput., 7, 4, 413-423 (1978) · Zbl 0386.68064
[23] Kopelowitz, T.; Pettie, S.; Porat, E., Dynamic set intersection, (Proceedings of the 14th Annual Symposium on Algorithms and Data Structures (WADS) (2015)), 470-481 · Zbl 1451.68081
[24] Kopelowitz, T.; Pettie, S.; Porat, E., Higher lower bounds from the 3sum conjecture, (Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2016)), 1272-1287 · Zbl 1410.68147
[25] Kucherov, G.; Rusinowitch, M., Matching a set of strings with variable length don’t cares, Theor. Comput. Sci., 178, 1-2, 129-154 (1997) · Zbl 0901.68037
[26] Lee, D. T.; Wong, C. K., Quintary trees: a file structure for multidimensional database systems, ACM Trans. Database Syst., 5, 3, 339 (1980) · Zbl 0441.68122
[27] Liu, N.; Xie, F.; Wu, X., Multi-pattern matching with variable-length wildcards using suffix tree, Pattern Anal. Appl., 21, 4, 1151-1165 (2018)
[28] Lueker, G. S., A data structure for orthogonal range queries, (Proceedings of the 19th Annual Symposium on Foundations of Computer Science (1978)), 28-34
[29] Mehlhorn, K.; Näher, S., Dynamic fractional cascading, Algorithmica, 5, 2, 215-241 (1990) · Zbl 0693.68038
[30] Morgante, M.; Policriti, A.; Vitacolonna, N.; Zuccolo, A., Structured motifs search, J. Comput. Biol., 12, 8, 1065-1082 (2005)
[31] Mortensen, C. W., Fully dynamic orthogonal range reporting on RAM, SIAM J. Comput., 35, 6, 1494-1525 (2006) · Zbl 1115.68059
[32] Myers, E. W., A four Russians algorithm for regular expression pattern matching, J. ACM, 39, 2, 430-448 (1992) · Zbl 0799.68104
[33] Myers, G.; Mehldau, G., A system for pattern matching applications on biosequences, Comput. Appl. Biosci., 9, 3, 299-314 (1993)
[34] Navarro, G.; Raffinot, M., Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching, J. Comput. Biol., 10, 6, 903-923 (2003)
[35] Nekrich, Y., Space efficient dynamic orthogonal range reporting, Algorithmica, 49, 94-108 (2007) · Zbl 1131.68041
[36] Pǎtraşcu, M., Towards polynomial lower bounds for dynamic problems, (Proceedings of the 42nd Annual Symposium on Theory of Computing (STOC) (2010)), 603-610 · Zbl 1293.68153
[37] Verint Systems. Personal communication, 2013. Addres: Maskit St. 33 Herzliya Israel.
[38] Willard, D. E., Log-logarithmic worst-case range queries are possible in space \(\theta(n)\), Inf. Process. Lett., 17, 2, 81-84 (1983) · Zbl 0509.68106
[39] Zhang, M.; Zhang, Y.; Hu, L., A faster algorithm for matching a set of patterns with variable length don’t cares, Inf. Process. Lett., 110, 6, 216-220 (2010) · Zbl 1211.68141
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.