×

zbMATH — the first resource for mathematics

Dynamic and internal longest common substring. (English) Zbl 07272778
Summary: Given two strings \(S\) and \(T\), each of length at most \(n\), the longest common substring (LCS) problem is to find a longest substring common to \(S\) and \(T\). This is a classical problem in computer science with an \(\mathcal{O}(n)\)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in \(n\) per edit operation. In particular, we show how to find an LCS after each edit operation in \(\tilde{\mathcal{O}}(n^{2/3})\) time, after \(\tilde{\mathcal{O}}(n)\)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an \(\tilde{\mathcal{O}}(n)\)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in \(\tilde{\mathcal{O}}(1)\) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of \(S\) and \(T\). We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.

MSC:
68Wxx Algorithms in computer science
05Cxx Graph theory
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, pp. 59-78. IEEE Computer Society (2015)
[2] Abboud, A., Williams, R.R., Yu, H.: More applications of the polynomial method to algorithm design. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pp. 218-230 (2015) · Zbl 1372.68282
[3] Abedin, P., Hooshmand, S., Ganguly, A., Thankachan, S.V.: The heaviest induced ancestors problem revisited. In: Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018—Qingdao, China, pp. 20:1-20:13 (2018)
[4] Afshani, P., Nielsen, J.S.: Data structure lower bounds for document indexing problems. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, pp. 93:1-93:15 (2016) · Zbl 1388.68026
[5] Agarwal, PK; Goodman, JE; O’Rourke, J., Range searching, Handbook of Discrete and Computational Geometry, 809-837 (2004), Boca Raton: Chapman and Hall, Boca Raton
[6] Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, 12-14 November 2000, Redondo Beach, California, USA, pages 198-207, (2000)
[7] Alstrup, S., Brodal, G.S., Rauhe, T.: Pattern matching in dynamic texts. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’00, pp. 819-828, Philadelphia, PA, USA, Society for Industrial and Applied Mathematics (2000) · Zbl 0956.68042
[8] Amir, A., Boneh, I.: Locally maximal common factors as a tool for efficient dynamic string algorithms. In: Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, pp. 11:1-11:13 (2018)
[9] Amir, A., Boneh, I.: Dynamic palindrome detection. CoRR (2019). arXiv:1906.09732
[10] Amir, A., Boneh, I., Charalampopoulos, P., Kondratovsky, E.: Repetition detection in a dynamic string. In: 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, pp. 5:1-5:18 (2019)
[11] Amir, A., Charalampopoulos, P. Iliopoulos, C.S., Pissis, S.P., Radoszewski, J.: Longest common factor after one edit operation. In: String Processing and Information Retrieval—24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings, pp. 14-26 (2017)
[12] Amir, A., Charalampopoulos, P., Pissis, S.P., Radoszewski, J.: Longest common substring made fully dynamic. In: 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, pp. 6:1-6:17 (2019)
[13] Amir, A.; Landau, GM; Lewenstein, M.; Sokol, D., Dynamic text and static pattern matching, ACM Trans. Algorithms, 3, 2, 19 (2007) · Zbl 1321.68547
[14] Amir, A., Lewenstein, M., Thankachan, S.V.: Range LCP queries revisited. In: String Processing and Information Retrieval—22nd International Symposium, SPIRE 2015, London, UK, September 1-4, 2015, Proceedings, pp. 350-361 (2015)
[15] Apostolico, A.; Crochemore, M., Fast parallel Lyndon factorization with applications, Math. Syst. Theory, 28, 2, 89-108 (1995) · Zbl 0815.68066
[16] Apostolico, A., Crochemore, M., Farach-Colton, M., Galil, Z., Muthukrishnan, S.: Forty years of text indexing. In: Combinatorial Pattern Matching, 24th Annual Symposium, CPM 2013, Bad Herrenalb, Germany, June 17-19, 2013. Proceedings, pp. 1-10 (2013) · Zbl 1381.68067
[17] Ayad, L.A.K., Barton, C., Charalampopoulos, P., Iliopoulos, C.S., Pissis, S.P.: Longest common prefixes with k-errors and applications. In: String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Lima, Peru, October 9-11, 2018, Proceedings, pp. 27-41 (2018) · Zbl 1445.68362
[18] Backurs, A.; Indyk, P., Edit distance cannot be computed in strongly subquadratic time (unless SETH is false), SIAM J. Comput., 47, 3, 1087-1097 (2018) · Zbl 1396.68137
[19] Bannai, H., I, T., Inenaga, S., Nakashima, Y., Takeda, M., Tsuruta, K.: A new characterization of maximal repetitions by Lyndon trees. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pp. 562-571 (2015) · Zbl 1372.68216
[20] Bannai, H.; I, T.; Inenaga, S.; Nakashima, Y.; Takeda, M.; Tsuruta, K., The “runs” theorem, SIAM J. Comput., 46, 5, 1501-1514 (2017) · Zbl 1375.68093
[21] Barcelo, H., On the action of the symmetric group on the free Lie algebra and the partition lattice, J. Comb. Theory Ser. A, 55, 1, 93-129 (1990) · Zbl 0709.17005
[22] Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, pp. 88-94 (2000) · Zbl 0959.68133
[23] Bentley, JL, Multidimensional divide-and-conquer, Commun. ACM, 23, 4, 214-229 (1980) · Zbl 0434.68049
[24] Borozdin, K., Kosolobov, D., Rubinchik, M., Shur, A.M.: Palindromic length in linear time. In: 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, July 4-6, 2017, Warsaw, Poland, pp. 23:1-23:12 (2017) · Zbl 1434.68724
[25] Bringmann, K., Künnemann, M.: Quadratic conditional lower bounds for string problems and dynamic time warping. In: 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2015, pp. 79-97. IEEE Computer Society (2015)
[26] Burkhardt, S., Kärkkäinen, J.: Fast lightweight suffix array construction and checking. In: Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michocán, Mexico, June 25-27, 2003, Proceedings, pp. 55-69 (2003) · Zbl 1279.68065
[27] Chan, T. M., Larsen, K. G., Pǎtraşcu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13-15, 2011, pp. 1-10 (2011) · Zbl 1283.68139
[28] Charalampopoulos, P., Crochemore, M., Iliopoulos, C.S., Kociumaka, T., Pissis, S.P., Radoszewski, J., Rytter, W., Waleń, T.: Linear-time algorithm for long LCF with k mismatches. In: Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, pp. 23:1-23:16 (2018) · Zbl 1434.68729
[29] Charalampopoulos, P., Gawrychowski, P., Pokorski, K.: Dynamic longest common substring in polylogarithmic time. In: 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, pp. 27:1-27:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)
[30] Charalampopoulos, P., Kociumaka, T., Mohamed, M., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Counting distinct patterns in internal dictionary matching. In: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, pp. 8:1-8:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)
[31] Charalampopoulos, P., Kociumaka, T., Mohamed, M., Radoszewski, J. Rytter, W., Waleń, T.: Internal dictionary matching. In: 30th International Symposium on Algorithms and Computation, ISAAC 2019, December 8-11, 2019, Shanghai, China, pp. 22:1-22:17 (2019)
[32] Charalampopoulos, P., Kociumaka, T., Mozes, S.: Dynamic string alignment. In: 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, volume 161 of LIPIcs, pp. 9:1-9:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)
[33] Charalampopoulos, P., Kociumaka, T., Wellnitz, P.: Faster approximate pattern matching: a unified approach. CoRR (2020). arXiv:2004.08350
[34] Chen, K-T; Fox, RH; Lyndon, RC, Free differential calculus, IV, Ann. Math., 68, 81-95 (1958) · Zbl 0142.22304
[35] Clifford, R., Grønlund, A., Larsen, K.G., Starikovskaya, T.A.: Upper and lower bounds for dynamic data structures on strings. In: 35th Symposium on Theoretical Aspects of Computer Science, STACS 2018, February 28-March 3, 2018, Caen, France, pp. 22:1-22:14 (2018) · Zbl 07228413
[36] Cohen, H., Porat, E.: On the hardness of distance oracle for sparse graph. CoRR (2010). arXiv:1006.1117
[37] Crochemore, M.; Hancart, C.; Lecroq, T., Algorithms on Strings (2007), Cambridge: Cambridge University Press, Cambridge
[38] Crochemore, M.; Iliopoulos, CS; Mohamed, M.; Sagot, M., Longest repeats with a block of \(k\) don’t cares, Theoret. Comput. Sci., 362, 1-3, 248-254 (2006) · Zbl 1103.68131
[39] Daykin, JW; Iliopoulos, CS; Smyth, WF, Parallel RAM algorithms for factorizing words, Theoret. Comput. Sci., 127, 1, 53-67 (1994) · Zbl 0805.68057
[40] Dietz, PF; Mehlhorn, K.; Raman, R.; Uhrig, C., Lower bounds for set intersection queries, Algorithmica, 14, 2, 154-168 (1995) · Zbl 0833.68037
[41] Farach, M.: Optimal suffix tree construction with large alphabets. In: 38th Annual Symposium on Foundations of Computer Science, FOCS ’97, Miami Beach, Florida, USA, October 19-22, 1997, pp. 137-143 (1997)
[42] Ferragina, P., Dynamic text indexing under string updates, J. Algorithms, 22, 2, 296-328 (1997) · Zbl 0876.68038
[43] Ferragina, P.; Grossi, R., Optimal on-line search and sublinear time update in string matching, SIAM J. Comput., 27, 3, 713-736 (1998) · Zbl 0911.68044
[44] Fici, G.; Gagie, T.; Kärkkäinen, J.; Kempa, D., A subquadratic algorithm for minimum palindromic factorization, J. Discrete Algorithms, 28, 41-48 (2014) · Zbl 1305.68382
[45] Fischer, J., Köppl, D., Kurpicz, F.: On the benefit of merging suffix array intervals for parallel pattern matching. In: 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, pp. 26:1-26:11 (2016) · Zbl 1380.68471
[46] Flouri, T.; Giaquinta, E.; Kobert, K.; Ukkonen, E., Longest common substrings with \(k\) mismatches, Inf. Process. Lett., 115, 6-8, 643-647 (2015) · Zbl 1328.68326
[47] Fredman, ML; Komlós, J.; Szemerédi, E., Storing a sparse table with O(1) worst case access time, J. ACM, 31, 3, 538-544 (1984) · Zbl 0629.68068
[48] Funakoshi, M., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Longest substring palindrome after edit. In: Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018—Qingdao, China, pp. 12:1-12:14 (2018)
[49] Funakoshi, M., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Faster queries for longest substring palindrome after block edit. In: 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, June 18-20, 2019, Pisa, Italy, pp. 27:1-27:13 (2019)
[50] Gagie, T., Gawrychowski, P., Nekrich, Y.: Heaviest induced ancestors and longest common substrings. In: Proceedings of the 25th Canadian Conference on Computational Geometry, CCCG 2013, Waterloo, Ontario, Canada, August 8-10, 2013 (2013)
[51] Gawrychowski, P., Karczmarz, A., Kociumaka, T., Lacki, J., Sankowski, P.: Optimal dynamic strings. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pp. 1509-1528 (2018). Full version available at arXiv:1511.02612 · Zbl 1403.68372
[52] Goldstein, I., Kopelowitz, T., Lewenstein, M., Porat, E.: Conditional lower bounds for space/time tradeoffs. In: Algorithms and Data Structures—15th International Symposium, WADS 2017, St. John’s, NL, Canada, July 31-August 2, 2017, Proceedings, pp. 421-436 (2017) · Zbl 1457.68115
[53] Gu, M., Farach, M., Beigel, R.: An efficient algorithm for dynamic text indexing. In: Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’94, pp. 697-704, Philadelphia, PA, USA, (1994) · Zbl 0871.68072
[54] Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology (1997), New York: Cambridge University Press, New York
[55] Hyyrö, H.; Narisawa, K.; Inenaga, S., Dynamic edit distance table under a general weighted cost function, J. Discrete Algorithms, 34, 2-17 (2015) · Zbl 1336.68316
[56] I, T.; Nakashima, Y.; Inenaga, S.; Bannai, H.; Takeda, M., Faster Lyndon factorization algorithms for SLP and LZ78 compressed text, Theoret. Comput. Sci., 656, 215-224 (2016) · Zbl 1356.68302
[57] I, T., Sugimoto, S., Inenaga, S., Bannai, H., Takeda, M.: Computing palindromic factorizations and palindromic covers on-line. In: Combinatorial Pattern Matching—25th Annual Symposium, CPM 2014, Moscow, Russia, June 16-18, 2014. Proceedings, pp. 150-161 (2014) · Zbl 1407.68575
[58] Karp, RM; Rabin, MO, Efficient randomized pattern-matching algorithms, IBM J. Res. Dev., 31, 2, 249-260 (1987) · Zbl 0653.68054
[59] Knuth, D.E.: The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations. Addison-Wesley Professional, New York (2005) · Zbl 1127.68068
[60] Kociumaka, T.: Minimal suffix and rotation of a substring in optimal time. In: 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, pp. 28:1-28:12, (2016) · Zbl 1380.68151
[61] Kociumaka, T.: Efficient data structures for internal queries in texts. Ph.D. thesis, University of Warsaw, Oct. 2018. https://www.mimuw.edu.pl/ kociumaka/files/phd.pdf
[62] Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in a text and applications. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pp. 532-551 (2015) · Zbl 1371.68340
[63] Kociumaka, T., Starikovskaya, T.A., Vildhøj, H.W.: Sublinear space algorithms for the longest common substring problem. In: Algorithms - ESA 2014 - 22th Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings, pp. 605-617 (2014) · Zbl 1425.68469
[64] Lyndon, RC, On Burnside’s problem, Trans. Am. Math. Soc., 77, 202-215 (1954) · Zbl 0058.01702
[65] Maekawa, M., A \(\sqrt{n}\) algorithm for mutual exclusion in decentralized systems, ACM Trans. Comput. Syst., 3, 2, 145-159 (1985)
[66] Manacher, GK, A new linear-time “on-line” algorithm for finding the smallest initial palindrome of a string, J. ACM, 22, 3, 346-351 (1975) · Zbl 0305.68027
[67] Mehlhorn, K.; Sundar, R.; Uhrig, C., Maintaining dynamic sequences under equality tests in polylogarithmic time, Algorithmica, 17, 2, 183-198 (1997) · Zbl 0865.68034
[68] Mucha, M.: Lyndon words and short superstrings. In: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pp. 958-972, (2013) · Zbl 1422.68332
[69] Pǎtraşcu, M.; Roditty, L., Distance oracles beyond the Thorup-Zwick bound, SIAM J. Comput., 43, 1, 300-311 (2014) · Zbl 1310.68117
[70] Sahinalp, S.C., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm (extended abstract). In: 37th Annual Symposium on Foundations of Computer Science, FOCS ’96, Burlington, Vermont, USA, 14-16 October, 1996, pp. 320-328 (1996)
[71] Sleator, DD; Tarjan, RE, A data structure for dynamic trees, J. Comput. Syst. Sci., 26, 3, 362-391 (1983) · Zbl 0509.68058
[72] Starikovskaya, T. A.: Longest common substring with approximately \(k\) mismatches. In: 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, pp. 21:1-21:11 (2016) · Zbl 1380.68482
[73] Starikovskaya, T.A., Vildhøj, H. W.: Time-space trade-offs for the longest common substring problem. In: Combinatorial Pattern Matching, 24th Annual Symposium, CPM 2013, Bad Herrenalb, Germany, June 17-19, 2013. Proceedings, pp. 223-234 (2013) · Zbl 1381.68102
[74] Sundar, R.; Tarjan, RE, Unique binary-search-tree representations and equality testing of sets and sequences, SIAM J. Comput., 23, 1, 24-44 (1994) · Zbl 0802.68035
[75] Thankachan, S.V., Aluru, C., Chockalingam, S.P., Aluru, S.: Algorithmic framework for approximate matching under bounded edits with applications to sequence analysis. In: Research in Computational Molecular Biology—22nd Annual International Conference, RECOMB 2018, Paris, France, April 21-24, 2018, Proceedings, pp. 211-224 (2018)
[76] Thankachan, SV; Apostolico, A.; Aluru, S., A provably efficient algorithm for the k-mismatch average common substring problem, J. Comput. Biol., 23, 6, 472-482 (2016)
[77] Urabe, Y., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Longest Lyndon substring after edit. In: Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018—Qingdao, China, pp. 19:1-19:10 (2018)
[78] Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pp. 1-11 (1973)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.