×

zbMATH — the first resource for mathematics

Repetitions in strings: algorithms and combinatorics. (English) Zbl 1180.68206
Summary: The article is an overview of basic issues related to repetitions in strings, concentrating on algorithmic and combinatorial aspects. This area is important both from theoretical and practical points of view. Repetitions are highly periodic factors (substrings) in strings and are related to periodicities, regularities, and compression. The repetitive structure of strings leads to higher compression rates, and conversely, some compression techniques are at the core of fast algorithms for detecting repetitions. There are several types of repetitions in strings: squares, cubes, and maximal repetitions also called runs. For these repetitions, we distinguish between the factors (sometimes qualified as distinct) and their occurrences (also called positioned factors).
The combinatorics of repetitions is a very intricate area, full of open problems. For example we know that the number of (distinct) primitively-rooted squares in a string of length \(n\) is no more than \(2n-\Theta(\log n)\), conjecture to be \(n\), and that their number of occurrences can be \(\Theta(n\log n)\). Similarly we know that there are at most \(1.029n\) and at least \(0.944n\) maximal repetitions and the conjecture is again that the exact bound is \(n\). We know almost everything about the repetitions in Sturmian words, but despite the simplicity of these words, the results are nontrivial. One of the main motivations for writing this text is the development during the last couple of years of new techniques and results about repetitions. We report both the progress which has been achieved and which we expect to happen.

MSC:
68R15 Combinatorics on words
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Apostolico, A.; Preparata, F.P., Optimal off-line detection of repetitions in a string, Theoret. comput. sci., 22, 3, 297-315, (1983) · Zbl 0497.68052
[2] Baturo, P.; Piatkowski, M.; Rytter, W., The number of runs in Sturmian words, (), 252-261 · Zbl 1172.68565
[3] Berstel, J.; Savelli, A., Crochemore factorization of Sturmian and other infinite strings, (), 157-166 · Zbl 1132.68512
[4] Chen, G.; Puglisi, S.J.; Smyth, W.F., Fast and practical algorithms for computing all runs in a string, (), 307-315 · Zbl 1138.68658
[5] Constantinescu, S.; Ilie, L., The lempel – ziv complexity of fixed points of morphisms, SIAM J. discrete math., 21, 2, 466-481, (2007) · Zbl 1138.68046
[6] Crochemore, M., An optimal algorithm for computing the repetitions in a string, Inform. process. lett., 12, 5, 244-250, (1981) · Zbl 0467.68075
[7] Crochemore, M., Transducers and repetitions, Theoret. comput. sci., 45, 1, 63-86, (1986) · Zbl 0615.68053
[8] Crochemore, M.; Fazekas, S. Z.; Iliopoulos, C.; Jayasekera, I., Bounds on powers in strings, (), 206-215 · Zbl 1159.68014
[9] Crochemore, M.; Hancart, C.; Lecroq, T., Algorithms on strings, (2007), Cambridge Univ. Press
[10] Crochemore, M.; Ilie, L., Analysis of maximal repetitions in strings, (), 465-476 · Zbl 1147.68864
[11] Crochemore, M.; Ilie, L., Maximal repetitions in strings, J. comput. syst. sci., 74, 796-807, (2008) · Zbl 1149.68066
[12] Crochemore, M.; Ilie, L., Computing longest previous factors in linear time and applications, Inform. process. lett., 106, 2, 75-80, (2008) · Zbl 1186.68591
[13] Crochemore, M.; Ilie, L.; Smyth, W.F., A simple algorithm for computing the lempel – ziv factorization, (), 482-488
[14] Crochemore, M.; Ilie, L.; Tinta, L., Towards a solution to the “runs” conjecture, (), 290-302 · Zbl 1143.68510
[15] Crochemore, M.; Perrin, D., Two-way string matching, J. ACM, 38, 3, 651-675, (1991) · Zbl 0808.68063
[16] Crochemore, M.; Rytter, W., Squares, cubes, and time-space efficient string searching, Algorithmica, 13, 5, 405-425, (1995) · Zbl 0849.68044
[17] Crochemore, M.; Rytter, W., Jewels of stringology, (2003), World Scientific Singapore · Zbl 1078.68151
[18] Franek, F.; Karaman, A.; Smyth, W.F., Repetitions in Sturmian strings, Theoret. comput. sci., 249, 2, 289-303, (2000) · Zbl 0949.68124
[19] Fraenkel, A.S.; Simpson, R.J., How many squares can a string contain?, J. combin. theory ser. A, 82, 112-120, (1998) · Zbl 0910.05001
[20] Fraenkel, A.S.; Simpson, R.J., The exact number of squares in Fibonacci strings, Theoret. comput. sci., 218, 1, 95-106, (1999) · Zbl 0916.68122
[21] Franek, F.; Yang, Q., An asymptotic lower bound for the maximal number of runs in a string, Int. J. found. comput. sci., 19, 1, 195-203, (2008) · Zbl 1169.68563
[22] Franek, F.; Smyth, W.F.; Tang, Y., Computing all repeats using suffix arrays, J. autom. lang. comb., 8, 4, 579-591, (2003) · Zbl 1088.68679
[23] Galil, Z.; Seiferas, J., Time-space-optimal string matching, J. comput. syst. sci., 26, 3, 280-294, (1983) · Zbl 0509.68101
[24] Gasieniec, L.; Plandowski, W.; Rytter, W., Constant-space string matching with smaller number of comparisons: sequential sampling, (), 78-89
[25] M. Giraud, Not so many runs in strings., in: Proc. of LATA’08, Rovira i Virgili University, 2008, pp. 245-252 · Zbl 1156.68511
[26] Gusfield, D., ()
[27] Gusfield, D.; Stoye, J., Linear time algorithms for finding and representing all the tandem repeats in a string, J. comput. syst. sci., 69, 4, 525-546, (2004) · Zbl 1076.68111
[28] Ilie, L., A simple proof that a string of length \(n\) has at most \(2 n\) distinct squares, J. combin. theory, ser. A, 112, 1, 163-164, (2005) · Zbl 1088.68146
[29] Ilie, L., A note on the number of squares in a string, Theoret. comput. sci., 380, 3, 373-376, (2007) · Zbl 1119.68141
[30] Iliopoulos, C.; Moore, D.; Smyth, W.F., A characterization of the squares in a Fibonacci string, Theoret. comput. sci., 172, 281-291, (1997) · Zbl 0903.68050
[31] Karhumäki, J., On strongly cube-free \(\omega\)-words generated by binary morphisms, (), 182-189
[32] Kolpakov, R.; Kucherov, G., Finding maximal repetitions in a string in linear time, (), 596-604
[33] Lothaire, M., Algebraic combinatorics on words, (2002), Cambridge University Press · Zbl 1001.68093
[34] Lothaire, M., Applied combinatorics on words, (2005), Cambridge University Press · Zbl 1133.68067
[35] MacDonald, M.; Ambrose, C.M., A novel gene containing a trinucleotide repeat that is expanded and unstable on huntington’s disease chromosomes, Cell, 72, 6, 971-983, (1993)
[36] Main, M.G., Detecting leftmost maximal periodicities, Discret. appl. math., 25, 145-153, (1989) · Zbl 0683.68033
[37] Main, M.G.; Lorentz, R.J., An \(O(n \log n)\) algorithm for finding all repetitions in a string, J. algorithms, 5, 3, 422-432, (1984) · Zbl 0547.68083
[38] W. Matsubara, K. Kusano, A. Ishino, H. Bannai, A. Shinohara, New Lower Bounds for the Maximum Number of Runs in a string, in: Prague Stringology Conference, Czech Technical Univ. in Prague, 2008, pp. 140-144
[39] Mignosi, F.; Pirillo, G., Repetitions in the Fibonacci infinite word, RAIRO inform. théor. appl., 26, 3, 199-204, (1992) · Zbl 0761.68078
[40] Puglisi, S.J.; Simpson, J.; Smyth, B., How many runs can a string contain?, Theor. comput. sci., 401, 1-3, 165-171, (2008) · Zbl 1155.68070
[41] Rytter, W., The structure of subword graphs and suffix trees of Fibonacci words, Theor. comput. sci., 363, 2, 211-223, (2006) · Zbl 1153.68044
[42] Rytter, W., The number of runs in a string: improved analysis of the linear upper bound, (), 184-195 · Zbl 1136.68621
[43] Rytter, W., The number of runs in a string, Inf. comput., 205, 9, 459-1469, (2007) · Zbl 1127.68076
[44] Smyth, W.F., Repetitive perhaps, but certainly not boring, Theoret. comput. sci., 249, 2, 343-355, (2000) · Zbl 0949.68125
[45] Smyth, W.F., Computing patterns in strings, (2003), Addison-Wesley · Zbl 1253.68277
[46] Thue, A., Über unendliche zeichenreihen, Kra. vidensk. selsk. skrifter. I. mat.-nat. kl., cristiana, 7, (1906) · JFM 39.0283.01
[47] Witten, I.H.; Moffat, A.; Bell, T.C., Managing gigabytes, (1994), Van Nostrand Reinhold New York · Zbl 0821.68051
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.