×

Periodic words, common subsequences and frogs. (English) Zbl 1517.60018

Summary: Let \(W^{(n)}\) be the \(n\)-letter word obtained by repeating a fixed word \(W\), and let \(R_n\) be a random \(n\)-letter word over the same alphabet. We show several results about the length of the longest common subsequence (LCS) between \(W^{(n)}\) and \(R_n\); in particular, we show that its expectation is \(\gamma_W n-O (\sqrt{n})\) for an efficiently-computable constant \(\gamma_W\).
This is done by relating the problem to a new interacting particle system, which we dub “frog dynamics”. In this system, the particles (“frogs”) hop over one another in the order given by their labels. Stripped of the labeling, the frog dynamics reduces to a variant of the PushTASEP.
In the special case when all symbols of \(W\) are distinct, we obtain an explicit formula for the constant \(\gamma_W\) and a closed-form expression for the stationary distribution of the associated frog dynamics.
In addition, we propose new conjectures about the asymptotic of the LCS of a pair of random words. These conjectures are informed by computer experiments using a new heuristic algorithm to compute the LCS. Through our computations, we found periodic words that are more random-like than a random word, as measured by the LCS.

MSC:

60C05 Combinatorial probability
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] ALDOUS, D. and FILL, J. A. (2002). Reversible Markov chains and random walks on graphs. Unfinished monograph, recompiled 2014, available at: http://www.stat.berkeley.edu/texttildelowaldous/RWG/book.html.
[2] ALEXANDER, K. S. (1994). The rate of convergence of the mean length of the longest common subsequence. Ann. Appl. Probab. 4 1074-1082. · Zbl 0812.60014
[3] ANGEL, O. (2006). The stationary measure of a 2-type totally asymmetric exclusion process. J. Combin. Theory Ser. A 113 625-635. · Zbl 1087.60067 · doi:10.1016/j.jcta.2005.05.004
[4] BAEZA-YATES, R. A., GAVALDÀ, R., NAVARRO, G. and SCHEIHING, R. (1999). Bounding the expected length of longest common subsequences and forests. Theory Comput. Syst. 32 435-452. · Zbl 0934.68043 · doi:10.1007/s002240000125
[5] Borodin, A. and Ferrari, P. L. (2008). Large time asymptotics of growth models on space-like paths. I. PushASEP. Electron. J. Probab. 13 1380-1418. · Zbl 1187.82084 · doi:10.1214/EJP.v13-541
[6] BUNDSCHUH, R. (2001). High precision simulations of the longest common subsequence problem. Eur. Phys. J. B 22 533-541.
[7] CHUNG, K.-M., LAM, H., LIU, Z. and MITZENMACHER, M. (2012). Chernoff-Hoeffding bounds for Markov chains: Generalized and simplified. In 29th International Symposium on Theoretical Aspects of Computer Science. LIPIcs. Leibniz Int. Proc. Inform. 14 124-135. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern. · Zbl 1245.68143
[8] CHUNG, K. L. (1967). Markov Chains with Stationary Transition Probabilities, 2nd ed. Die Grundlehren der Mathematischen Wissenschaften, Band 104. Springer New York, Inc., New York. · Zbl 0146.38401
[9] CHVATAL, V. and SANKOFF, D. (1975). Longest common subsequences of two random sequences. J. Appl. Probab. 12 306-315. · Zbl 0313.60008 · doi:10.2307/3212444
[10] DANČÍK, V. (1994). Expected length of longest common subsequences Ph.D. thesis, University of Warwick. https://web.archive.org/web/19980125080727/http://www-hto.usc.edu/people/dancik/thesis/index.html. · Zbl 0941.68812
[11] FERRARI, P. A. and MARTIN, J. B. (2007). Stationary distributions of multi-type totally asymmetric exclusion processes. Ann. Probab. 35 807-832. · Zbl 1117.60089 · doi:10.1214/009117906000000944
[12] Geyer, C. J. (2011). Introduction to Markov chain Monte Carlo. In Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC Handb. Mod. Stat. Methods 3-48. CRC Press, Boca Raton, FL. · Zbl 1229.65014
[13] GRAHAM, R. L., KNUTH, D. E. and PATASHNIK, O. (1994). Concrete Mathematics: A Foundation for Computer Science, 2nd ed. Addison-Wesley Company, Reading, MA. · Zbl 0836.00001
[14] HOUDRÉ, C. and MATZINGER, H. (2016). Closeness to the diagonal for longest common subsequences in random words. Electron. Commun. Probab. 21 Paper No. 36, 19. · Zbl 1338.05004 · doi:10.1214/16-ECP4029
[15] KIWI, M., LOEBL, M. and MATOUŠEK, J. (2005). Expected length of the longest common subsequence for large alphabets. Adv. Math. 197 480-498. · Zbl 1087.68081 · doi:10.1016/j.aim.2004.10.012
[16] KLOECKNER, B. (2019). Effective Berry-Esseen and concentration bounds for Markov chains with a spectral gap. Ann. Appl. Probab. 29 1778-1807. · Zbl 1447.60129 · doi:10.1214/18-AAP1438
[17] LIU, Q. and HOUDRÉ, C. (2017). Simulations, computations, and statistics for longest common subsequences. 1705.06826v1.
[18] LUEKER, G. S. (2009). Improved bounds on the average length of longest common subsequences. J. ACM 56 Art. 17, 38. · Zbl 1325.68308 · doi:10.1145/1516512.1516519
[19] MATZINGER, H., LEMBER, J. and DURRINGER, C. (2007). Deviation from mean in sequence comparison with a periodic sequence. ALEA Lat. Am. J. Probab. Math. Stat. 3 1-29. · Zbl 1140.60044
[20] Norris, J. R. (1998). Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics 2. Cambridge Univ. Press, Cambridge. Reprint of 1997 original. · Zbl 0938.60058
[21] POPOV, S. YU. (2001). Frogs in random environment. J. Stat. Phys. 102 191-201. · Zbl 0972.60101 · doi:10.1023/A:1026516826875
[22] RANEY, G. N. (1960). Functional composition patterns and power series reversion. Trans. Amer. Math. Soc. 94 441-451. · Zbl 0131.01402 · doi:10.2307/1993433
[23] SCHIMD, M. and BILARDI, G. (2019). Bounds and estimates on the average edit distance. In String Processing and Information Retrieval. Lecture Notes in Computer Science 11811 91-106. Springer, Cham. · doi:10.1007/978-3-030-32686-9_7
[24] Spitzer, F. (1970). Interaction of Markov processes. Adv. Math. 5 246-290. · Zbl 0312.60060 · doi:10.1016/0001-8708(70)90034-4
[25] TISKIN, A. (2009). Periodic string comparison. In Combinatorial Pattern Matching. Lecture Notes in Computer Science 5577 193-206. Springer, Berlin · Zbl 1247.68338 · doi:10.1007/978-3-642-02441-2_18
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.