×

A formula for the mean length of the longest common subsequence. (English) Zbl 07325323

Summary: The expected value \(E\) of the longest common subsequence of letters in two random words is considered as a function of the \(\alpha=|A|\) of alphabet and of words lengths \(m\) and \(n\). It is assumed that each letter independently appears at any position with equal probability. A simple expression for \(E(\alpha, m, n)\) and its empirical proof are presented for fixed \(\alpha\) and \(m + n \). High accuracy of the formula in a wide range of values is confirmed by numerical simulations.

MSC:

65C05 Monte Carlo methods
68R15 Combinatorics on words
68W32 Algorithms on strings
68P10 Searching and sorting
PDFBibTeX XMLCite
Full Text: DOI MNR

References:

[1] V. Chvátal, D. Sankoff, “Longest common subsequences of two random sequences”, J. Appl. Prob., 12 (1975), 306-315 · Zbl 0313.60008 · doi:10.1017/S0021900200047999
[2] M. A. Kiwi, M. Loebl, J. Matousek, “Expected length of the longest common subsequence for large alphabets”, Advances in Mathematics, 197:2 (2005), 480-498 · Zbl 1087.68081 · doi:10.1016/j.aim.2004.10.012
[3] G. S. Lueker, “Improved bounds on the average length of longest common subsequences”, Journal of the ACM, 56:3 (2009), 17 · Zbl 1325.68308 · doi:10.1145/1516512.1516519
[4] R. Bundschuh, “High precision simulations of the longest common subsequence problem”, The European Physical Journal B - Condensed Matter and Complex Systems, 22:4 (2001), 533-541 · doi:10.1007/s100510170102
[5] J. Boutet de Monvel, “Extensive simulations for longest common subsequences”, The European Physical Journal B - Condensed Matter and Complex Systems, 7:2 (1999), 293-308 · doi:10.1007/s100510050616
[6] R. Baeza-Yates, G. Navarro, R. Gavaldá, R. Schehing, “Bounding the expected length of the longest common subsequences and forests”, Theory of Computing Systems, 32:4 (1999), 435-452 · Zbl 0934.68043 · doi:10.1007/s002240000125
[7] Kang Ning, Kwok Pui Choi, Systematic assessment of the expected length, variance and distribution of Longest Common Subsequences, 2013, arXiv:
[8] J. D. Dixon, Longest common subsequences in binary sequences, 2013, arXiv:
[9] S. V. Znamenskij, “A picture of common subsequence length for two random strings over an alphabet of 4 symbols”, Program systems: theory and applications, 7:1(28) (2016), 201-208
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.