×

Matching strings in encoded sequences. (English) Zbl 1442.94028

Let \(\chi, \tilde\chi \) be alphabets. An encoder is a measurable function on strings \(f\colon \chi ^{\mathrm{N}} \to \tilde\chi^{\mathrm{N}}\). For any \(x,y \in \chi^{\mathrm{N}}\) the longest common substring between encoded strings is
\[ M_n^f(x,y) = \max \{ k:f(x)_i^{i + k - 1} = f(y)_j^{j + k - 1},\ 0 \le i,j \le n - k\}. \]
Here it is proved an almost sure convergence of \(M_n^f\). The result is applied to generalize some earlier results for stochastic scrabble, stochastic noise and for analysis of the behaviour of the shortest distance between observed orbits of a dynamical system.

MSC:

94A17 Measures of information, entropy
60F15 Strong limit theorems
28D20 Entropy and other invariants
37B10 Symbolic dynamics
37H10 Generation, random and stochastic difference and differential equations
PDFBibTeX XMLCite
Full Text: DOI arXiv Euclid

References:

[1] Abadi, M., Gallo, S. and Rada-Mora, E.A. (2018). The shortest possible return time of \(\beta \)-mixing processes. IEEE Trans. Inform. Theory 64 4895-4906. · Zbl 1401.60054
[2] Abadi, M. and Galves, A. (2001). Inequalities for the occurrence times of rare events in mixing processes. The state of the art. Markov Process. Related Fields 7 97-112. · Zbl 0973.60035
[3] Abadi, M. and Lambert, R. (2013). The distribution of the short-return function. Nonlinearity 26 1143-1162. · Zbl 1309.37014
[4] Abadi, M. and Lambert, R. (2019). From the divergence between two measures to the shortest path between two observables. Ergodic Theory Dynam. Systems 39 1729-1744. · Zbl 1417.37068
[5] Abadi, M. and Vaienti, S. (2008). Large deviations for short recurrence. Discrete Contin. Dyn. Syst. 21 729-747. · Zbl 1148.60010
[6] Abadi, M.N. and Cardeño, L. (2015). Rényi entropies and large deviations for the first match function. IEEE Trans. Inform. Theory 61 1629-1639. · Zbl 1359.94289
[7] Afraimovich, V., Chazottes, J.R. and Saussol, B. (2003). Pointwise dimensions for Poincaré recurrences associated with maps and special flows. Discrete Contin. Dyn. Syst. 9 263-280. · Zbl 1029.37007
[8] Arratia, R., Morris, P. and Waterman, M.S. (1988). Stochastic Scrabble: Large deviations for sequences with scores. J. Appl. Probab. 25 106-119. · Zbl 0645.60077
[9] Arratia, R. and Waterman, M.S. (1985). An Erdős-Rényi law with shifts. Adv. Math. 55 13-23. · Zbl 0586.60031
[10] Aytaç, H., Freitas, J.M. and Vaienti, S. (2015). Laws of rare events for deterministic and random dynamical systems. Trans. Amer. Math. Soc. 367 8229-8278. · Zbl 1353.37012
[11] Ayyer, A. and Stenlund, M. (2007). Exponential decay of correlations for randomly chosen hyperbolic toral automorphisms. Chaos 17 043116, 7. · Zbl 1163.37309
[12] Baladi, V. (2000). Positive Transfer Operators and Decay of Correlations. Advanced Series in Nonlinear Dynamics 16. River Edge, NJ: World Scientific Co., Inc. · Zbl 1012.37015
[13] Baladi, V. and Young, L.-S. (1993). On the spectra of randomly perturbed expanding maps. Comm. Math. Phys. 156 355-385. · Zbl 0809.60101
[14] Barros, V., Liao, L. and Rousseau, J. (2019). On the shortest distance between orbits and the longest common substring problem. Adv. Math. 344 311-339. · Zbl 1442.37060
[15] Boshernitzan, M.D. (1993). Quantitative recurrence results. Invent. Math. 113 617-631. · Zbl 0839.28008
[16] Collet, P., Galves, A. and Leonardi, F. (2008). Random perturbations of stochastic processes with unbounded variable length memory. Electron. J. Probab. 13 1345-1361. · Zbl 1206.62160
[17] Dembo, A., Karlin, S. and Zeitouni, O. (1994). Critical phenomena for sequence matching with scoring. Ann. Probab. 22 1993-2021. · Zbl 0834.60031
[18] Dembo, A., Karlin, S. and Zeitouni, O. (1994). Limit distribution of maximal non-aligned two-sequence segmental score. Ann. Probab. 22 2022-2039. · Zbl 0836.60023
[19] Dembo, A. and Kontoyiannis, I. (1999). The asymptotics of waiting times between stationary processes, allowing distortion. Ann. Appl. Probab. 9 413-429. · Zbl 0940.60033
[20] Ferrari, P.A. and Galves, A. (1997). Acoplamento e Processos Estocásticos. \(21^{\text{o}}\) Colóquio Brasileiro de Matemática. [21st Brazilian Mathematics Colloquium]. Rio de Janeiro: Instituto de Matemática Pura e Aplicada (IMPA).
[21] Garcia, N.L. and Moreira, L. (2015). Stochastically perturbed chains of variable memory. J. Stat. Phys. 159 1107-1126. · Zbl 1338.60106
[22] Grzegorek, P. and Kupsa, M. (2009). Return times in a process generated by a typical partition. Nonlinearity 22 371-379. · Zbl 1177.37015
[23] Haydn, N. and Vaienti, S. (2010). The Rényi entropy function and the large deviation of short return times. Ergodic Theory Dynam. Systems 30 159-179. · Zbl 1202.37009
[24] Haydn, N.T.A. (2013). Entry and return times distribution. Dyn. Syst. 28 333-353. · Zbl 1295.37003
[25] Kelbert, M. and Suhov, Y. (2013). Information Theory and Coding by Example. Cambridge: Cambridge Univ. Press. · Zbl 1281.94002
[26] Kontoyiannis, I., Algoet, P.H., Suhov, Yu.M. and Wyner, A.J. (1998). Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Trans. Inform. Theory 44 1319-1327. · Zbl 1026.94516
[27] Levin, D.A., Peres, Y. and Wilmer, E.L. (2009). Markov Chains and Mixing Times. Providence, RI: Amer. Math. Soc. · Zbl 1160.60001
[28] Łuczak, T. and Szpankowski, W. (1997). A suboptimal lossy data compression based on approximate pattern matching. IEEE Trans. Inform. Theory 43 1439-1451. · Zbl 0953.94012
[29] Månsson, M. (2000). On compound Poisson approximation for sequence matching. Combin. Probab. Comput. 9 529-548. · Zbl 0990.60015
[30] Marie, P. and Rousseau, J. (2011). Recurrence for random dynamical systems. Discrete Contin. Dyn. Syst. 30 1-16. · Zbl 1218.37071
[31] Neuhauser, C. (1996). A phase transition for the distribution of matching blocks. Combin. Probab. Comput. 5 139-159. · Zbl 0865.60027
[32] Ornstein, D.S. and Weiss, B. (1993). Entropy and data compression schemes. IEEE Trans. Inform. Theory 39 78-83. · Zbl 0764.94003
[33] Rousseau, J. Longest common substring for random subshifts of finite type. Preprint. Available at arXiv:1905.08131. · Zbl 1492.37016
[34] Rousseau, J. (2014). Hitting time statistics for observations of dynamical systems. Nonlinearity 27 2377-2392. · Zbl 1350.37060
[35] Rousseau, J. and Saussol, B. (2010). Poincaré recurrence for observations. Trans. Amer. Math. Soc. 362 5845-5859. · Zbl 1211.37024
[36] Saussol, B. (2006). Recurrence rate in rapidly mixing dynamical systems. Discrete Contin. Dyn. Syst. 15 259-267. · Zbl 1175.37006
[37] Saussol, B. (2009). An introduction to quantitative Poincaré recurrence in dynamical systems. Rev. Math. Phys. 21 949-979. · Zbl 1181.37012
[38] Saussol, B., Troubetzkoy, S. and Vaienti, S. (2002). Recurrence, dimensions, and Lyapunov exponents. J. Stat. Phys. 106 623-634. · Zbl 1138.37300
[39] Shields, P.C. (1996). The Ergodic Theory of Discrete Sample Paths. Graduate Studies in Mathematics 13. Providence, RI: Amer. Math. Soc. · Zbl 0879.28031
[40] Viana, M. (1997). Stochastic dynamics of deterministic systems. Brazilian Math. Colloquium, IMPA.
[41] Wyner, A.D. and Ziv, J. (1989). Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression. IEEE Trans. Inform. Theory 35 1250-1258. · Zbl 0695.94003
[42] Wyner, A.J. (1999). More on recurrence and waiting times. Ann. Appl. Probab. 9 780-796. · Zbl 0955.60031
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.