# zbMATH — the first resource for mathematics

Counting colours in compressed strings. (English) Zbl 1339.68331
Giancarlo, Raffaele (ed.) et al., Combinatorial pattern matching. 22nd annual symposium, CPM 2011, Palermo, Italy, June 27–29, 2011. Proceedings. Berlin: Springer (ISBN 978-3-642-21457-8/pbk). Lecture Notes in Computer Science 6661, 197-207 (2011).
Summary: Motivated by the problem of counting unique visitors to a website, we consider how to preprocess a string $$s[1 \dots n]$$ such that later, given a substring’s endpoints, we can quickly count how many distinct characters that substring contains. The smallest reasonably fast previous data structure for this problem takes $$n \log \sigma + {\mathcal{O}\!\left( {n \log \log n} \right)}$$ bits and answers queries in $${\mathcal{O}\!\left( {\log n} \right)}$$ time. We give a data structure for this problem that takes $$n H_0 (s) + {\mathcal{O}\!\left( {n} \right)} + o (n H_0 (s))$$ bits, where $$H _{0} (s)$$ is the 0th-order empirical entropy of $$s$$, and answers queries in $${\mathcal{O}\!\left( {\log \ell} \right)}$$ time, where $$\ell$$ is the length of the query substring. As far as we know, this is the first data structure, where the query time depends only on $$\ell$$ and not on $$n$$. We also show how our data structure can be made partially dynamic.
For the entire collection see [Zbl 1216.68028].

##### MSC:
 68W32 Algorithms on strings 68P05 Data structures
PATRICIA
Full Text:
##### References:
 [1] Bozanis, P., Kitsios, N., Makris, C., Tsakalidis, A.K.: New upper bounds for generalized intersection searching problems. In: Proceedings of the 22nd International Colloquium on Automata, Language and Programming (ICALP), pp. 464–474 (1995) · doi:10.1007/3-540-60084-1_97 [2] Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms 3(2) (2007) · Zbl 1321.68263 · doi:10.1145/1240233.1240243 [3] Gagie, T., Navarro, G., Puglisi, S.J.: Colored range queries and document retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 67–81. Springer, Heidelberg (2010) · Zbl 05803982 · doi:10.1007/978-3-642-16321-0_7 [4] González, R., Navarro, G.: Rank/select on dynamic compressed sequences and applications. Theoretical Computer Science 410(43), 4414–4422 (2009) · Zbl 1194.68103 · doi:10.1016/j.tcs.2009.07.022 [5] Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings of the 14th Symposium on Discrete Algorithms (SODA), pp. 636–645 (2003) · Zbl 1092.68584 [6] Kaplan, H., Rubin, N., Sharir, M., Verbin, E.: Efficient colored orthogonal range counting. SIAM Journal on Computing 38(3), 982–1011 (2008) · Zbl 1187.68172 · doi:10.1137/070684483 [7] Lai, Y.K., Poon, C.K., Shi, B.: Approximate colored range and pointer enclosure queries. Journal of Discrete Algorithms 6(3), 420–432 (2008) · Zbl 1160.68352 · doi:10.1016/j.jda.2007.10.001 [8] Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 45–56. Springer, Heidelberg (2005) · Zbl 1131.68431 · doi:10.1007/11496656_5 [9] Mäkinen, V., Navarro, G.: Implicit compression boosting with applications to self-indexing. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 229–241. Springer, Heidelberg (2007) · Zbl 05344726 · doi:10.1007/978-3-540-75530-2_21 [10] Mäkinen, V., Navarro, G.: Rank and select revisited and extended. Theoretical Computer Science 387(3), 332–347 (2007) · Zbl 1144.68023 · doi:10.1016/j.tcs.2007.07.013 [11] Morrison, D.R.: PATRICIA – practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM 15(4) (1968) · doi:10.1145/321479.321481 [12] Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proceedings of the 13th Symposium on Discrete Algorithms (SODA), pp. 657–666 (2002) · Zbl 1093.68588 [13] Patrascu, M.: Predecessor search. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms. Springer, Heidelberg (2008) [14] Sadakane, K.: Succinct data structures for flexible text retrieval systems. Journal of Discrete Algorithms 5(1), 12–22 (2007) · Zbl 1137.68360 · doi:10.1016/j.jda.2006.03.011
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.