×

A faster grammar-based self-index. (English) Zbl 1351.68089

Dediu, Adrian-Horia (ed.) et al., Language and automata theory and applications. 6th international conference, LATA 2012, A Coruña, Spain, March 5–9, 2012. Proceedings. Berlin: Springer (ISBN 978-3-642-28331-4/pbk). Lecture Notes in Computer Science 7183, 240-251 (2012).
Summary: To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a balanced straight-line program for a string \(S[1..n]\) whose LZ77 parse consists of \(z\) phrases, we can add \(\mathcal{O}(z \log \log z)\) words and obtain a compressed self-index for \(S\) such that, given a pattern \(P [1..m]\), we can list the occ occurrences of \(P\) in \(S\) in \(\mathcal{O}({m^{2} + (m + \mathrm{occ}) \log \log n})\) time. All previous self-indexes are either larger or slower in the worst case.
For the entire collection see [Zbl 1235.68026].

MSC:

68P15 Database theory
68P05 Data structures
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68W32 Algorithms on strings
PDFBibTeX XMLCite
Full Text: DOI arXiv