A sequential recursive implementation of dead-zone single keyword pattern matching. (English) Zbl 1293.68315

Arumugam, S. (ed.) et al., Combinatorial algorithms. 23rd international workshop, IWOCA 2012, Tamil Nadu, India, July 19–21, 2012. Revised selected papers. Berlin: Springer (ISBN 978-3-642-35925-5/pbk). Lecture Notes in Computer Science 7643, 236-248 (2012).
Summary: Earlier publications provided an abstract specification of a family of single keyword pattern matching algorithms which search unexamined portions of the text in a divide-and-conquer fashion, generating dead-zones in the text as they progress. These dead zones are area of text that require no further examination. Here the results are described of implementing in C++ a sequential recursive version of the algorithm family, where all instances of a single keyword \(p\) in a text \(S\) are sought – the online keyword matching problem where \(S\) may not be precomputed.{
}We show that each step may involve a window shift of up to \(2 \times |p|-1\) characters – almost twice as much (and therefore potentially almost twice as fast) as the maximum of \(|p|\) characters possible with the Boyer-Moore family of algorithms. Our counterintuitive improvement over Boyer-Moore algorithms is achieved by simultaneously shifting left and right. Ongoing benchmarking shows that such bidirectional shifts are highly efficient – and we make specific comparisons here to Horspool’s algorithm, regarded as one of the most efficient algorithms of the Boyer-Moore family.
For the entire collection see [Zbl 1278.68025].


68W32 Algorithms on strings


Full Text: DOI