Deterministic sampling - A new technique for fast pattern matching. (English) Zbl 0716.68074

Summary: Consider the string matching problem. Given the pattern, a sample of its positions is carefully selected whose size is at most logarithmic (the deterministic sample). Then, the sample is searched for. For nonperiodic patterns, the sample has the following perhaps surprising property. It is possible to disqualify all occurrences of the sample positions but one, within each “neighborhood” of locations in the text, without any further comparisons of characters. This provides sparse verification.
This approach enables the text analysis (stages “search for sample” and “verify”) to be performed in \(O(\log^*n)\) time and optimal speedup on a PRAM. This improves on the previous fastest optimal speedup result. It also leads to a new serial algorithm for string matching that runs in linear time including preprocessing.
The approach is expected to be applicable for pragmatic pattern recognition problems. In some sense the algorithms are based on degenerate forms of computation, such as AND and OR of a large number of bits. However, traditional machine designs do not take advantage of such degeneracies, and usual complexity measures do not even enable them to be reflected. This leads to the conclusion of the paper with some speculative thoughts on desirable capabilities that would enhance computing machinery for some pattern recognition applications.


68T10 Pattern recognition, speech recognition
68W10 Parallel algorithms in computer science
68Q10 Modes of computation (nondeterministic, parallel, interactive, probabilistic, etc.)
68P99 Theory of data
Full Text: DOI