×

Approximating language edit distance beyond fast matrix multiplication: ultralinear grammars are where parsing becomes hard! (English) Zbl 1441.68114

Chatzigiannakis, Ioannis (ed.) et al., 44th international colloquium on automata, languages, and programming, ICALP 2017, Warsaw, Poland July 10–14, 2017. Proceedings. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik. LIPIcs – Leibniz Int. Proc. Inform. 80, Article 19, 15 p. (2017).
Summary: In [J. Comput. Syst. Sci. 10, 308–315 (1975; Zbl 0312.68042)], a breakthrough result of L. G. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of \(O(n^\omega)\) for parsing where \(\omega\le 2.373\) is the exponent of fast matrix multiplication, and \(n\) is the string length. Recently, A. Abboud et al. [SIAM J. Comput. 47, No. 6, 2527–2555 (2018; Zbl 1412.68094)] demonstrated that this is likely optimal; moreover, a combinatorial \(o(n^3)\) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of A. V. Aho and T. G. Peterson [SIAM J. Comput. 1, 305–312 (1972; Zbl 0241.68038)]. Clearly, the lower bound for parsing rules out any algorithm running in \(o(n^\omega)\) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice.
To break this \(n^\omega\) hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance with performance dependencies on either the number of nonlinear productions, \(k^*\), or the number of nested nonlinear production, \(k\), used in the optimal derivation. Explicitly, we give an additive \(O(k^*\gamma)\) approximation in time \(O(|G|(n^2+\frac{n^3}{\gamma^3}))\) and an additive \(O(k\gamma)\) approximation in time \(O(|G|(n^2+\frac{n^3}{\gamma^2}))\), where \(|G|\) is the grammar size and \(n\) is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which \(k\) and \(k^*\) are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard!
For the entire collection see [Zbl 1373.68018].

MSC:

68Q42 Grammars and rewriting systems
68Q17 Computational difficulty of problems (lower bounds, completeness, difficulty of approximation, etc.)
68W25 Approximation algorithms
68W32 Algorithms on strings
68W40 Analysis of algorithms
PDFBibTeX XMLCite
Full Text: DOI