×

From regular expressions to DFA’s using compressed NFA’s. (English) Zbl 0912.68105

Summary: There are two principal methods for turning regular expressions into NFA’s – one due to McNaughton and Yamada and another due to Thompson. Unfortunately, both have drawbacks. Given a regular expression \(R\) of length \(r\) and with \(s\) occurrences of alphabet symbols, the author [Lect. Notes Comput. Sci. 644, 88-108 (1992)] and A. Brüggemann-Klein [Regular expressions into finite automata, Theoret. Comput. Sci. 120, 197-213 (1993)] gave \(\Theta(m+r)\) time and \(O(r)\) space algorithms to produce a \(\Theta(m)\) space representation of McNaughton and Yamada’s NFA with \(s+1\) states and \(m\) transitions. The problem with this NFA is that \(m=\Theta(s^{2})\) in the worst case. Thompson’s method takes \(\Theta(r)\) time and space to construct a \(\Theta(r)\) space NFA with \(\Theta(r)\) states and \(\Theta(r)\) transitions. The problem with this NFA is that \(r\) can be arbitrarily larger than \(s\). We overcome drawbacks of both methods with a \(\Theta(r)\) time \(\Theta(s)\) space algorithm to construct an \(O(s)\) space representation of McNaughton and Yamada’s NFA. Given any set \(V\) of NFA states, our representation can be used to compute the set \(U\) of states one transition away from the states in V in optimal time \(O(| V| +| U|)\). McNaughton and Yamada’s NFA requires \(\Theta(| V|\times| U|)\) in the worst case. Using Thompson’s NFA, the equivalent calculation requires \(\Theta(r)\) time in the worst case. Comparative benchmarks show that an implementation of our method outperforms implementations of competing methods with respect to time for NFA construction, NFA accepting testing, and NFA-to-DFA conversion by subset construction. Throughout this paper program transformations are used to design algorithms and derive programs. A transformation of special importance is a form of finite differencing used previously by Douglas Smith to improve the efficiency of functional programs.

MSC:

68Q45 Formal languages and automata

Software:

Esterel
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Aho, A., Pattern matching in strings, (Book, R. V., Formal Language Theory (1980), Academic Press: Academic Press New York)
[2] Aho, A.; Hopcroft, J.; Ullman, J., Design and Analysis of Computer Algorithms (1974), Addison-Wesley: Addison-Wesley Reading, MA
[3] Aho, A.; Sethi, R.; Ullman, J., Compilers Principles, Techniques, and Tools (1986), Addison-Wesley: Addison-Wesley Reading, MA
[4] Berry, G.; Cosserat, L., The esterel synchronous programming language and its mathematical semantics, (Brookes, S. D.; Roscoe, A. W.; Winskel, G., Seminar in Concurrency. Seminar in Concurrency, Lecture Notes in Computer Science, Vol. 197 (1985), Springer: Springer Berlin) · Zbl 0599.68023
[5] Berry, G.; Sethi, R., From regular expressions to deterministic automata, Theoret. Comput. Sci., 48, 117-126 (1986) · Zbl 0626.68043
[6] Brüggemann-Klein, A., Regular expressions into finite automata, Theoret. Comput. Sci., 120, 197-213 (1993) · Zbl 0811.68096
[7] Brzozowski, J., Derivatives of regular expressions, J. ACM, 11, 4, 481-494 (1964) · Zbl 0225.94044
[8] Cai, J.; Paige, R., Using multiset discrimination to solve language processing problems without hashing, Theoret. Comput. Sci., 145, 189-228 (1995) · Zbl 0874.68165
[9] Chang, C., From regular expressions to DFA’s using compressed NFA’s, (Ph. D. Thesis (1992), New York University: New York University New York) · Zbl 0912.68105
[10] Chang, C.; Paige, R., From regular expressions to DFA’s using compressed NFA’s, (Apostolico, A.; Crochemore, M.; Galil, Z.; Manber, U., Lecture Notes in Computer Science, Vol. 644 (1992), Springer: Springer Berlin), 88-108 · Zbl 0912.68105
[11] Driscoll, J.; Sarnak, N.; Sleator, D.; Tarjan, R., Making data structures persistent, (Proc. 8th ACM STOC (1986)), 109-121
[12] Emerson, E.; Lei, C., Model checking in the propositional mu-calculus, (Proc. IEEE Conf. on Logic in Computer Science (1986)), 86-106
[13] Hopcroft, J.; Ullman, J., Formal Languages and Their Relation to Automata (1969), Addison-Wesley: Addison-Wesley Reading, MA · Zbl 0196.01701
[14] Kleene, S., Representation of events in nerve nets and finite automata, (Automata Studies, Ann. Math. Studies, Vol. 34 (1956), Princeton Univ. Press: Princeton Univ. Press Princeton, NJ), 3-41
[15] Knuth, D., On the translation of languages from left to right, Inform. and Control, 8, 6, 607-639 (1965) · Zbl 0231.68027
[16] McNaughton, R.; Yamada, H., Regular expressions and state graphs for automata, IRA Trans. Electron. Comput., EC-9, 39-47 (1960) · Zbl 0156.25501
[17] Myhill, J., Finite automata and representation of events, WADC, Tech. Rep., 57-624 (1957)
[18] Nerode, A., Linear automaton transformations, (Proc. Amer. Math. Soc., 9 (1958)), 541-544 · Zbl 0089.33403
[19] Rabin, M.; Scott, D., Finite automata and their decision problems, IBM J. Res. Develop., 3, 114-125 (1959) · Zbl 0158.25404
[20] Ritchie, D.; Thompson, K., The UNIX time-sharing system, Comm. ACM, 17, 7, 365-375 (1974)
[22] Smith, D., KIDS — A knowledge-based software development system, (Proc. Workshop on Automating Software Design. Proc. Workshop on Automating Software Design, AAAI-88 (1988))
[23] (Programmer’s Manual. Programmer’s Manual, SunOS Reference Manual, Vol. II (1989), SUN microsystems)
[24] Thompson, K., Regular expression search algorithm, Comm. ACM, 11, 6, 419-422 (1968) · Zbl 0164.46205
[25] Ullman, J., Computational Aspects of VLSI (1984), Computer Science Press: Computer Science Press Rockville, MD · Zbl 0539.68021
[26] Winskel, G., The Formal Semantics of Programming Languages (1993), MIT Press: MIT Press Cambridge, MA · Zbl 0919.68082
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.