##
**From regular expressions to deterministic automata.**
*(English)*
Zbl 0626.68043

The main theorem allows an elegant algorithm to be refined into an efficient one. The elegant algorithm for constructing a finite automaton from a regular expression is based on ‘derivatives of ’ regular expressions; the efficient algorithm is based on ‘marking of’ regular expressions.

Derivatives of regular expressions correspond to state transitions in finite automata. When a finite automaton makes a transition under input symbol a, a leading a is stripped from the remaining input. Correspondingly, if the input string is generated by a regular expression E, then the derivative of E by a generates the remaining input after a leading a is stripped. J. A. Brzozowski [J. Assoc. Comput. Mach. 11, 481-494 (1964; Zbl 0225.94044)] used derivatives to construct finite automata; the state for expression E has a transition under a to the state for the derivative of E by a. This approach extends to regular expressions with new operators, including intersection and complement; however, explicit computation of derivatives can be expensive.

Marking of regular expressions yields an expression with distinct input symbols. Following R. McNaughton and H. Yamada [IRE Trans. Electron. Comput. EC-9, 38-47 (1960; Zbl 0156.255)], we attach subscripts to each input symbol in an expression; \((ab+b)^*ba\) becomes \((a_ 1b_ 2+b_ 3)^*b_ 4a_ 5\). Conceptually, the efficient algorithm constructs an automaton for the marked expression. The marks on the transitions are then erased, resulting in a nondeterministic automaton for the original unmarked expression. This approach works for the usual operations of union, concatenation, and iteration; however, intersection and complement cannot be handled because marking and unmarking do not preserve the languages generated by regular expressions with these operators.

Derivatives of regular expressions correspond to state transitions in finite automata. When a finite automaton makes a transition under input symbol a, a leading a is stripped from the remaining input. Correspondingly, if the input string is generated by a regular expression E, then the derivative of E by a generates the remaining input after a leading a is stripped. J. A. Brzozowski [J. Assoc. Comput. Mach. 11, 481-494 (1964; Zbl 0225.94044)] used derivatives to construct finite automata; the state for expression E has a transition under a to the state for the derivative of E by a. This approach extends to regular expressions with new operators, including intersection and complement; however, explicit computation of derivatives can be expensive.

Marking of regular expressions yields an expression with distinct input symbols. Following R. McNaughton and H. Yamada [IRE Trans. Electron. Comput. EC-9, 38-47 (1960; Zbl 0156.255)], we attach subscripts to each input symbol in an expression; \((ab+b)^*ba\) becomes \((a_ 1b_ 2+b_ 3)^*b_ 4a_ 5\). Conceptually, the efficient algorithm constructs an automaton for the marked expression. The marks on the transitions are then erased, resulting in a nondeterministic automaton for the original unmarked expression. This approach works for the usual operations of union, concatenation, and iteration; however, intersection and complement cannot be handled because marking and unmarking do not preserve the languages generated by regular expressions with these operators.

### MSC:

68Q45 | Formal languages and automata |

### Keywords:

algorithm for constructing a finite automaton from a regular expression; efficient algorithm; Derivatives of regular expressions; marked expression### Software:

Esterel
PDF
BibTeX
XML
Cite

\textit{G. Berry} and \textit{R. Sethi}, Theor. Comput. Sci. 48, 117--126 (1986; Zbl 0626.68043)

### References:

[1] | Aho, A.V., Pattern matching in strings, (), 325-347 |

[2] | Aho, A.V.; Sethi, R.; Ullman, J.D., Compilers: principles, techniques, and tools, (1986), Addison-Wesley Reading, MA |

[3] | Berry, G.; Cosserat, L., The esterel synchronous programming language and its mathematical semantics, () · Zbl 0599.68023 |

[4] | Berry, G.; Couronne, P.; Gonthier, G., () |

[5] | Bochmann, G.V., Communication protocols and error recovery procedures, ACM operating systems review, 9, 3, 45-50, (1975) |

[6] | Brzozowski, J.A., Derivatives of regular expressions, J. ACM, 11, 4, 481-494, (1964) · Zbl 0225.94044 |

[7] | Clement, D.; Despeyroux, J.; Despeyroux, T.; Hascoet, L.; Kahn, G., Natural semantics on the computer, () |

[8] | D. Clement and G. Kahn, Personal communication, February 1986. |

[9] | Holzmann, G.J., A theory for protocol validation, IEEE trans. comput., C-31, 8, 730-738, (1982) |

[10] | Katzenelson, J.; Kurshan, R.P., S/R: A language for specifying protocols and other coordinating processes, (), 286-292 |

[11] | McNauthton, R.; Yamada, H., Regular expressions and state graphs for automata, IRE trans. on electronic comput., EC-9, 1, 38-47, (1960) |

[12] | Milner, R., A complete inference system for a class of regular behaviours, J. comput. system sci., 28, 439-466, (1984) · Zbl 0562.68065 |

[13] | Rabin, M.O.; Scott, D., Finite automata and their decision problems, IBM J. res. develop., 3, 2, 114-125, (1959) · Zbl 0158.25404 |

[14] | Thompson, K., Regular expression search algorithm, Comm. ACM, 11, 6, 419-422, (1968) · Zbl 0164.46205 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.