×

zbMATH — the first resource for mathematics

How to squeeze a lexicon. (English) Zbl 0987.68782
Summary: Minimal acyclic deterministic finite automata (ADFAs) can be used as a compact representation of finite string sets with fast access time. Creating them with traditional algorithms of DFA minimization is resource greedy when a large collection of strings is involved. This paper aims to popularize an efficient but little-known algorithm for creating minimal ADFAs recognizing a finite language, invented independently by several authors. The algorithm is presented for three variants of ADFAs, its minor improvements are discussed, and minimal ADFAs are compared to competitive data structures.

MSC:
68U99 Computing methodologies and applications
68Q45 Formal languages and automata
68P05 Data structures
68T10 Pattern recognition, speech recognition
68T50 Natural language processing
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Dictionnaires et lexiques?méthodes et algorithmes. PhD Thesis, Université Paris VII, 1991. http://www-igm.univ-mlv.fr/?dr/thdr/.
[2] Czech, Theoretical Computer Science 182 pp 1– (1997) · Zbl 0954.68060 · doi:10.1016/S0304-3975(96)00146-6
[3] International Ispell. http://www.cs.hmc.edu/?geoff/ispell.html.
[4] McIlroy, IEEE Transactions on Communications COM-30 pp 91– (1982) · doi:10.1109/TCOM.1982.1095395
[5] Sorting and Searching (The Art of Computer Programming, vol. 3). Addison-Wesley: Reading, MA, 1998.
[6] Al-Suwaiyel, ACM Transactions on Database Systems 9 pp 243– (1984) · Zbl 0542.68079 · doi:10.1145/329.295
[7] Maly, Communications of the ACM 19 pp 409– (1976) · Zbl 0329.68037 · doi:10.1145/360248.360258
[8] Darragh, Software?Practice and Experience 23 pp 277– (1993) · doi:10.1002/spe.4380230305
[9] Blumer, Journal of the ACM 34 pp 578– (1987) · doi:10.1145/28869.28873
[10] Compilers: Principles, Techniques and Tools. Addison-Wesley: Reading, MA, 1985.
[11] Private communication, November 2000.
[12] TeX: The Program (Computers & Typesetting, vol. C). Addison-Wesley: Reading, MA, 1986.
[13] Lexical tree compression. Proceedings of the Second European Conference on Speech Communication and Technology, Genoa, 1991; 581-584.
[14] Appel, Communications of the ACM 31 pp 572– (1988) · doi:10.1145/42411.42420
[15] Gordon, Software?Practice and Experience 24 pp 219– (1994) · doi:10.1002/spe.4380240205
[16] Lucchesi, Software?Practice and Experience 23 pp 15– (1993) · doi:10.1002/spe.4380230103
[17] Park, International Journal of Computational Mathematics 54 pp 155– (1994) · Zbl 0839.68027 · doi:10.1080/00207169408804348
[18] Sgarbas, International Journal on Artificial Intelligence Tools 4 pp 369– (1995) · doi:10.1142/S0218213095000188
[19] Daciuk, Computational Linguistics 26 pp 3– (2000) · Zbl 1232.68081 · doi:10.1162/089120100561601
[20] Taxonomies and toolkits of regular language algorithms. PhD Thesis, Technische Universiteit Eindhoven, 1995. · Zbl 0832.68064
[21] Revuz, Theoretical Computer Science 92 pp 181– (1992) · Zbl 0759.68066 · doi:10.1016/0304-3975(92)90142-3
[22] Watson, Lecture Notes in Computer Science 1660 pp 91– (1998)
[23] Incremental construction of finite-state automata and transducers, and their use in the natural language processing. PhD Thesis, Politechnika Gda?ska, 1998. http://www.pg.gda.pl/?jandac/thesis.ps.gz.
[24] Mihov, Annuaire de l’Université de Sofia 91 pp 33– (1997)
[25] Mihov, Annuaire de l’Université de Sofia 92 (1998)
[26] Introduction to Automata Theory, Languages, and Computation. Addison-Wesley: Reading, MA, 1979. · Zbl 0426.68001
[27] Watson, Lecture Notes in Computer Science 1436 pp 232– (1997) · doi:10.1007/BFb0031396
[28] Lucchesi, Journal of the Brazilian Computer Society pp 5– (1995)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.