zbMATH — the first resource for mathematics

Compressed matching in dictionaries. (English) Zbl 07042098
Summary: The problem of compressed pattern matching, which has recently been treated in many papers dealing with free text, is extended to structured files, specifically to dictionaries, which appear in any full-text retrieval system. The prefix-omission method is combined with Huffman coding and a new variant based on Fibonacci codes is presented. Experimental results suggest that the new methods are often preferable to earlier ones, in particular for small files which are typical for dictionaries, since these are usually kept in small chunks.

68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68P20 Information storage and retrieval of data
68W32 Algorithms on strings
Full Text: DOI
[1] Amir, A.; Benson, G.; Efficient two-dimensional compressed matching; ; ,279-288.
[2] Klein, S.T.; Shapira, D.; Compressed matching in dictionaries; 2002; ,142-151. · Zbl 07042098
[3] Klein, S.T.; Shapira, D.; Pattern matching in Huffman encoded texts; Inform. Process. Manage.: 2005; Volume 41 ,829-841. · Zbl 1101.68814
[4] Amir, A.; Benson, G.; Farach, M.; Let sleeping files lie: Pattern matching in Z-compressed files; J. Comput. Syst. Sci.: 1996; Volume 52 ,299-307. · Zbl 1152.68436
[5] Farach, M.; Thorup, M.; String matching in Lempel-Ziv compressed strings; 1995; ,703-712. · Zbl 0978.68520
[6] Navarro, G.; Raffinot, M.; A general practical approach to pattern matching over Ziv-Lempel compressed text; Lect. Notes Comput. Sci.: 1999; Volume 1645 ,14-36. · Zbl 1063.68621
[7] Klein, S.T.; Shapira, D.; A new compression method for compressed matching; ; ,400-409.
[8] Manber, U.; A text compression scheme that allows fast searching directly in the compressed file; ACM Trans. Inform. Syst.: 1997; Volume 15 ,124-136.
[9] Navarro, G.; Kida, T.; Takeda, M.; Shinohara, A.; Arikawa, S.; Faster approximate string matching over compressed text; ; ,459-468.
[10] Ng, W.K.; Ravishankar, C.V.; Block-oriented compression techniques for large statistical Databases; IEEE Trans. Knowl. Data. Eng.: 1997; Volume 9 ,314-328.
[11] Levene, M.; Wood, P.; ; XML Structure Compression: London, UK 2002; .
[12] Liefke, H.; Suciu, D.; XMill: An efficient compressor for XML data; ; ,153-164.
[13] Cheney, J.; Compressing XML with multiplexed hierarchical PPM models; ; ,163-172.
[14] Tolani, P.M.; Haritsa, J.R.; XGRIND: A query-friendly XML compressor; ; ,225-234.
[15] XMLSolutions, XMLZip; ; .
[16] Buchsbaum, A.L.; Caldwell, D.F.; Church, K.W.; Fowler, G.S.; Muthukrishnan, S.; Engineering the compression of massive tables: An experimental approach; ; ,175-184. · Zbl 0956.68508
[17] Buchsbaum, A.L.; Fowler, G.S.; Giancarlo, R.; Improving compression with combinatorial optimization; J. ACM: 2003; Volume 50 ,825-851. · Zbl 1325.68087
[18] Choueka, Y.; Responsa: A full-text retrieval system with linguistic processing for a 65-million word corpus of Jewish heritage in Hebrew; IEEE Data Eng. Bull.: 1989; Volume 14 ,22-31.
[19] Bookstein, A.; Klein, S.T.; Ziff, D.A.; A systematic approach to compressing a full text retrieval system; Inform. Process. Manage.: 1992; Volume 28 ,795-806.
[20] Bratley, P.; Choueka, Y.; Processing Truncated Terms in Document Retrieval Systems; Inform. Process. Manage.: 1982; Volume 18 ,257-266.
[21] Klein, S.T.; Skeleton trees for efficient decoding of Huffman encoded texts; Inf. Retr.: 2000; Volume 3 ,7-23.
[22] Brisaboa, N.R.; Fariña, A.; Navarro, G.; Esteller, M.F.; (S,C)-dense coding: An optimized compression code for natural language text databases; Lect. Notes Comput. Sci.: 2003; Volume 2857 ,122-136. · Zbl 1254.68119
[23] Klein, S.T.; Kopel Ben-Nissan, M.; On the usefulness of fibonacci compression codes; Comput. J.: 2010; Volume 53 ,701-716.
[24] Fraenkel, A.S; Klein, S.T.; Robust universal complete codes for transmission and compression; Discrete. Appl. Math.: 1996; Volume 64 ,31-55. · Zbl 0874.94026
[25] Ristov, S.; Laporte, E.; Ziv Lempel compression of huge natural language data tries using suffix arrays; Lect. Notes Comput. Sci.: 1999; Volume 1645 ,196-211. · Zbl 1063.68579
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.