×

On the structure of consistent partitions of substring set of a word. (English) Zbl 1248.68176

Deng, Xiaotie (ed.) et al., Frontiers in algorithmics. Third international workshop, FAW 2009, Hefei, China, June 20–23, 2009. Proceedings. Berlin: Springer (ISBN 978-3-642-02269-2/pbk). Lecture Notes in Computer Science 5598, 326-335 (2009).
Summary: DAWG is a key data structure for string matching and it is widely used in bioinformatics and data compression. But DAWGs are memory greedy. Weighted directed word graph (WDWG) is a space-economical variation of DAWG which is as powerful as DAWG. The underlying concept of WDWGs is a new equivalence relation of the substrings of a word, namely the minimal consistent linear partition. However, the structure of the consistent linear partition is not extensively explored. In this paper, we present a theorem that gives insight into the structure of consistent partitions. Through this theorem, one can enumerate all the consistent linear partitions and verify whether a linear partition is consistent. It also demonstrates how to merge the DAWG into a consistent partition. In the end, we give a simple and easy-to-construct class of consistent partitions based on lexicographic order.
For the entire collection see [Zbl 1166.68003].

MSC:

68P05 Data structures
68R15 Combinatorics on words
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Allauzen, C., Crochemore, M., Raffinot, M.: Efficient Experimental String Matching by Weak Factor Recognition. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 51–72. Springer, Heidelberg (2001) · Zbl 0992.68501 · doi:10.1007/3-540-48194-X_5
[2] Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M.T., Seiferas, J.: The smallest automation recognizing the subwords of a text. Theoretical Computer Science 40, 31–55 (1985) · Zbl 0574.68070 · doi:10.1016/0304-3975(85)90157-4
[3] Blumer, A., Blumer, J., Haussler, D., McConnell, R., Ehrenfeucht, A.: Complete inverted files for effcient text retrieval and analysis. Journal of the ACM 34(3), 578–595 (1987) · Zbl 0554.68058 · doi:10.1145/28869.28873
[4] Crochemore, M.: Transducers and repetitions. Theoretical Computer Science 45, 63–86 (1986) · Zbl 0615.68053 · doi:10.1016/0304-3975(86)90041-1
[5] Crochemore, M., Czumaj, A., Gasieniec, L., Lecroq, T., Plandowski, W., Rytter, W.: Fast Practical Multi-Pattern Matching. Inf. Process. Lett. 71(3-4), 107–113 (1999) · Zbl 0999.68246 · doi:10.1016/S0020-0190(99)00092-7
[6] Crochemore, M., Ilie, L., Seid-Hilmi, E.: The Structure of Factor Oracles. Int. J. Found. Comput. Sci 18(4), 781–797 (2007) · Zbl 1142.68330 · doi:10.1142/S0129054107004978
[7] Charras, C., Lecroq, T.: Exact string matching algorithms (1997), http://www-igm.univ-mlv.fr/ lecroq/string/ · Zbl 1230.68001
[8] Gusfield, D.: Algorithms on Strings Trees and Sequences. Cambridge UniversityPress, New York (1997) · Zbl 0934.68103 · doi:10.1017/CBO9780511574931
[9] Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In: Proceedings of the 32nd ACM Symposium on Theory of Computing (2000) · Zbl 1296.68035 · doi:10.1145/335305.335351
[10] Inenaga, S., Shinohara, A., Takeda, M., Arikawa, S.: Compact Directed Acyclic Word Graphs for a Sliding Window. Journal of Discrete Algorithms 2(1), 33–51 (2004); Laender, A.H.F., Oliveira, A.L. (eds.) SPIRE 2002. LNCS, vol. 2476. Springer, Heidelberg (2002) · Zbl 1118.68755 · doi:10.1016/S1570-8667(03)00064-9
[11] Miyamoto, S., Inenaga, S., Takeda, M., Shinohara, A.: Ternary Directed Acyclic Word Graphs. Theoretical Compututer Science 328(1-2), 97–111 (2004); H. Ibarra, O., Dang, Z. (eds.) CIAA 2003. LNCS, vol. 2759, pp. 120–130. Springer, Heidelberg (2003) · Zbl 1071.68048 · doi:10.1016/j.tcs.2004.07.008
[12] Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22, 935–948 (1993) · Zbl 0784.68027 · doi:10.1137/0222058
[13] Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995) · Zbl 0831.68027 · doi:10.1007/BF01206331
[14] Weiner, P.: Linear pattern matching algorithm. In: Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973) · doi:10.1109/SWAT.1973.13
[15] Zhang, M., Ju, J.: Space-economical reassembly for intrusion detection system. In: Qing, S., Gollmann, D., Zhou, J. (eds.) ICICS 2003. LNCS, vol. 2836, pp. 393–404. Springer, Heidelberg (2003) · doi:10.1007/978-3-540-39927-8_36
[16] Zhang, M., Tang, J., Guo, D., Hu, L., Li, Q.: Succinct Text Indexes on Large Alphabet. In: Cai, J.-Y., Cooper, S.B., Li, A. (eds.) TAMC 2006. LNCS, vol. 3959, pp. 528–537. Springer, Heidelberg (2006) · Zbl 1178.68180 · doi:10.1007/11750321_50
[17] Zhang, M., Hu, L., Li, Q., Ju, J.: Weighted Directed Word Graph. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 156–167. Springer, Heidelberg (2005) · Zbl 1130.68317 · doi:10.1007/11496656_14
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.