zbMATH — the first resource for mathematics

On building minimal automaton for subset matching queries. (English) Zbl 1379.68371
Summary: We address the problem of building an index for a set \(D\) of \(n\) strings, where each string location is a subset of some finite integer alphabet of size \(\sigma\), so that we can answer efficiently if a given simple query string (where each string location is a single symbol) \(p\) occurs in the set. That is, we need to efficiently find a string \(d\in D\) such that \(p[i]\in d[i]\) for every \(i\). We show how to build such index in \(O(n^{\log_{\sigma/\Delta}(\sigma)}\log(n))\) average time, where \(\Delta\) is the average size of the subsets. Our methods have applications e.g. in computational biology (haplotype inference) and music information retrieval.
68W32 Algorithms on strings
68P20 Information storage and retrieval of data
68Q45 Formal languages and automata
Full Text: DOI
[1] Badr, G.H.; Oommen, B.J., Self-adjusting of ternary search tries using conditional rotations and randomized heuristics, Comput. J., 48, 2, 200-219, (2005)
[2] Cambouropoulos, E.; Crochemore, M.; Iliopoulos, C.S.; Mouchard, L.; Pinzon, Y.J., Algorithms for computing approximate repetitions in musical sequences, Int. J. comput. math., 79, 11, 1135-1148, (2002) · Zbl 1008.68043
[3] Cole, R.; Gottliebd, L.; Lewenstein, M., Dictionary matching and indexing with errors and don’t cares, (), 91-100 · Zbl 1192.68818
[4] Daciuk, J., Comparison of construction algorithms for minimal, acyclic, deterministic, finite-state automata from sets of strings, (), 255-261 · Zbl 1033.68575
[5] Daciuk, J.; Maurel, D.; Savary, A., Incremental and semi-incremental construction of pseudo-minimal automata, (), 341-342 · Zbl 1172.68500
[6] Daciuk, J.; Mihov, S.; Watson, B.W.; Watson, R., Incremental construction of minimal acyclic finite state automata, Comput. linguist., 26, 1, 3-16, (2000) · Zbl 1232.68081
[7] Fredkin, E., Trie memory, Commun. ACM, 3, 9, 490-499, (1960)
[8] Fredriksson, K.; Mäkinen, V.; Navarro, G., Flexible music retrieval in sublinear time, Internat. J. found. comput. sci. (IJFCS), 17, 6, 1345-1364, (2006) · Zbl 1169.68390
[9] Gusfield, D., Haplotype inference by pure parsimony, (), 144-155
[10] Landau, G.; Tsur, D.; Weimann, O., Indexing a dictionary for subset matching queries, (), 195-204
[11] Maurel, D., Pseudo-minimal transducer, Theoret. comput. sci., 231, 1, 129-139, (2000) · Zbl 0951.68063
[12] Morrison, D.R., Patricia—practical algorithm to retrieve information coded in alphanumeric, J. ACM, 15, 4, 514-534, (1968)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.