zbMATH — the first resource for mathematics

On the hardness of learning queries from tree structured data. (English) Zbl 1321.90144
Summary: The problem of learning queries from tree structured data is studied by this paper. A tree structured data is modeled as a node-labeled tree \(T\), and applying a query \(q\) on \(T\) will return a set \(q(T)\) which is a subset of nodes in \(T\). For a tree-node pair \((T,t)\) where \(t\) is a node in \(T\), \(q\) is called to accept the pair if \(t\in {q(T)}\), and reject the pair if \(t\notin {q(T)}\). For some query class \(\mathcal{L}\), given tree-node pair sets \(E_p\) and \(E_n\), the tree query learning problem is to find a query \(q\in \mathcal{L}\) such that (1) \(q\) rejects all pairs in \(E_n\), and (2) the size of pairs in \(E_p\) accepted by \(q\) is maximized. On four different query classes \(\mathcal Q^{/}\), \(\mathcal Q^{/,*}\), \(\mathcal Q^{/,//}\) and \(\mathcal Q^{/,[]}\), this paper studies the hardness of the corresponding tree query learning problems. For \(\mathcal Q^{/}\), a PTIME algorithm is given. For \(\mathcal Q^{,*}\) and \(\mathcal Q^{/,//}\), the NP-complete results are shown. For \(\mathcal Q^{/,[]}\), the problem is shown to be NP-hard by considering two constrained fragments of \(\mathcal Q^{/,[]}\). Also, for \(\mathcal Q^{/,*}\), \(\mathcal Q^{/,[]}\) and \(\mathcal Q^{/,//}\), it is shown that there are no \(n^{1-\epsilon}\)-approximation algorithms for any \(\epsilon >0\).
90C35 Programming involving graphs or networks
XPath; XQuery
Full Text: DOI
[1] Abiteboul S, Buneman P, Suciu D (2000) Data on the web: from relations to semistructured data and xml. Morgan Kaufmann, San Francisco
[2] Amer-Yahia S, Cho S, Lakshmanan LVS, Srivastava D (2002) Tree pattern query minimization. VLDB J 11(4):315-331 · Zbl 1047.68040
[3] Angluin, D, Inductive inference of formal languages from positive data, Inf Control, 45, 117-135, (1980) · Zbl 0459.68051
[4] Angluin, D, Learning regular sets from queries and counterexamples, Inf Comput, 75, 87-106, (1987) · Zbl 0636.68112
[5] Angluin, D, Negative results for equivalence queries, Mach Learn, 5, 121-150, (1990)
[6] Bex, GJ; Neven, F; Schwentick, T; Vansummeren, S, Inference of concise regular expressions and dtds, ACM Trans Database Syst (TODS), 35, 11:1-11:47, (2010)
[7] Boag S, Chamberlin D, Fernandez M, Florescu D, Robie J, Simeon J, Stefanescu M (2002) Xquery 1.0: an xml query language, http://www.w3.org/TR/xquery
[8] Carme J, Ceresna M, Goebel M (2006) Query-based learning of xpath expressions. In: ICGI, pp 342-343
[9] Carme, J; Gilleron, R; Lemay, A; Niehren, J, Interactive learning of node selecting tree transducer, Mach Learn, 66, 33-67, (2007)
[10] Deutch A, Fernandez M, Florescu D, Levy A, Suciu D (1999) A query language for xml. In: Proceedings of WWW
[11] Garey MR, Johnson DS (1990) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., New York · Zbl 0459.68051
[12] Gold EM (1967) Language identification in the limit. Inf Control 10(5):447-474 · Zbl 0259.68032
[13] Gonzalez G, Tari L, Gitter A, Leaman R, Nikkila S, Wendt R, Zeigler A, Baral C (2007) Integrating knowledge from biomedical literature: Normalization and evidence statements for interactions. In: Proceedings of the second bioCreative challenge evaluation workshop, pp 227-236
[14] Higuera Cdl (1997) Characteristic sets for polynomial grammatical inference. Machine Learn 27(2):125-138 · Zbl 0884.68107
[15] Jagadish HV, Milo T, Srivastava D, Vista D (1999) Querying network directories. In: SIGMOD, pp 133-144
[16] Jiang, T; Lin, G; Ma, B; Zhang, K, A general edit distance between RNA structures, J Comput Biol, 9, 371-388, (2002)
[17] Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge
[18] Lemay A, Niehren J, Gilleron R (2006) Learning n-ary node selecting tree transducers from completely annotated examples. In: ICGI, pp 253-267 · Zbl 1158.68411
[19] Miyano, S; Shinohara, A; Shinohara, T, Polynomial-time learning of elementary formal systems, New Gen Comput, 18, 217-242, (2000)
[20] Sarma AD, Parameswaran A, Garcia-Molina H, Widom J (2010) Synthesizing view definitions from data. In: ICDT
[21] Staworko S, Wieczorek P (2012) Learning twig and path queries. In: ICDT · Zbl 1365.68223
[22] Weis M, Naumann F (2005) Dogmatix tracks down duplicates in xml. In: SIGMOD
[23] Zuckerman D (2006) Linear degree extractors and the inapproximability of max clique and chromatic number. In: STOC · Zbl 1301.68152
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.