# zbMATH — the first resource for mathematics

On the hardness of learning queries from tree structured data. (English) Zbl 1321.90144
Summary: The problem of learning queries from tree structured data is studied by this paper. A tree structured data is modeled as a node-labeled tree $$T$$, and applying a query $$q$$ on $$T$$ will return a set $$q(T)$$ which is a subset of nodes in $$T$$. For a tree-node pair $$(T,t)$$ where $$t$$ is a node in $$T$$, $$q$$ is called to accept the pair if $$t\in {q(T)}$$, and reject the pair if $$t\notin {q(T)}$$. For some query class $$\mathcal{L}$$, given tree-node pair sets $$E_p$$ and $$E_n$$, the tree query learning problem is to find a query $$q\in \mathcal{L}$$ such that (1) $$q$$ rejects all pairs in $$E_n$$, and (2) the size of pairs in $$E_p$$ accepted by $$q$$ is maximized. On four different query classes $$\mathcal Q^{/}$$, $$\mathcal Q^{/,*}$$, $$\mathcal Q^{/,//}$$ and $$\mathcal Q^{/,[]}$$, this paper studies the hardness of the corresponding tree query learning problems. For $$\mathcal Q^{/}$$, a PTIME algorithm is given. For $$\mathcal Q^{,*}$$ and $$\mathcal Q^{/,//}$$, the NP-complete results are shown. For $$\mathcal Q^{/,[]}$$, the problem is shown to be NP-hard by considering two constrained fragments of $$\mathcal Q^{/,[]}$$. Also, for $$\mathcal Q^{/,*}$$, $$\mathcal Q^{/,[]}$$ and $$\mathcal Q^{/,//}$$, it is shown that there are no $$n^{1-\epsilon}$$-approximation algorithms for any $$\epsilon >0$$.
##### MSC:
 90C35 Programming involving graphs or networks
##### Keywords:
tree query learning; hardness; inapproximability
XPath; XQuery
Full Text:
##### References:
  Abiteboul S, Buneman P, Suciu D (2000) Data on the web: from relations to semistructured data and xml. Morgan Kaufmann, San Francisco  Amer-Yahia S, Cho S, Lakshmanan LVS, Srivastava D (2002) Tree pattern query minimization. VLDB J 11(4):315-331 · Zbl 1047.68040  Angluin, D, Inductive inference of formal languages from positive data, Inf Control, 45, 117-135, (1980) · Zbl 0459.68051  Angluin, D, Learning regular sets from queries and counterexamples, Inf Comput, 75, 87-106, (1987) · Zbl 0636.68112  Angluin, D, Negative results for equivalence queries, Mach Learn, 5, 121-150, (1990)  Bex, GJ; Neven, F; Schwentick, T; Vansummeren, S, Inference of concise regular expressions and dtds, ACM Trans Database Syst (TODS), 35, 11:1-11:47, (2010)  Boag S, Chamberlin D, Fernandez M, Florescu D, Robie J, Simeon J, Stefanescu M (2002) Xquery 1.0: an xml query language, http://www.w3.org/TR/xquery  Carme J, Ceresna M, Goebel M (2006) Query-based learning of xpath expressions. In: ICGI, pp 342-343  Carme, J; Gilleron, R; Lemay, A; Niehren, J, Interactive learning of node selecting tree transducer, Mach Learn, 66, 33-67, (2007)  Deutch A, Fernandez M, Florescu D, Levy A, Suciu D (1999) A query language for xml. In: Proceedings of WWW  Garey MR, Johnson DS (1990) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., New York · Zbl 0459.68051  Gold EM (1967) Language identification in the limit. Inf Control 10(5):447-474 · Zbl 0259.68032  Gonzalez G, Tari L, Gitter A, Leaman R, Nikkila S, Wendt R, Zeigler A, Baral C (2007) Integrating knowledge from biomedical literature: Normalization and evidence statements for interactions. In: Proceedings of the second bioCreative challenge evaluation workshop, pp 227-236  Higuera Cdl (1997) Characteristic sets for polynomial grammatical inference. Machine Learn 27(2):125-138 · Zbl 0884.68107  Jagadish HV, Milo T, Srivastava D, Vista D (1999) Querying network directories. In: SIGMOD, pp 133-144  Jiang, T; Lin, G; Ma, B; Zhang, K, A general edit distance between RNA structures, J Comput Biol, 9, 371-388, (2002)  Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge  Lemay A, Niehren J, Gilleron R (2006) Learning n-ary node selecting tree transducers from completely annotated examples. In: ICGI, pp 253-267 · Zbl 1158.68411  Miyano, S; Shinohara, A; Shinohara, T, Polynomial-time learning of elementary formal systems, New Gen Comput, 18, 217-242, (2000)  Sarma AD, Parameswaran A, Garcia-Molina H, Widom J (2010) Synthesizing view definitions from data. In: ICDT  Staworko S, Wieczorek P (2012) Learning twig and path queries. In: ICDT · Zbl 1365.68223  Weis M, Naumann F (2005) Dogmatix tracks down duplicates in xml. In: SIGMOD  Zuckerman D (2006) Linear degree extractors and the inapproximability of max clique and chromatic number. In: STOC · Zbl 1301.68152
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.