×

zbMATH — the first resource for mathematics

Schemas for unordered XML on a DIME. (English) Zbl 1347.68109
Summary: We investigate schema languages for unordered XML having no relative order among siblings. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation instead of standard concatenation, that define languages of unordered words to model the allowed content of a node (i.e., collections of the labels of children). However, unrestricted UREs are computationally too expensive as we show the intractability of two fundamental decision problems for UREs: membership of an unordered word to the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs). Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schema (DIMS), and its restriction, disjunction-free interval multiplicity schema (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, as well as twig query satisfiability, implication, and containment in the presence of schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and DTDs under commutative closure. Our results show that the proposed schema languages are capable of expressing many practical languages of unordered trees and enjoy desirable computational properties.

MSC:
68P15 Database theory
68Q25 Analysis of algorithms and problem complexity
68Q45 Formal languages and automata
Software:
XMark; XPath
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Abiteboul, S., Bourhis, P., Vianu, V.: Highly expressive query languages for unordered data trees. In: ICDT, pp 46-60 (2012) · Zbl 1352.68075
[2] Albert, J; Giammarresi, D; Wood, D, Normal form algorithms for extended context-free grammars, Theor. Comput. Sci., 267, 35-47, (2001) · Zbl 0984.68092
[3] Amer-Yahia, S; Cho, S; Lakshmanan, LVS; Srivastava, D, Tree pattern query minimization, VLDB J., 11, 315-331, (2002) · Zbl 1047.68040
[4] Beeri, C., Milo, T.: Schemas for integration and translation of structured and semi-structured data. In: ICDT, pp 296-313 (1999)
[5] Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM 55(2) (2008) · Zbl 1326.68154
[6] Berglund, M., Björklund, H., Högberg, J.: Recognizing shuffled languages. In: LATA, pp 142-154 (2011) · Zbl 1154.68370
[7] Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of concise regular expressions and DTDs. ACM Trans. Database Syst 35(2) (2010) · Zbl 1294.68067
[8] Bex, G.J., Neven, F., Van den Bussche, J.: DTDs versus XML Schema A practical study. In: WebDB, pp 79-84 (2004)
[9] Björklund, H., Martens, W., Schwentick, T.: Validity of tree pattern queries with respect to schema information MFCS, pp 171-182 (2013) · Zbl 1398.68187
[10] Boneva, I., Ciucanu, R., Staworko, S.: Simple schemas for unordered XML. In: WebDB (2013) · Zbl 1347.68109
[11] Boneva, I., Gayo, J.E.L., Hym, S., Prud’hommeau, E.G., Solbrig, H.R., Staworko, S.: Validating RDF with shape expressions. arXiv:CoRRabs/1404.1270 CoRR (2014)
[12] Boneva, I., Talbot, J.: Automata and logics for unranked and unordered trees. In: RTA, pp 500-515 (2005) · Zbl 1078.03035
[13] Boneva, I., Talbot, J., Tison, S.: Expressiveness of a spatial logic for trees. In: LICS, pp 280-289 (2005)
[14] Brüggemann-Klein, A; Wood, D, One-unambiguous regular languages, Inf. Comput., 142, 182-206, (1998) · Zbl 0912.68112
[15] Cardelli, L; Ghelli, G, TQL: a query language for semistructured data based on the ambient logic, Math. Struct. Comput. Sci., 14, 285-327, (2004) · Zbl 1085.68035
[16] Ciucanu, R., Staworko, S.: Learning schemas for unordered XML. In: DBPL (2013) · Zbl 1347.68109
[17] Colazzo, D; Ghelli, G; Pardini, L; Sartiani, C, Almost-linear inclusion for XML regular expression types, ACM Trans. Database Syst., 38, 15, (2013) · Zbl 1321.68242
[18] Colazzo, D; Ghelli, G; Sartiani, C, Efficient inclusion for a class of XML types with interleaving and counting, Inf. Syst., 34, 643-656, (2009) · Zbl 1294.68067
[19] Czerwinski, W., David, C., Losemann, K., Martens, W.: Deciding definability by deterministic regular expressions. In: FoSSaCS, pp 289-304 (2013) · Zbl 1260.68199
[20] Dal-Zilio, S., Lugiez, D.: XML schema, tree logic and sheaves automata RTA, pp 246-263 (2003) · Zbl 1038.68039
[21] Gelade, W; Martens, W; F. Neven., Optimizing schema languages for XML, numerical constraints and interleaving, SIAM J. Comput., 38, 2021-2043, (2009) · Zbl 1187.68191
[22] Ghelli, G., Colazzo, D., Sartiani, C.: Linear time membership in a class of regular expressions with interleaving and counting. In: CIKM, pp 389-398 (2008) · Zbl 1154.68370
[23] Grijzenhout, S; Marx, M, The quality of the XML web, J. Web Sem., 19, 59-68, (2013)
[24] Hashimoto, K., Kusunoki, Y., Ishihara, Y., Fujiwara, T.: Validity of positive XPath queries with wildcard in the presence of DTDs. In: DBPL (2011) · Zbl 0984.68092
[25] Hovland, D.: The membership problem for regular expressions with unordered concatenation and numerical constraints. In: LATA, pp 313-324 (2012) · Zbl 1351.68140
[26] Kopczynski, E., To, A.: Parikh images of grammars Complexity and applications. In: LICS, pp 80-89 (2010)
[27] Martens, W; Neven, F, On the complexity of typechecking top-down XML transformations, Theor. Comput. Sci., 336, 153-180, (2005) · Zbl 1080.68021
[28] Martens, W; Neven, F; Gyssens, M, Typechecking top-down XML transformations fixed input or output schemas, Inf. Comput., 206, 806-827, (2008) · Zbl 1154.68370
[29] Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for simple regular expressions. In: MFCS, pp 889-900 (2004) · Zbl 1097.68066
[30] Martens, W; Neven, F; Schwentick, T, Complexity of decision problems for XML schemas and chain regular expressions, SIAM J. Comput., 39, 1486-1530, (2009) · Zbl 1211.68162
[31] Mayer, AJ; Stockmeyer, LJ, Word problems-this time with interleaving, Inf. Comput., 115, 293-311, (1994)
[32] Miklau, G; Suciu, D, Containment and equivalence for a fragment of xpath, J. ACM, 51, 2-45, (2004) · Zbl 1316.68047
[33] Montazerian, M., Wood, P. T., Mousavi, S. R.: XPath query satisfiability is in PTIME for real-world DTDs XSym, pp 17-30 (2007)
[34] Neven, F., Schwentick, T.: XML schemas without order (1999)
[35] Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Methods in Computer Science 2 (3) (2006) · Zbl 1126.68376
[36] Oppen, DC, A \(2^{2^{2^{p_{n}}}}\) upper bound on the complexity of Presburger arithmetic, J. Comput. Syst. Sci., 16, 323-332, (1978) · Zbl 0381.03021
[37] Papakonstantinou, Y., Vianu, V.: DTD inference for views of XML data. In: PODS, pp 35-46 (2000)
[38] Schaefer, T.J.: The complexity of satisfiability problems. In: STOC, pp 216-226 (1978) · Zbl 1282.68143
[39] Schmidt, A., Waas, F., Kersten, M., Carey, M., Manolescu, I., XMark, R. Busse.: A benchmark for XML data management VLDB, pp 974-985 (2002)
[40] Schwentick, T.: Trees, automata and XML PODS, p 222 (2004)
[41] Segoufin, L., Sirangelo, C.: Constant-memory validation of streaming XML documents against DTDs. In: ICDT, pp 299-313 (2007) · Zbl 1211.68162
[42] Segoufin, L., Vianu, V.: Validating streaming XML documents. In: PODS, pp 53-64 (2002) · Zbl 0984.68092
[43] Seidl, H., Schwentick, T., Muscholl, A.: Numerical document queries. In: PODS, pp 155-166 (2003)
[44] Seidl, H., Schwentick, T., Muscholl, A.: Counting in trees. In: Logic and Automata, pp 575-612 (2008) · Zbl 1226.03049
[45] Staworko, S., Wieczorek, P.: Learning twig and path queries. In: ICDT, pp 140-154 (2012)
[46] Stockmeyer, L.J., Meyer, A.R.: Word problems requiring exponential time Preliminary report STOC, pp 1-9 (1973) · Zbl 0359.68050
[47] W3C: XML Path language (XPath) 1.0 (1999)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.