×

Foundations of XML processing. The tree-automata approach. (English) Zbl 1235.68004

Cambridge: Cambridge University Press (ISBN 978-0-521-19613-0/hbk; 978-0-511-90402-8/ebook). xi, 226 p. (2011).
The book is meant as a very detailed summary of results of the tree-automata approach to XML. The author considers XML data with a schema. The schemes considered are based on standardized languages of XML schemes like DTD, XML Schema, and RELAX NG.
The book is structured into 15 chapters. The introduction contains a global overview of the book and basic information about tree transformations and type checking. An important remark highlighting the methods used in the book concerns exact type checking and approximate type checking. The preliminaries summarize basic notions, such as regular expressions and string automata, necessary for the subsequent book reading.
The short Chapter 3 introduces the author’s approach to XML schemes. He does not use directly well-known XML schema languages, but he introduces his own schema model capturing the essence of these languages. The model supports mainly structural properties of XML schemes and not integrity constraints as they exist, e.g., in the language XML Schema. Chapter 4 defines formally basic notions used in the author’s approach, namely tree automata. A special focus is put on top-down and bottom-up determinism for tree automata, i.e., features which distinguish the automata regarding their properties. In Chapter 5 the author introduces the notion of patterns for extracting subtrees from an input tree. These definitions naturally continue with defining marking tree automata. Also a method how patterns can be translated into these automata is shown in the chapter. The first part of the book describing basic topics finishes with Chapter 7 devoted to a type checking in the context of XML documents. First, the author overviews various approaches to XML type checking, particularly to its exact and approximate version. He presents a subset of the functional language XDuce and builds a simple yet powerful type system for XML processing.
Part II, devoted to advanced topics, starts with Chapter 8 about on-the-fly algorithms for important problems related to XML processing, namely membership for tree automata, evaluation of marking tree automata, and containment for tree automata. This chapter includes both top-down and bottom-up algorithms. The algorithms are completed by theorems specifying their complexity and power. Chapter 9 introduces an extension of tree automata called alternating tree automata allowing a “set intersection” operator.
A step forward is done in Chapter 10 describing tree transducers, i.e., a class of finite-state models that not only accept trees but they also transform them. The chapter introduces two frameworks, namely, top-down tree transducers and macro transducers, and shows their properties.
Chapter 11 is devoted to exact type checking for the simplest tree transformation languages, namely, tow-down tree transducers. Chapter 12 introduces an alternative approach to subtree extraction called path expressions. While patterns are structural constraints for matching subtrees, path expressions use navigation. Such expressions are used in the very-well known language XPath. In the same chapter, the author introduces a framework for corresponding automata, called tree-walking automata. Their expressiveness is compared there with that of normal tree automata. The third approach to subtree extraction uses first-order logic and its extension monadic second-order logic. The logic-based queries are presented in Chapter 13.
The last two Chapters 14 and 15 are devoted to ambiguity, i.e., the case when regular expressions or patterns have multiple matching possibilities, and unorderedness of XML. (Recall that XML structure is ordered.) The author reviews some previous proposals to solve the problems with unordered XML documents.
One appendix completes the book. It summarizes solutions to selected exercises. In fact, the book can serve also a textbook for advanced courses of the XML language.
The book can be recommended as a fundamental contribution to XML theory. It also offers algorithms that can be used in practice.

MSC:

68-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science
68-02 Research exposition (monographs, survey articles) pertaining to computer science
68P15 Database theory
68Q45 Formal languages and automata
68P05 Data structures
68Q25 Analysis of algorithms and problem complexity
68Q65 Abstract data types; algebraic specification

Software:

XDuce; CDuce; biXid
PDFBibTeX XMLCite