×

zbMATH — the first resource for mathematics

Optimal schema hierarchies in searching semistructured databases by conjunctive regular path queries. (English. Russian original) Zbl 1117.68023
Program. Comput. Softw. 32, No. 4, 215-227 (2006); translation from Programmirovanie 2006, No. 4, 38-56 (2006).
Summary: An approach to estimating effectiveness of index usage when searching semistructured databases consisting of OEM documents is presented. In addition to the estimation of the hierarchy optimality from the standpoint of calculation of conjunctive regular path queries, this approach allows one to take into account arbitrary distributions of query probabilities. Algorithms for index construction are given, and estimates of their complexity are obtained. These estimates clearly demonstrate efficiency of the approach and practical applicability of the algorithms suggested.
MSC:
68P15 Database theory
Keywords:
OEM documents
PDF BibTeX Cite
Full Text: DOI
References:
[1] Buneman, P., Semistructured Data: A Tutorial, Proc. of PODS, Arizona, May 1997, pp. 117–121, http://db.cis.upenn.edu/DL/97/Tutorial-Peter/tutorialsemi-pods.ps.gz .
[2] Quass, D., Widom, J., Goldman, R., Haas, K., Luo, Q., McHugh, J., Nestorov, S., Rajaraman, A., Rivero, H., Abiteboul, S., Ullman, J.D., and Wiener, J.L., Proc. of the 1996 ACM SIGMOD Int. Conf. on Management of Data, 1996.
[3] Vasenin, V.A., Afonin, S.A., and Korshunov, A.A., K sozdaniyu kontseptsii integrirovannoi sistemy raspredelennykh informatsionnykh resursov Moskovskogo gosudarstvennogo universiteta im. M.V. Lomonosova (On the Creation of a Concept of Distributed Integrated System of Information Resources at Moscow State University), Moscow: Mosk. Gos. Univ., 2001.
[4] Extensible Markup Language (XML), http://www.w3c.org/XML .
[5] Abiteboul, S. and Vianu, V., Regular Path Queries with Constraints, Proc. of the 16th ACM Symp. on Principles of Database Systems, 1997, pp. 122–133, http://citeseer.ist.psu.edu/article/abiteboul97regular.html . · Zbl 0939.68025
[6] Shasha, D., Tsong-Li Wang, J., and Giugno, R., Algorithmics and Applications of Tree and Graph Searching, Symp. on Principles of Database Systems, 2002, pp. 39–52.
[7] Oracle Corp. Oracle 9i Database, http://www.oracle.com/ip/deploy/database/9i/index.html .
[8] Shanmegasundaram, J. et al., Relational Databases for Querying XML Documents: Limitations and Opportunities, Proc. of 25th VLDB, 1999.
[9] Deutsch, A., Fernandez, M., and Suciu, D., Storing Semistructured Data with STORED, Proc. SIGMOD, 1999.
[10] Florescu, D. and Kossmann, D., A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database, IRNA Tech. Report no. 3684, 1999.
[11] Goldman, R. and Widom, J., DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases, Proc. of the 1st Int. Conf. on Very Large Databases, 1997, pp. 436–455.
[12] Kaushik, R., Shenoy, P., Bohannon, Bh., and Gudes, E., Exploiting Local Similarity to Efficiently Index Paths in Graph-Structured Data, ICDE 2002, in press.
[13] Milo, T. and Suciu, D., Index Structures for Path Expressions, Proc. of the Int. Conf. on Database Theory, 1999, pp. 277–295.
[14] Software AG. Tamino XML Database, http://www.softwareag.com/tamino/ .
[15] XYZFind. XML Database, http://www.xyzfind.com .
[16] Chung, C., Min, J., and Shim, K., Apex: An Adaptive Path Index for XML Data, Proc. of ACM-SIGMOD Int. Conf. on Management of Data, Madison, 2002, pp. 121–132.
[17] Chen, Q., Lim, A., and Ong, K.W., D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data, Proc. of ACM-SIGMOD Int. Conf. on Management of Data, San-Diego, USA, 2003, pp. 134–144.
[18] Cooper, B., Sample, N., Franklin, M.J., Hjaltason, G.R., and Shadmon, M., A Fast Index for Semistructured Data, Proc. of 2001 Int. Conf. Very Large Data Bases, 2001, pp. 341–350.
[19] James, C.A., Weininger, D., and Delany, J., Daylight Theory Manual Daylight Version 4.82. Daylight Chemical Information Systems, 2003.
[20] Srinivasa, S. and Kumar, S., A Platform Based on the Multi-dimensional Data Model for Analysis of Biomolecular Structures, Proc. of 2003 Int. Conf. Very Large Data Bases, 2003.
[21] Washio, T. and Motoda, H., State of the Art of Graph-based Data Mining, SIGKDD Explorations, 2003, vol. 5, pp. 59–68. · Zbl 05442964
[22] Inokuchi, A., Washio, T., and Motoda, H., An a Prioribased Algorithm for Mining Frequent Substructures from Graph Data, Proc. of 2000 European Symp. Principle of Data Mining and Knowledge Discovery (PKDD’00), Lyon, France, 1998, pp. 13–23.
[23] Kuramochi, M. and Karypis, G., Frequent Subgraph Discovery, Proc. of 2001 Int. Conf. Data Mining. San Jose, 2001, pp. 313–320.
[24] Vanetik, N., Gudes, E., and Shimony, S.E., Computing Frequent Graph Patterns from Semistructured Data, Proc. 2002 Int. Conf. on Data Mining, Maebashi, Japan, 2002, pp. 458–465.
[25] Yan, X. and Han, J., CloseGraph: Mining Closed Frequent Graph Patterns, Proc. 2003 Int. Conf. Knowledge Discovery and Data Mining, Washington, D.C., 2003, pp. 286–295.
[26] Borgelt, C. and Berthold, M.R., Mining Molecular Fragments: Finding Relevant Substructures of Molecules, Proc. of 2002 Int. Conf. on Data Mining, Maebashi, Japan, 2002, pp. 211–218.
[27] Buneman, P., Davidson, S., Fernandez, M., Suciu, D., Henzinger, M., Henzinger, T., and Kopke, P., Computing Simulations on Finite and Infinite Graphs, Proc. of 20th Symp. on Foundations of Computer Science, 1995, pp. 462–543.
[28] Gorelov, S.S. and Vasenin, V.A., Search Optimization in Semistructured Databases Using Hierarchy of Document Schemas, Programmirovanie, 2005, no. 6, pp. 41–55 [Programming Comput. Software (Engl. Transl.), 2005, vol. 31, no. 6, pp. 321–331]. · Zbl 1106.68031
[29] Wirth, N., Algorithms and Data Structures, Prentice-Hall, 1986, pp. 286–301.
[30] Afonin, S.A., Strategy of Computation of Regular Path Queries, in Informatsionnye tekhnologii i programmirovanie (Information Technologies and Programming), vol. 5, Moscow: MGIU, 2002, pp. 9–16.
[31] Buneman, P., Davidson, S., Fernandez, M., and Suciu, D., Adding Structure to Unstructured Data, Tech. Report MS-CIS-96-21, Computer and Information Science Department, University of Pennsylvania, 1996.
[32] Fernandez, M.F. and Suciu, D., Optimizing Regular Path Expressions Using Graph Schemas, Proc. of the 14th Int. Conf. on Data Engineering, 1998, pp. 14–23. http://www.cs.Washington.edu/homes/suciu/files/paper-techrep.ps .
[33] Abiteboul, S., Querying Semi-Structured Data, 1997.
[34] Garofalakis, M., Gionis, A., Rastogi, R., Seshadri, S., and Shim, K., XTRACT: A System of Extracting Document Type Descriptors from XML Documents.
[35] Milo, T. and Suciu, D., Index Structures for Path Expressions, Lecture Notes in Computer Science, 1999, vol. 1540, pp. 277–295, http://citeseer.nj.nec.com/milo97index.html .
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.