×

zbMATH — the first resource for mathematics

Repairing XML functional dependency violations. (English) Zbl 1239.68032
Summary: We study the problem of repairing XML functional dependency violations by making the smallest modifications in terms of repair cost. Our cost model assigns a weight to each leaf node in the XML document, and the cost of a repair is measured by the total weight of the modified nodes. We define an optimum repair as the repair with the minimum cost among all of the repairs. We prove lower and upper bounds for the optimum XML repair problem. We show that, in practice, it is beyond reach to find the optimum repairs; this problem is already NP-complete for a setting with a fixed DTD, a fixed set of functional dependencies, and equal weights for all of the nodes in the XML document. Instead, we provide an efficient two-step heuristic method to repair XML functional dependency violations. First, the initial violations are captured and fixed by leveraging the conflict hypergraph. Second, the remaining conflicts are resolved by modifying the violating nodes and their related nodes called determinants in a way that guarantees no new violations. We implement our method and evaluate it on synthetic and real-life data. The experimental results demonstrate that our algorithm scales well and is effective at improving data quality.

MSC:
68P99 Theory of data
68M11 Internet topics
Software:
XMark
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] M. Arenas, L.E. Bertossi, J. Chomicki, Consistent query answers in inconsistent databases, in: Proceedings of ACM Symposium on Principles of Database Systems, 1999, pp. 68-79. · Zbl 1079.68026
[2] Ausiello, G.; Crescenzi, P.; Gambosi, G.; Kann, V.; Marchetti-Spaccamela, A.; Protasi, M., Complexity and approximation: combinatorial optimization problems and their approximability properties, (1999), Springer · Zbl 0937.68002
[3] Abiteboul, S.; Hull, R.; Vianu, V., Foundations of databases, (1995), Addison-Wesley
[4] Arenas, M.; Libkin, L., A normal form for XML documents, ACM transactions on database systems, 29, 1, 195-232, (2004)
[5] Bertossi, L.E.; Bravo, L.; Franconi, E.; Lopatenko, A., The complexity and approximation of fixing numerical attributes in databases under integrity constraints, Information systems, 33, 4-5, 407-434, (2008)
[6] P. Buneman, S. Davidson, W. Fan, C. Hara, W. Tan, Reasoning about keys for XML, in: Proceedings of International Workshop on Database Programming Languages, 2002, pp. 133-148. · Zbl 1098.68552
[7] P. Buneman, S. Davidson, W. Fan, C. Hara, W. Tan, Keys for XML, in: Proceedings of World Wide Web Conference, 2001, pp. 201-210.
[8] P. Bohannon, W. Fan, M. Flaster, R. Rastogi, A cost based model and effective heuristic for repairing constraints by value modification, in: Proceedings of ACM SIGMOD Conference on Management of Data, 2005, pp. 143-154.
[9] Benedikt, M.; Fan, W.; Geerts, F., Xpath satisfiability in the presence of dtds, Journal of the ACM, 55, 2, 1-79, (2008) · Zbl 1326.68154
[10] G. Beskales, I. Ilyas, L. Golab, Sampling the repairs of functional dependency violations under dard constraints, in: Proceedings of International Conference on Very Large Data Bases, 2010, pp. 197-207.
[11] J. Chomicki, Consistent query answering: five easy pieces, in: Proceedings of International Conference on Database Theory, 2007, pp. 1-17.
[12] G. Cong, W. Fan, F. Geerts, X. Jia, S.Ma, Improving data quality: consistency and accuracy, in: Proceedings of International Conference on Very Large Data Bases, 2007, pp. 315-326.
[13] W. Fan, Dependencies revisited for improving data quality, in: Proceedings of ACM Symposium on Principles of Database Systems, 2008, pp. 159-170.
[14] Fan, W.; Bohannon, P., Information preserving XML schema embedding, ACM transactions on database systems, 33, 1, 1-44, (2008)
[15] S. Flesca, F. Furfaro, S. Greco, E. Zumpano, Repairs and consistent answers for XML data with functional dependencies, in: Proceedings of International XML Database Symposium, 2003, pp. 238-253.
[16] S. Flesca, F. Furfaro, S. Greco, E. Zumpano, Querying and repairing inconsistent XML data, in: Proceedings of Web Information Systems Engineering, 2005, pp. 175-188.
[17] Fan, W.; Libkin, L., On XML integrity constraints in the presence of dtds, Journal of the ACM, 49, 3, 368-406, (2002) · Zbl 1326.68120
[18] S. Hartmann, S. Link, Expressive, yet tractable XML keys, in: Proceedings of International Conference on Extending Database Technology, 2009, pp. 357-367.
[19] Hartmann, S.; Link, S., Efficient reasoning about a robust XML key fragment, ACM transactions on database systems, 34, 2, 1-33, (2009)
[20] S. Kolahi, L. Lakshmanan, On approximating optimum repairs for functional dependency violations, in: Proceedings of International Conference on Database Theory, 2009, pp. 53-62.
[21] Li, G.; Li, C.; Feng, J.; Zhou, L., SAIL: structure-aware indexing for effective and progressive top-k keyword search over XML documents, Information sciences, 179, 21, 3745-3762, (2009)
[22] A. Lopatenko, L. Bravo, Efficient approximation algorithms for repairing inconsistent databases, in: Proceedings of IEEE International Conference on Data Engineering, 2007, pp. 216-225.
[23] W. Ng, Repairing inconsistent merged XML data, in: Proceedings of Database and Expert Systems Applications, 2003, pp. 244-255.
[24] Spink, A.; Jansen, B.J., Web search, Information sciences, 179, 12, 1795, (2009)
[25] A. Schmidt, F. Waas, M. Kersten, M. Carey, I. Manolescu, R. Busse, XMark: A benchmark for XML data management, in: Proceedings of International Conference on Very Large Data Bases, 2002, pp. 974-985. · Zbl 1021.68797
[26] Tan, Z.; Liu, C.; Wang, W.; Shi, B., Consistent query answers from virtually integrated XML data, Journal of systems and software, 83, 12, 2566-2578, (2010)
[27] Z. Tan, L. Zhang, Improving XML data quality with functional dependencies, in: Proceedings of Database Systems for Advanced Applications, 2011, pp. 450-465.
[28] Tan, Z.; Zhang, Z.; Wang, W.; Shi, B., Consistent data for inconsistent XML document, Information and software technology, 49, 9-10, 947-959, (2007)
[29] Vazirani, V., Approximation algorithms, (2001), Springer · Zbl 1138.90417
[30] Vincent, M.; Liu, J.; Liu, C., Strong functional dependencies and their application to normal forms in XML, ACM transactions on database systems, 29, 3, 445-462, (2004)
[31] Wijsen, J., Database repairing using updates, ACM transactions on database systems, 30, 3, 722-768, (2005)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.