
zbMATH — the first resource for mathematics

Management of interval probabilistic data. (English) Zbl 1141.68028
Summary: In this paper we present a data model for uncertain data, where uncertainty is represented using interval probabilities. The theory introduced in the paper can be applied to different specific data models, because the entire approach has been developed independently of the kind of objects being manipulated, such as XML documents, relational tuples, or other data types. As a consequence, our theory can be used to extend existing data models with the management of uncertainty. In particular, the data model we obtain as an application to XML data is the first proposal that combines XML, interval probabilities, and a powerful query algebra with selection, projection, and cross product. The cross product operator is not based on assumptions of independence between XML trees from different collections. Being defined with a possible worlds semantics, our operators are proper extensions of their traditional counterparts, and reduce to them when there is no uncertainty. The main practical result of the paper is a set of equivalences that can be used to compare or rewrite algebraic queries on interval probabilistic data, in particular XML and relational data.
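To make the idea of combining interval probabilities without an independence assumption concrete, the following is a minimal sketch. It is not taken from the paper: it uses the standard Fréchet bounds for the conjunction of two events whose dependence is unknown, and the names `Interval`, `conjunction_unknown_dependence`, and `cross_product` are hypothetical, chosen only for illustration. The objects paired by the product can be anything (tuples, XML trees), in the spirit of the object-independent approach described in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A probability interval [low, high] with 0 <= low <= high <= 1."""
    low: float
    high: float

    def __post_init__(self):
        if not (0.0 <= self.low <= self.high <= 1.0):
            raise ValueError(f"invalid probability interval: [{self.low}, {self.high}]")


def conjunction_unknown_dependence(a: Interval, b: Interval) -> Interval:
    """Bound P(A and B) when nothing is known about how A and B depend on
    each other (Fréchet bounds). Assumption: this is an illustrative choice,
    not necessarily the operator defined in the paper."""
    return Interval(max(0.0, a.low + b.low - 1.0), min(a.high, b.high))


def cross_product(left, right):
    """Pair every object of `left` with every object of `right`.

    Each collection is a list of (object, Interval) pairs; the objects are
    opaque, so they may stand for relational tuples or XML trees."""
    return [
        ((x, y), conjunction_unknown_dependence(p, q))
        for x, p in left
        for y, q in right
    ]


# Two uncertain objects from different collections.
books = [("book-1", Interval(0.5, 0.75))]
authors = [("author-1", Interval(0.75, 1.0))]
result = cross_product(books, authors)

# Objects that are certain (interval [1, 1]) stay certain, so the operator
# behaves like an ordinary cross product when there is no uncertainty.
certain = cross_product([("a", Interval(1.0, 1.0))], [("b", Interval(1.0, 1.0))])
```

With the inputs above, the pair `("book-1", "author-1")` receives the interval [0.25, 0.75], while the certain pair keeps [1, 1], which mirrors the claim that such operators reduce to their traditional counterparts in the absence of uncertainty.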

MSC:
68P05 Data structures
Keywords:
XML data
Software:
MYSTIQ; ProbView; ProTDB; TAX