×

zbMATH — the first resource for mathematics

Efficient search in graph databases using cross filtering. (English) Zbl 1355.68073
Summary: Recently, graph data has been increasingly used in many areas such as bio-informatics and social networks, and a large amount of graph data is generated in those areas. As such, we need to manage such data efficiently. A basic, common problem in graph-related applications is to find graph data that contains a query (Graph Query Problem). However, since examining graph data sequentially incurs a prohibitive cost due to subgraph isomorphism testing, a novel indexing scheme is needed.
A feature-based approach is generally used as a graph indexing scheme. A path structure, a tree structure, or a graph structure can be extracted from a graph database as a feature. The path feature and the tree feature can be easily managed, but have lower pruning power than the graph feature. Although the graph feature has the best pruning power, it takes too much time to match the graph feature with the query. In this paper, we propose a graph feature-based approach called a CF-Framework (Cross Filtering-Framework) to solve the graph query problem efficiently. To select the graph features that correspond to the query with a low cost, the CF-Framework makes two feature groups according to the query and filters out each group crossly (i.e., alternately) based on set properties. We then validate the efficiency of the CF-Framework through experimental results.
MSC:
68P15 Database theory
68P20 Information storage and retrieval of data
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] C. Chen, X. Yan, P.S. Yu, J. Han, D.-Q. Zhang, X. Gu, Towards graph containment search and indexing, in: International Conference on Very Large Data Bases (VLDB), 2007, pp. 926-937.
[2] J. Cheng, Y. Ke, W. Ng, A. Lu, Fg-index: towards verification-free query processing on graph databases, in: ACM SIGMOD International Conference on Management of Data, 2007, pp. 857-872.
[3] Cheng, J.; Ke, Y.; Ng, W., Efficient query processing on graph databases, ACM Trans. Database Syst. (TODS), 34, 1, (2009)
[4] C.-W. Chung, J.-K. Min, K. Shim, Apex: an adaptive path index for xml data, in: ACM SIGMOD International Conference on Management of Data, 2002, pp. 121-132.
[5] S.A. Cook, The complexity of theorem-proving procedures, in: ACM Symposium on Theory of Computing (STOC), 1971, pp. 151-158.
[6] B.F. Cooper, N. Sample, M.J. Franklin, G.R. Hjaltason, M. Shadmon, A fast index for semistructured data, in: International Conference on Very Large Data Bases (VLDB), 2001, pp. 341-350.
[7] Ferro, A.; Giugno, R.; Mongiovı̀, M.; Pulvirenti, A.; Skripin, D.; Shasha, D., Graphfind enhancing graph searching by low support data mining techniques, BMC Bioinformatics, 9, S-4, (2008)
[8] R. Giugno, D. Shasha, Graphgrep: a fast and universal method for querying graphs, in: International Conference on Pattern Recognition (ICPR), 2002, pp. 112-115.
[9] R. Goldman, J. Widom, Dataguides: enabling query formulation and optimization in semistructured databases, in: International Conference on Very Large Data Bases (VLDB), 1997, pp. 436-445.
[10] Hagadone, T. R., Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases, J. Chem. Inform. Comput. Sci., 32, 5, 515-521, (1992)
[11] H. He, A.K. Singh, Closure-tree: an index structure for graph queries, in: IEEE International Conference on Data Engineering (ICDE), 2006.
[12] Huan, J.; Bandyopadhyay, D.; Wang, W.; Snoeyink, J.; Prins, J.; Tropsha, A., Comparing graph representations of protein structure for mining family-specific residue-based packing motifs, J. Comput. Biol., 12, 6, 657-671, (2005)
[13] Hsu, W.-C.; Liao, I.-E., CIS-X: a compacted indexing scheme for efficient query evaluation of XML documents, Inform. Sci., 241, 195-211, (2013) · Zbl 1320.68069
[14] H. Jiang, H. Wang, P.S. Yu, S. Zhou, Gstring: a novel approach for efficient search in graph databases, in: IEEE International Conference on Data Engineering (ICDE), 2007, pp. 566-575.
[15] Kang, U.; Hebert, M.; Park, S., Fast and scalable approximate spectral graph matching for correspondence problems, Inform. Sci., 220, 306-318, (2013)
[16] R. Kaushik, P. Shenoy, P. Bohannon, E. Gudes, Exploiting local similarity for indexing paths in graph-structured data, in: IEEE International Conference on Data Engineering (ICDE), 2002, pp. 129-140.
[17] A. Khan, Y. Wu, C.C. Aggarwal, X. Yan, NeMa: fast graph search with label similarity, in: International Conference on Very Large Data Bases (VLDB), 2013, pp. 181-192.
[18] C. Qun, A. Lim, K.W. Ong, D(k)-index: an adaptive structural summary for graph-structured data, in: ACM SIGMOD International Conference on Management of Data, 2003, pp. 134-144.
[19] P. Rao, B. Moon, Prix: indexing and querying xml using prüfer sequences, in: IEEE International Conference on Data Engineering (ICDE), 2004, pp. 288-300.
[20] S. Sakr, Graphrel: a decomposition-based and selectivity-aware relational framework for processing sub-graph queries, in: International Conference on Database Systems for Advanced Applications (DASFAA), 2009.
[21] D. Shasha, J.T.-L. Wang, R. Giugno, Algorithmics and applications of tree and graph searching, in: ACM Symposium on Principles of Database Systems (PODS), 2002, pp. 39-52.
[22] Y. Tian, J.M. Pate, Tale: a tool for approximate large graph matching, in: IEEE International Conference on Data Engineering (ICDE), 2008.
[23] Wang, G.; Wang, B.; Yang, X.; Yu, G., Efficiently indexing large sparse graphs for similarity search, IEEE Trans. Knowl. Data Eng. (TKDE), 24, 3, 440-451, (2012)
[24] H. Wang, S. Park, W. Fan, P.S. Yu, Vist: a dynamic index method for querying xml data by tree structures, in: ACM SIGMOD International Conference on Management of Data, 2003, pp. 110-121.
[25] X. Wang, X. Ding, A.K.H. Tung, S. Ying, H. Jin, An efficient graph indexing method, in: IEEE International Conference on Data Engineering (ICDE), 2012, pp. 210-221.
[26] Willett, P.; Barnard, J. M.; Downs, G. M., Chemical similarity searching, J. Chem. Inform. Comput. Sci., 38, 6, 983-996, (1998)
[27] D.W. Williams, J. Huan, W. Wang, Graph database indexing using structured graph decomposition, in: IEEE International Conference on Data Engineering (ICDE), 2007, pp. 976-985.
[28] Xu, L.; Ling, T. W.; Wu, H., Labeling dynamic XML documents: an order-centric approach, IEEE Trans. Knowl. Data Eng., 24, 1, 100-113, (2012)
[29] X. Yan, P.S. Yu, J. Han, Graph indexing: a frequent structure-based approach, in: ACM SIGMOD International Conference on Management of Data, 2004, pp. 335-346.
[30] X. Yan, P.S. Yu, J. Han, Substructure similarity search in graph databases, in: ACM SIGMOD International Conference on Management of Data, 2005, pp. 766-777.
[31] J. Yang, W. Jin, BR-Index: an indexing structure for subgraph matching in very large dynamic graphs, in: International Conference on Scientific and Statistical Database Management (SSDBM), 2011, pp. 322-331.
[32] D. Yuan, P. Mitra, H. Yu, C.L. Giles, Iterative graph feature mining for graph indexing, in: IEEE International Conference on Data Engineering (ICDE), 2012, pp. 198-209.
[33] Y. Yuan, G. Wang, H. Wang, L. Chen, Efficient subgraph search over large uncertain graphs, in: International Conference on Very Large Data Bases (VLDB), 2011, pp. 876-886.
[34] Y. Yuan, G. Wang, L. Chen, H. Wang, Efficient subgraph similarity search on large probabilistic graph databases, in: International Conference on Very Large Data Bases (VLDB), 2012, pp. 800-811.
[35] S. Zhang, S. Li, J. Yang, Gaddi: distance index based subgraph matching in biological networks, in: International Conference on Extending Database Technology (EDBT), 2009.
[36] S. Zhang, J. Yang, W. Ji, Sapper: subgraph indexing and approximate matching in large graphs, in: International Conference on Very Large Data Bases (VLDB), 2010, pp. 1185-1194.
[37] P. Zhao, J.X. Yu, P.S. Yu, Graph indexing: tree + delta ⩾ graph, in: International Conference on Very Large Data Bases (VLDB), 2007, pp. 938-949.
[38] L. Zou, L. Chen, J.X. Yu, Y. Lu, A novel spectral coding in a large graph database, in: International Conference on Extending Database Technology (EDBT), 2008, pp. 181-192.
[39] Aids, <http://www.cs.ucsb.edu/∼xyan/software.htm>.
[40] Datasets, <http://www.cs.ucsb.edu/∼xyan/software.htm>.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.