×

Mining frequent closed rooted trees. (English) Zbl 1470.68079

Summary: Many knowledge representation mechanisms are based on tree-like structures, thus symbolizing the fact that certain pieces of information are related in one sense or another. There exists a well-studied process of closure-based data mining in the itemset framework: we consider the extension of this process into trees. We focus mostly on the case where labels on the nodes are nonexistent or unreliable, and discuss algorithms for closure-based mining that only rely on the root of the tree and the link structure. We provide a notion of intersection that leads to a deeper understanding of the notion of support-based closure, in terms of an actual closure operator. We describe combinatorial characterizations and some properties of ordered trees, discuss their applicability to unordered trees, and rely on them to design efficient algorithms for mining frequent closed subtrees both in the ordered and the unordered settings. Empirical validations and comparisons with alternative algorithms are provided.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T30 Knowledge representation
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Arimura, H., & Uno, T. (2005). An output-polynomial time algorithm for mining frequent closed attribute trees. In ILP (pp. 1-19). · Zbl 1134.68464
[2] Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., & Arikawa, S. (2002). Efficient substructure discovery from large semi-structured data. In SDM. · Zbl 1020.68521
[3] Asai, T., Arimura, H., Uno, T., & Nakano, S.-I. (2003). Discovering frequent substructures in large unordered trees. In Discovery science (pp. 47-61).
[4] Baixeries, J., & Balcázar, J. L. Discrete deterministic data mining as knowledge compilation. In Workshop on discrete mathematics and data mining at SIAM DM conference.
[5] Balcázar, J. L., Bifet, A., & Lozano, A. (2007). Mining frequent closed unordered trees through natural representations. In ICCS 2007, 15th international conference on conceptual structures (pp. 347-359). · Zbl 1213.68596
[6] Balcázar, J. L., & Garriga, G. C. (2005). On Horn axiomatizations for sequential data. In ICDT (pp. 215-229) (extended version in Theoretical Computer Science, 371, 247-264, 2007). · Zbl 1108.68551
[7] Balcázar, J. L., Bifet, A., & Lozano, A. (2006). Intersection algorithms and a closure operator on unordered trees. In MLG 2006, 4th international workshop on mining and learning with graphs.
[8] Balcázar, J. L., Bifet, A., & Lozano, A. (2007). Subtree testing and closed tree mining through natural representations. In DEXA workshops (pp. 499-503). · Zbl 1213.68596
[9] Balcázar, J. L., Bifet, A., & Lozano, A. (2008). Mining implications from lattices of closed trees. In Extraction et gestion des connaissances (EGC’2008) (pp. 373-384).
[10] Beyer, T., & Hedetniemi, S. M. (1980). Constant time generation of rooted trees. SIAM Journal on Computing, 9(4), 706-712. · Zbl 0453.05020 · doi:10.1137/0209055
[11] Bifet, A., & Gavaldà, R. (2008). Mining adaptively frequent closed unlabeled rooted trees in data streams. In 14th ACM SIGKDD international conference on knowledge discovery and data mining (2008). · Zbl 0795.05002
[12] Chakrabarti, S. (2002). Mining the web: analysis of hypertext and semi structured data. San Mateo: Morgan Kaufmann.
[13] Chalmers, R., & Almeroth, K. (2001). Modeling the branching characteristics and efficiency gains of global multicast trees. In Proceedings of the IEEE INFOCOM’2001, April 2001. · Zbl 1096.68044
[14] Chi, Y., Muntz, R., Nijssen, S., & Kok, J. (2001a). Frequent subtree mining—an overview. Fundamenta Informaticae, XXI, 1001-1038. · Zbl 1096.68044
[15] Chi, Y., Xia, Y., Yang, Y., & Muntz, R. (2001b). Mining closed and maximal frequent subtrees from databases of labeled rooted trees. Fundamenta Informaticae, XXI, 1001-1038.
[16] Chi, Y.; Yang, Y.; Muntz, R. R., Hybridtreeminer: an efficient algorithm for mining frequent rooted trees and free trees using canonical forms, 11 (2004), Washington
[17] Chi, Y., Yang, Y., & Muntz, R. R. (2005). Canonical forms for labelled trees and their applications in frequent subtree mining. Knowledge and Information Systems, 8(2), 203-234. · doi:10.1007/s10115-004-0180-7
[18] Ganter, B., & Wille, R. (1999). Formal concept analysis. Berlin: Springer. · Zbl 0909.06001
[19] Garriga, G. C. (2006). Formal methods for mining structured objects. PhD thesis. · Zbl 1280.68003
[20] Garriga, G. C., & Balcázar, J. L. (2004). Coproduct transformations on lattices of closed partial orders. In ICGT (pp. 336-352). · Zbl 1116.68595
[21] Hashimoto, K., Aoki-Kinoshita, K. F., Ueda, N., Kanehisa, M., & Mamitsuka, H. (2008). A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology. ACM Transactions on Knowledge Discovery from Data, 2(1), 1-30. · doi:10.1145/1342320.1342326
[22] Hein, J.; Jiang, T.; Wang, L.; Zhang, K.; Galil, Z. (ed.); Ukkonen, E. (ed.), On the complexity of comparing evolutionary trees, Espoo, Finland, Berlin
[23] Nakano, S. I., & Uno, T. (2003). Efficient generation of rooted trees (Tech. Rep. NII-2003-005e). National Institute for Informatics, Japan.
[24] Knuth, D. E. (1997). The art of computer programming: Vol. 1. Fundamental algorithms (3rd ed.). Redwood City: Addison-Wesley/Longman. · Zbl 0895.68055
[25] Knuth, D. E. (2005). The art of computer programming: Vol. 4. Fascicle 4: generating all trees—history of combinatorial generation. New York: Addison-Wesley. · Zbl 1127.68068
[26] Kohavi, R., Brodley, C., Frasca, B., Mason, L., & Zheng, Z. (2000). KDD-Cup 2000 organizers’ report: peeling the onion. SIGKDD Explorations, 2(2), 86-98. · doi:10.1145/380995.381033
[27] Liu, T.-L., & Geiger, D. (1999). Approximate tree matching and shape similarity. In ICCV (pp. 456-462).
[28] Luccio, F., Enriquez, A. M., Rieumont, P. O., & Pagli, L. (2001). Exact rooted subtree matching in sublinear time (Technical Report TR-01-14). Università Di Pisa.
[29] Luccio, F., Enriquez, A. M., Rieumont, P. O., & Pagli, L. (2004). Bottom-up subtree isomorphism for unordered labeled trees. · Zbl 1146.68403
[30] Nijssen, S., & Kok, J. N. (2003). Efficient discovery of frequent unordered trees. In First international workshop on mining graphs, trees and sequences (pp. 55-64).
[31] Plotkin, J. M., & Rosenthal, J. W. (1994). How to obtain an asymptotic expansion of a sequence from an analytic identity satisfied by its generating function. Journal of the Australian Mathematical Society (Series A), 56, 131-143. · Zbl 0795.05002 · doi:10.1017/S1446788700034777
[32] Shasha, D.; Wang, J. T. L.; Zhang, S., Unordered tree mining with applications to phylogeny, 708 (2004), Washington · doi:10.1109/ICDE.2004.1320039
[33] Termier, A., Rousset, M.-C., & Sebag, M. (2004). DRYADE: a new approach for discovering closed frequent trees in heterogeneous tree databases. In ICDM (pp. 543-546).
[34] Termier, A., Rousset, M.-C., Sebag, M., Ohara, K., Washio, T., & Motoda, H. (2008). DryadeParent, an efficient and robust closed attribute tree mining algorithm. IEEE Transactions on Knowledge and Data Engineering, 20(3), 300-320. · doi:10.1109/TKDE.2007.190695
[35] Valiente, G. (2002). Algorithms on trees and graphs. Berlin: Springer. · Zbl 1007.05001
[36] Weiss, S., Indurkhya, N., Zhang, T., & Damerau, F. (2004). Text mining: predictive methods for analyzing unstructured information. Berlin: Springer. · Zbl 1058.68052
[37] Xiao, Y.; Yao, J.-F.; Li, Z.; Dunham, M. H., Efficient data mining for maximal frequent subtrees, 379 (2003), Washington
[38] Yan, X.; Han, J., gspan: graph-based substructure pattern mining, 721 (2002), Washington
[39] Yan, X.; Han, J., Closegraph: mining closed frequent graph patterns, 286-295 (2003), New York · doi:10.1145/956750.956784
[40] Yan, X., Han, J., & Afshar, R. (2003). Clospan: mining closed sequential patterns in large databases. In SDM.
[41] Yan, X.; Zhou, X. J.; Han, J., Mining closed relational graphs with connectivity constraints, 324-333 (2005), New York · doi:10.1145/1081870.1081908
[42] Zaki, M. J. (2002). Efficiently mining frequent trees in a forest. In 8th ACM SIGKDD international conference on knowledge discovery and data mining.
[43] Zaki, M. J. (2005). Efficiently mining frequent embedded unordered trees. Fundamenta Mathematicae, 66(12), 33-52. · Zbl 1098.68579
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.