TreeMiner: An efficient algorithm for mining embedded ordered frequent trees. (English) Zbl 1087.68099

Bandyopadhyay, Sanghamitra (ed.) et al., Advanced methods for knowledge discovery from complex data. London: Springer (ISBN 1-85233-989-6/hbk). Advanced Information and Knowledge Processing, 123-151 (2005).
Summary: Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called a scope-list. We contrast TreeMiner with a pattern-matching tree-mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scale-up properties. We also present an application of tree mining to analyze real web logs for usage patterns.
For the entire collection see [Zbl 1074.68001].

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68P05 Data structures

Software:

TreeMiner