×

Top-down mining of frequent closed patterns from very high dimensional data. (English) Zbl 1162.68561

Summary: Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully utilize the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining together with a novel row enumeration tree to make full use of the pruning power of the minimum support constraint. Furthermore, to efficiently check if a rowset is closed, we develop a method called the trace-based method. Based on these methods, an algorithm called TD-Close is designed for mining a complete set of frequent closed patterns. To enhance its performance further, we improve it by using new pruning strategies and new data structures that lead to a new algorithm TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down search space and saving memory space, while the trace-based method facilitates the closeness-checking. As a result, the algorithm TTD-Close outperforms the bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: J.B. Bocca, M. Jarke, C. Zaniolo (Eds.), Proceedings of the 1994 International Conference on Very Large Data Bases (VLDB’94), San Francisco, CA, USA, 1994, pp. 487-499.
[2] Y. Cheng, G.M. Church, Biclustering of expression data, in: Philip E. Bourne, Michael Gribskov, Russ B. Altman, Nancy Jensen, Debra A. Hope, Thomas Lengauer, Julie C. Mitchell, Eric D. Scheeff, Chris Smith, Shawn Strande, Helge Weissig (Eds.), Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), San Diego, California, USA, August 19-23, vol. 8, 2000, pp. 93-103.
[3] G. Cong, A.K.H. Tung, X. Xu, F. Pan, J. Yang, FARMER: finding interesting rule groups in microarray datasets, in: G. Weikum, A.C. Konig, S. Deßloch (Eds.), Proceedings of the 23rd ACM International Conference on Management of Data, Maison de la Chimie, Paris, France, June 13-18, 2004, pp. 143-154.
[4] G. Cong, K.-L. Tan, A.K.H. Tung, X. Xu, Mining top-k covering rule groups for gene expression data, in: Fatma Özcan (Eds.), Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14-16, 2005, pp. 670-681.
[5] Creighton, C.; Hanash, S., Mining gene expression databases for association rules, Bioinformatics, 19, 1, 79-86, (2003)
[6] G. Dong, X. Zhang, L. Wong, J. Li, CAEP: classification by aggregating emerging patterns, in: Setsuo Arikawa, Koichi Furukawa (Eds.), Discovery Science, Second International Conference, DS’99, Tokyo, Japan, December 1999, pp. 30-42.
[7] G. Grahne, J. Zhu, Efficiently using prefix-trees in mining frequent itemsets, in: B. Goethals, M.J. Zaki (Eds.), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), CEUR Workshop Proc., vol. 90, 2003. <http://CEUR-WS.org/Vol-90/>.
[8] G. Grahne, J. Zhu, Efficiently using prefix-trees in mining frequent itemsets. [Online] Source code available: <http://fimi.cs.helsinki.fi/src/>.
[9] Han, Jiawei; Cheng, Hong; Xin, Dong; Yan, Xifeng, Frequent pattern mining: current status and future direction, Data mining and knowledge discovery, 15, 1, 55-86, (2007)
[10] Han, J.; Pei, J., Mining frequent patterns by pattern growth: methodology and implications, SIGKDD explorations, 2, 14-20, (2000)
[11] Hu, Tianming; Sung, Sam Yuan; Xiong, Hui; Fu, Qian, Discovery of maximum length frequent itemsets, Information sciences, 178, 1, 69-87, (2008)
[12] Lee, Anthony J.T.; Wang, Chun-Sheng, An efficient algorithm for mining frequent inter-transaction patterns, Information sciences, 177, 17, 3453-3476, (2007)
[13] Lee, Anthony J.T.; Hong, Ruey-Wen; Ko, Wei-Min; Tsao, Wen-Kwang; Lin, Hsiu-Hui, Mining spatial association rules in image databases, Information sciences, 177, 7, 1593-1608, (2007)
[14] Lee, Yue-Shi; Yen, Show-Jane, Incremental and interactive mining of web traversal patterns, Information sciences, 178, 2, 287-306, (2008)
[15] Li, J.; Wong, L., Identifying good diagnostic genes or genes groups from gene expression data by using the concept of emerging patterns, Bioinformatics, 18, 725-734, (2002)
[16] W. Li, J. Han, J. Pei, CMAR: accurate and efficient classification based on multiple class-association rules, in: Nick Cercone, Tsau Young Lin, Xindong Wu (Eds.), Proceedings of the 2001 International Conference on Data Mining (ICDM’01), San Jose, CA, USA, 2001, pp. 369-376.
[17] B. Liu, W. Hsu, Y. Ma, Integrating classification and association rule mining, in: Rakesh Agrawal, Paul E. Stolorz, Gregory Piatetsky-Shapiro (Eds.), Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York City, New York, USA, 1998, pp. 80-86.
[18] Niehrs, C.; Pollet, N., Synexpression groups in eukaryotes, Nature, 402, 483-487, (1999)
[19] F. Pan, G. Cong, A.K.H. Tung, J. Yang, M.J. Zaki, CARPENTER: finding closed patterns in long biological datasets, in: L. Getoor, T.E. Senator (Eds.), Proceedings of the 2003 ACM SIGKDD International Conference On Knowledge Discovery and Data Mining, Washington, DC, USA, 2003, pp. 637-642.
[20] F. Pan, A.K.H. Tung, G. Cong, X. Xu, COBBLER: combining column and row enumeration for closed pattern discovery, in: Proceedings of the 2004 International Conference on Scientific and Statistical Database Management (SSDBM’04), Santorini Island, Greece, June 2004, pp. 21-30.
[21] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: C. Beeri, P. Buneman (Eds.), Proceedings of the 7th International Conference on Database Theory (ICDT’99), Jerusalem, Israel, 1999, pp. 398-416. · Zbl 0983.68511
[22] J. Pei, J. Han, R. Mao, CLOSET: an efficient logarithm for mining frequent closed itemsets, in: D. Gunopulos, R. Rastogi (Eds.), Proceedings of the 2000 ACM-SIGMOD International Workshop Data Mining and Knowledge Discovery (DMKD’00), Dallas, TX, May 2000, pp. 11-20.
[23] Qian, Tieyun; Xiong, Hui; Wang, Yuanzhen; Chen, Enhong, On the strength of hyperclique patterns for text categorization, Information sciences, 177, 19, 4040-4058, (2007) · Zbl 1142.68586
[24] F. Rioult, J. Boulicaut, B. Cremilleux, J. Besson, Using transposition for pattern discovery from microarray data, in: Mohammed Javeed Zaki, Charu C. Aggarwal (Eds.), Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, California, USA, 2003, pp. 73-79.
[25] J. Yang, H. Wang, W. Wang, P.S. Yu, Enhanced biclustering on gene expression data, in: Hasan Jamil, Vasileios Magalooikonomou (Eds.), Proceedings of the 3rd IEEE Symposium on Bioinformatics and Bioengineering (BIBE’03), Bethesda, MD, USA, March 2003, pp. 321-327.
[26] Yun, Unil, Efficient mining of weighted interesting patterns with a strong weight and/or support affinity, Information sciences, 177, 17, 3477-3499, (2007)
[27] M. Zaki, C. Hsiao, Charm: an efficient algorithm for closed association rule mining, in: Robert L. Grossman, Jiawei Han, Vipin Kumar, Heikki Mannila, Rajeev Motwani (Eds.), Proceedings of 2002 SIAM Data Mining Conference, Arlington, VA, 2002, pp. 457-473.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.