×

Memory-adaptive high utility sequential pattern mining over data streams. (English) Zbl 1457.68246

Summary: High utility sequential pattern (HUSP) mining has emerged as an important topic in data mining. A number of studies have been conducted on mining HUSPs, but they are mainly intended for non-streaming data and thus do not take data stream characteristics into consideration. Streaming data are fast changing, continuously generated unbounded in quantity. Such data can easily exhaust computer resources (e.g., memory) unless a proper resource-aware mining is performed. In this study, we explore the fundamental problem of how limited memory can be best utilized to produce high quality HUSPs over a data stream. We design an approximation algorithm, called MAHUSP, that employs memory adaptive mechanisms to use a bounded portion of memory, in order to efficiently discover HUSPs over data streams. An efficient tree structure, called MAS-Tree, is proposed to store potential HUSPs over a data stream. MAHUSP guarantees that all HUSPs are discovered in certain circumstances. Our experimental study shows that our algorithm can not only discover HUSPs over data streams efficiently, but also adapt to memory allocation with limited sacrifices in the quality of discovered HUSPs. Furthermore, in order to show the effectiveness and efficiency of MAHUSP in real-life applications, we apply our proposed algorithm to a web clickstream dataset obtained from a Canadian news portal to showcase users’ reading behavior, and to a real biosequence database to identify disease-related gene regulation sequential patterns. The results show that MAHUSP effectively discovers useful and meaningful patterns in both cases.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In ICDE (pp. 3-14).
[2] Ahmed, C. F., Tanbeer, S. K., & Jeong, B. (2010). A novel approach for mining high-utility sequential patterns in sequence databases. ETRI Journal, 32, 676-686. · doi:10.4218/etrij.10.1510.0066
[3] Ahmed, C. F., Tanbeer, S. K., & Jeong, B. (2011). A framework for mining high utility web access sequences. IETE Journal, 28, 3-16.
[4] Ahmed, C. F., Tanbeer, S. K., & Jeong, B. S. (2012). Interactive mining of high utility patterns over data streams. Expert Systems with Applications, 39, 11979-11991. · doi:10.1016/j.eswa.2012.03.062
[5] Ayres, J., Flannick, J., Gehrke, J., & Yiu, T. (2002). Sequential pattern mining using a bitmap representation. In Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining (pp. 429-435).
[6] Bringay, S., Bringay, S., Roche, M., Teisseire, M., Poncelet, P., Rassoul, R. A., et al. (2010). Discovering novelty in sequential patterns: Application for analysis of microarray data on alzheimer disease. Studies in Health Technology and Informatics, 14(160), 1314-1318.
[7] Chang, L., Wang, T., Yang, D., & Luan, H. (2008) Seqstream: Mining closed sequential patterns over stream sliding windows. In Proceedings of the IEEE international conference on data mining (pp. 83-92).
[8] Chen, G., Wu, X., & Zhu, X. (2005) Mining sequential patterns across data streams, Ph.D. thesis. University of Vermont.
[9] Cheng, C. P., Liu, Y. C., Tsai, Y. L., & Tseng, V. S. (2013). An efficient method for mining cross-timepoint gene regulation sequential patterns from time course gene expression datasets. BMC Bioinformatics, 14(12), 1-12. · doi:10.1186/1471-2105-14-S18-S1
[10] Creighton, C., & Hanash, S. (2003). Mining gene expression databases for association rules. Bioinformatics, 19(1), 79-86. · doi:10.1093/bioinformatics/19.1.79
[11] Fournier-Viger, P., Gomariz, A., Soltani, A., & Gueniche, T. (2013) Spmf: Open-source data mining library. http://www.philippe-fournier-viger.com/spmf/. · Zbl 1310.68178
[12] Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., & Hsu, M. (2010) Freespan: Frequent pattern-projected sequential pattern mining. In Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (pp. 355-359).
[13] Ho, C., Li, H., Kuo, F., & Lee, S. (2006). Incremental mining of sequential patterns over a stream sliding window. In Proceedings of the ICDM workshops (pp. 677-681).
[14] Kim, D., & Yun, U. (2016). Mining high utility itemsets based on the time decaying model. Intelligent Data Analysis, 20(5), 1157-1180. · doi:10.3233/IDA-160861
[15] Li, H. F., Huang, H. Y., Chen, Y. C., Liu, Y. J., & Lee, S. Y. (2008). Fast and memory efficient mining of high utility itemsets in data streams. In Proceedings of the 8th IEEE international conference on data mining (pp. 881-886).
[16] Lin, W. Y., Yang, S. F., & Hong, T. P. (2013). Memory-aware mining of indirect associations over data streams. In IDAM 2013. Amsterdam: Springer.
[17] Liu, Y., Liao, W., & Choudhary, A. (2005). A fast high utility itemsets mining algorithm. In Proceedings of the 1st international workshop on utility-based data mining (pp. 90-99).
[18] Manku, G. S., & Motwani, R. (2002). Approximate frequency counts over data streams. In Proceedings of VLDB, (pp. 346-357).
[19] Marascu, A., & Masseglia, F. (2005). Mining sequential patterns from temporal streaming data. In Proceedings of the ECML/PKDD workshop on mining complex data (pp. 355-359).
[20] McDunn, J., Husain, K., Polpitiya, A., Burykin, A., Ruan, J., Li, Q., et al. (2008). Plasticity of the systemic inflammatory response to acute infection during critical illness: Development of the riboleukogram. PloS ONE, 3(2), e1564. · doi:10.1371/journal.pone.0001564
[21] Mendes, L., Ding, B., & Han, J. (2008). Stream sequential pattern mining with precise error bounds. In ICDM ’08 (pp. 941-946).
[22] Metwaly, A., Agrawal, D., & Abadi, A. (2005). Efficient computation of frequent and top-k elements in data streams. In ICDT (pp. 398-412). Berlin: Springer.
[23] Mooney, C. H., & Roddick, J. F. (2013). Sequential pattern mining approaches and algorithms. ACM Computing Surveys, 45(2), 19:1-19:39. · Zbl 1293.68246 · doi:10.1145/2431211.2431218
[24] Pei, J., Han, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., & Hsu, M. (2004). Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 16, 1424-1440. · doi:10.1109/TKDE.2004.77
[25] Pisharath, J., Liu, Y., Ozisikyilmaz, B., Narayanan, R., Liao, W. K., Choudhary, A., & Memik, G. (2012). Nu-minebench version 2.0 dataset and technical report. http://cucis.ece.northwestern.edu/projects/dms/minebench.html.
[26] Raissi, C., Poncelet, P., & Teisseire, M (2006). Speed: Mining maximal sequential patterns over data streams. In Proceedings of the IEEE international conference on intelligent systems (pp. 546-552).
[27] Ryang, H., & Yun, U. (2016). High utility pattern mining over data streams with sliding window technique. Expert Systems with Applications, 57, 214-231. · doi:10.1016/j.eswa.2016.03.001
[28] Salle, P., Bringay, S., & Teisseire, M. (2009). Mining discriminant sequential patterns for aging brain. In Artificial intelligence in medicine: 12th conference on artificial intelligence in medicine, AIME 2009, Verona, Italy, Proceedings (pp. 365-369). Berlin: Springer.
[29] Shie, B. E., Yu, P. S., & Tseng, V. S. (2012). Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 39, 12947-12960. · doi:10.1016/j.eswa.2012.05.035
[30] Shie, B. E., Hsiao, H. F., & Tseng, V. S. (2013). Efficient algorithms for discovering high utility user behavior patterns in mobile commerce environments. Knowledge and Information systems, 37(2), 363-387. · doi:10.1007/s10115-012-0483-z
[31] Srikant, R., & Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the international conference on extending database technology: Advances in database technology (pp. 3-17).
[32] Tseng, V. S., Chu, C. J., & Liang, T. (2006). Efficient mining of temporal high-utility itemsets from data streams. In ACM KDD utility based data mining (pp. 18-27).
[33] Wang, J. Z., Yang, Z. H., & Huang, J. L. (2014). An efficient algorithm for high utility sequential pattern mining. In Frontier and innovation in future computing and communications (Vol. 301, pp. 49-56). Amsterdam: Springer.
[34] Yin, J., Zheng, Z., & Cao, L. (2012). Uspan: An efficient algorithm for mining high utility sequential patterns. In Proceedings of ACM SIGKDD (pp. 660-668).
[35] Yin, J., Zheng, Z., Cao, L., Song, Y., & Wei, W. (2013). Efficiently mining top-k high utility sequential patterns. In IEEE 13th international conference on data mining (ICDM) (pp. 1259-1264).
[36] Zaki, M. J. (2001). Spade: An efficient algorithm for mining frequent sequences. Machine Learning, 42, 31-60. · Zbl 0970.68052 · doi:10.1023/A:1007652502315
[37] Zihayat, M., Wu, C. W., An, A., & Tseng, V. S. (2015). Mining high utility sequential patterns from evolving data streams. In Proceedings of the ASE BigData & SocialInformatics 2015, ASE BD&SI ’15 (pp. 52:1-52:6). New York, NY: ACM. doi:10.1145/2818869.2818883.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.