Discovery of time-series motif from multi-dimensional data based on MDL principle. (English) Zbl 1075.62084

Summary: Recently, the research on efficient extraction of previously unknown, frequently appearing patterns in a time-series data has received much attention. These patterns are called ‘motifs’. Motifs are useful for various time-series data mining tasks. We propose a motif discovery algorithm to extract a motif that represents a characteristic pattern of the given data based on the Minimum Description Length (MDL) principle. In addition, the algorithm can extract motifs from multi-dimensional time-series data by using Principal Component Analysis (PCA). In experimental evaluation, we show the efficiency of the motif discovery algorithm, and the usefulness of extracted motifs to various data mining tasks.


62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H25 Factor analysis and principal components; correspondence analysis
68U99 Computing methodologies and applications
Full Text: DOI


[1] Akaike, H. (1969). Fitting autoregressive model for prediction. The Annals of the Institute of Statistical Mathematics, 21, 243-247. · Zbl 0202.17301
[2] Berberidis, C., Vlahavas, I., Aref, W. G., Atallah, M., & Elmagarmid, A. K. (2002). On the discovery of weak periodicities in large time series. In Proc. of PAKDD 2002 (pp. 51-61). · Zbl 1020.68546
[3] Buhler, J., & Tompa, M. (2001). Finding motifs using random projections. In Proc. of 5th International Conference on Computer Molecular Biology (pp. 67-74).
[4] Caraca-Valente, J. P., & Lopez-Chavarrias, I. (2000). Discovering similar patterns in time series. In Proc. of 6th ACM SIGMOD International Confference on Knowledge Discovery and Data Mining (pp. 126-133).
[5] Chakrabarti, S., Sarawagi, S., & Dom, B. (1998). Mining surprising patterns using temporal description length. In Proc. of 24th International Conference on Very Large databases (pp. 606-617).
[6] Chiu, B., Keogh, E., & Lonardi, S. (2003). Probabilistic discovery of time series motif. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 493-498).
[7] Colomer, J., Melendez, J., & Gamero, F. I. (2002). Pattern recognition based on episodes and DTW, application to diagnosis of a level control system. In Proc. of 16th International Workshop on Qualitative Reasoning (pp. 37-43).
[8] Cyril, G., Peter, T., Egill, R., Arup, N. F., & Kai, H. L. (1999). On clustering fMRI time series. NeuroImage, 9:3, 298-310.
[9] Das, G., Lin, K., Mannila, H., Renganathan, G., & Smyth, P. (1998). Rule discovery from time series. In Proc. of the 4th ACM SIGMOD International Conference on Knowledge Discovery and Data Mining (pp. 16-22).
[10] Fradkin, D., & Madigan, D. (2003). Experiments with random projections for machine learning. In Proc. of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 517-522).
[11] Friedman, J. H., & Tukey, J. W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23:9, 881-889. · Zbl 0284.68079
[12] Heras, D. B., Cabaleiro, J. C., Perez, V. B., Costas, P., & Rivera, F. F. (1996). Principal component analysis on vector computers. In Proc. of Vector and Parallel Processing 1996 (pp. 416-428).
[13] Hyvrinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13:4, 411-430.
[14] Kashino, K., Smith, G., & Murase, H. (1999). Time-series active search for quick retrieval of audio and video. In Proc. of 1999 International Conference on Acoustics, Speech and Signal Processing (pp. 2993-2996).
[15] Keogh, E. (2004). The UCR Time Series Data Mining Archive [http://www.cs.ucr.edu/\(\sim\)eamonn/TSDMA/datasets. html]. Riverside CA, University of California: Computer Science & Engineering Department.
[16] Koopman, S. J., & Ooms, M. (2003). Time-series modeling of daily tax revenues. Statistica Neerlandica, 57:4, 439-469. · Zbl 1090.62568
[17] Levin, A. U., Leen, T. K., and Moody, J. E. (1993), Fast pruning using principal components. In Proc. of NIPS ?93 (pp. 35-42).
[18] Lin, J., Keogh, E., Lonardi, S., & Chiu, B. (2003). A symbolic representation of time series, with implications for streaming algorithms. In Proc. of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.
[19] Lin, J., Keogh, E., Lonardi, S., & Patel, P. (2002). Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53-68).
[20] Mori, T., & Uehara, K. (2001). Extraction of primitive motion and discovery of association rules from human motion. In Proc. of the 10th IEEE International Workshop on Robot and Human Communication (pp. 200-206).
[21] Myers, C. S., & Rabiner, L. R. (1981). A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 7:60, 1389-1409.
[22] Navarro, G., & Baeza-Yates, R., (1999). Fast multi-dimensional approximate pattern matching. In Proc. of the 10th Annual Symposium on Combinatorial Pattern Matching (pp. 243-257). · Zbl 1063.68637
[23] Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry, Vol. 15. World Scientific Publishing. · Zbl 0800.68508
[24] Schwarz, G (1981). Estimating the dimension of a model. The Annals of Statistical Mathematics, 6:2, 461-464. · Zbl 0379.62005
[25] Shannon, C. E. (1949). Communication in the presence of noise. In Proc. of 2nd Institute of Radio Engineers (pp. 10-21).
[26] Tanaka, Y., & Uehara, K. (2003). Discover motifs in multi dimensional time-series using the principal component analysis and the MDL principle. In Proc. of 3rd International Conference on Machine Learning and Data Mining in Pattern Recognition (pp. 252-265). · Zbl 1029.68592
[27] Vigario, R., Jousmaki, V., Hamalainen, M., Hari, R., and Oja, E. (1998), Independent component analysis for identification of artifacts in magnetoencephalographic recording. In Proc. of NIPS ?97 (pp. 229-235).
[28] Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., & Keogh, E. (2003). Indexing multi-dimensional time-series with support for multiple distance measure. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 216-225).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.