×

Discovery of time series motifs on Intel many-core systems. (English) Zbl 1458.62209

Summary: A motif is a pair of subsequences of a longer time series, which are very similar to each other. Motif discovery is applied in a wide range of subject areas involving time series: medicine, biology, entertainment, weather prediction, and others. In this paper, we propose a novel parallel algorithm for motif discovery using Intel MIC (Many Integrated Core) accelerators in the case when time series fit in the main memory. We perform parallelization through thread-level parallelism and OpenMP technology. The algorithm employs a set of matrix data structures to store and index the subsequences of a time series and to provide an efficient vectorization of computations on the Intel MIC platform. The experimental evaluation shows the high scalability of the proposed algorithm.

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
65Y10 Numerical algorithms for specific classes of architectures
68W10 Parallel algorithms in computer science
62-08 Computational methods for problems pertaining to statistics

Software:

iSAX
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Bacon, D. F.; Graham, S. L.; Sharp, O. J., Compiler transformations for high-performance computing, ACM Comput. Surv., 26, 345-420 (1994) · doi:10.1145/197405.197406
[2] B. Y. Chiu, E. J. Keogh, and S. Lonardi, “Probabilistic discovery of time series motifs,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, Aug. 24-27, 2003 (2003), pp. 493-498. 10.1145/956750.956808
[3] G. Chrysos, “Intel registered Xeon Phi coprocessor (codename Knights Corner),” in Proceedings of the 2012 IEEE Hot Chips 24th Symposium (HCS), Cupertino, CA, USA, Aug. 27-29, 2012 (2012), pp. 1-31. 10.1109/HOTCHIPS.2012.7476487
[4] Goldberger, A.; Amaral, L.; Glass, L.; Hausdorff, J.; Ivanov, P.; Mark, R.; Mietus, J.; Moody, G.; Peng, C.; Stanley, H., PhysioBank, Physio Toolkit, and PhysioNet: components of a new research resource for complex physiologic signals, Circulation, 101, 23, 215-220 (2000) · doi:10.1161/01.CIR.101.23.e215
[5] P. Kostenetskiy and P. Semenikhina, “SUSU supercomputer resources for industry and fundamental science,” in Proceedings of the 2018 Global Smart Industry Conference (GloSIC), Chelyabinsk, Russia, Nov. 13-15, 2018 (2018), p. 8570068. 10.1109/GloSIC.2018.8570068
[6] Kraeva, Ya; Zymbler, M., Scalable algorithm for subsequence similarity search in very large time series data on cluster of Phi KNL, Proceedings of the 20th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2018, Moscow, Russia, Oct. 9-12, 2018, Commun. Comput. Inform. Sci., 1003, 149-164 (2019)
[7] T. Mattson, “Introduction to OpenMP,” in Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, Nov. 11-17, 2006, Tampa, FL, USA (ACM Press, 2006). 10.1145/1188455.1188673
[8] J. Meng, J. Yuan, M. Hans, and Y. Wu, “Mining motifs from human motion,” in Proceedings of the Eurographics 2008, Crete, Greece, April 14-18, 2008 (Eurographics Association, 2008), pp. 71-74.
[9] D. Minnen, C. L. Isbell, I. A. Essa, and T. Starner, “Discovering multivariate motifs using subsequence density estimation and greedy mixture learning,” in Proceedings of the 22nd AAAI Conference on Artificial Intelligence, July 22-26, 2007, Vancouver, British Columbia, Canada (AAAI Press, 2007), pp. 615-620.
[10] A. Mueen, E. J. Keogh, Q. Zhu, S. Cash, and M. B. Westover, “Exact discovery of time series motifs,” in Proceedings of the SIAM International Conference on Data Mining, SDM 2009, April 30-May 2, 2009, Sparks, Nevada, USA (SIAM, 2009), pp. 473-484. 10.1137/L9781611972795.41
[11] Narang, A.; Bhattacherjee, S., Parallel exact time series motif discovery, Proceedings of the 16th International Euro-Par Conference, Ischia, Italy, Aug. 31-Sept. 3, 2010, Lect. Notes Comput. Sci., 6272, 304-315 (2010)
[12] Padua, D. A., POSIX threads (pthreads), Encyclopedia of Parallel Computing, 1592-1593 (2011), Berlin: Springer, Berlin
[13] P. Patel, E. J. Keogh, J. Lin, and S. Lonardi, “Mining motifs in massive time series databases,” in Proceedings of the 2002 IEEE International Conference on Data Mining ICDM 2002, Dec. 9-12, 2002, Maebashi City, Japan (IEEE Comput. Soc., 2002), pp. 370-377. 10.1109/ICDM.2002.1183925
[14] Pearson, K., Theproblem of therandom walk, Nature (London, U.K.), 72, 1865, 294 (1905) · JFM 36.0303.02 · doi:10.1038/072294b0
[15] J. Shieh and E. J. Keogh, “iSAX: indexing and mining terabyte sized time series,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, Aug.24-27, 2008 (ACM, 2008), pp. 623-631. 10.1145/1401890.1401966
[16] A. Sodani, “Knights Landing (KNL): 2nd generation Intel® Xeon Phi processor,” in Proceedings of the 2015 IEEE Hot Chips 27th Symposium HCS, Cupertino, CA, USA, Aug. 22-25, 2015 (IEEE, 2015), pp. 1-24. doi 10.1109/HOTCHIPS.2015.7477467
[17] Sokolinskaya, I.; Sokolinsky, L., Revised pursuit algorithm for solving non-stationary linear programming problems on modern computing clusters with manycore accelerators, Proceedings of the 2nd Russian Conference Supercomputing Days, RuSCDays 2016, Moscow, Russia, Sept. 26-27, 2016, Commun. Comput. Inform. Sci, 687, 212-223 (2016)
[18] Tanaka, Y.; Iwamoto, K.; Uehara, K., Discovery of time-series motif from multi-dimensional data based on MDL Principle, Machine Learning, 58, 269-300 (2005) · Zbl 1075.62084 · doi:10.1007/s10994-005-5829-2
[19] Wilson, D. R.; Martinez, T. R., Reduction techniques for instance-based learning algorithms, Machine Learning, 38, 257-286 (2000) · Zbl 0954.68126 · doi:10.1023/A:1007626913721
[20] Zymbler, M.; Polyakov, A.; Kipnis, M., Time series discord discovery on Intel many-core systems, Proceedings of the 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2-4, 2019, Commun. Comput. Inform. Science, 1063, 168-182 (2019)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.