Temporal pattern attention for multivariate time series forecasting. (English) Zbl 07097476

Summary: Forecasting of multivariate time series data, for instance the prediction of electricity consumption, solar power production, and polyphonic piano pieces, has numerous valuable applications. However, complex and non-linear interdependencies between time steps and series complicate this task. To obtain accurate predictions, it is crucial to model long-term dependencies in time series data, which can be achieved by recurrent neural networks (RNNs) with an attention mechanism. The typical attention mechanism reviews the information at each previous time step and selects relevant information to help generate the outputs; however, it fails to capture temporal patterns across multiple time steps. In this paper, we propose using a set of filters to extract time-invariant temporal patterns, similar to transforming time series data into its “frequency domain”. Then we propose a novel attention mechanism to select relevant time series, and use its frequency domain information for multivariate forecasting. We apply the proposed model to several real-world tasks and achieve state-of-the-art performance in almost all cases. Our source code is available at https://github.com/gantheory/TPA-LSTM.
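
The two steps described in the summary (filters that extract temporal patterns, followed by attention over the filtered series) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The array shapes, the sigmoid scoring, and all names (`temporal_pattern_attention`, `filters`, `W_a`) are assumptions made for illustration, and the weights are random rather than learned.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def temporal_pattern_attention(H, h_t, filters, W_a):
    """Illustrative attention over filtered hidden-state rows.

    H       : (m, w) past RNN hidden states, one row per hidden dimension,
              one column per time step in the window (assumed layout).
    h_t     : (m,)   current hidden state.
    filters : (k, w) k temporal filters, each spanning the whole window.
    W_a     : (k, m) scoring matrix (hypothetical, normally learned).
    """
    # Each filter covers the full window, so applying it to a row of H
    # reduces to a dot product along the time axis.
    HC = H @ filters.T                 # (m, k): k pattern responses per row

    # Score every row against the current state, then squash with a
    # sigmoid so that several rows can be selected at once.
    scores = HC @ (W_a @ h_t)          # (m,)
    alpha = sigmoid(scores)            # (m,) attention weights in [0, 1]

    # Context vector: attention-weighted sum of the filtered rows.
    context = alpha @ HC               # (k,)
    return alpha, context


# Usage with random stand-in values (no trained weights involved).
rng = np.random.default_rng(0)
m, w, k = 6, 12, 4
H = rng.normal(size=(m, w))
h_t = rng.normal(size=m)
filters = rng.normal(size=(k, w)) * 0.1
W_a = rng.normal(size=(k, m)) * 0.1
alpha, context = temporal_pattern_attention(H, h_t, filters, W_a)
print(alpha.shape, context.shape)
```

Because the weights are computed per row rather than per time step, the sketch selects which series-like rows matter instead of which past time steps matter, which is the contrast with typical attention that the summary draws.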


62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
68T05 Learning and adaptive systems in artificial intelligence

