Using multiple time series analysis for geosensor data forecasting. (English) Zbl 1435.62338

Summary: Forecasting geophysical time series is a challenging problem with numerous applications. The presence of correlation (i.e., spatial correlation across several sites and temporal correlation within each site) poses difficulties for traditional modeling, computation and statistical theory. This paper presents a cluster-centric forecasting methodology that characterizes correlation in geophysical time series through a spatio-temporal clustering step. The clustering phase partitions time series of numeric data routinely sampled at specific spatial locations. A forecasting model is then computed by means of multivariate time series analysis, in order to predict the future values of a time series by utilizing not only its own historical values, but also information from other cluster-time series. Experimental results highlight the importance of dealing with both temporal and spatial correlation and validate the proposed cluster-centric strategy for computing a multivariate time series forecasting model.
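
The two-stage idea in the summary (cluster first, then forecast each cluster jointly) can be illustrated with a minimal sketch. It is not the authors' algorithm: the linking rule in `cluster_sites`, the parameters `radius` and `min_corr`, the choice of a least-squares VAR(1) model, the synthetic data, and the reading of "cluster-time series" as the series grouped in one cluster are all assumptions made for illustration.

```python
import numpy as np


def cluster_sites(coords, series, radius, min_corr):
    """Group sites that are both spatially close and temporally correlated.

    Hypothetical stand-in for the clustering step: two sites are linked when
    their distance is <= radius and their Pearson correlation is >= min_corr;
    clusters are the connected components of that link graph.
    coords: (n, 2) site coordinates; series: (T, n), one column per site.
    """
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.corrcoef(series.T)
    linked = (dist <= radius) & (corr >= min_corr)
    labels = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first walk over linked sites
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] < 0:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def fit_var(Y, p=1):
    """Least-squares fit of a VAR(p) model with intercept; Y has shape (T, k)."""
    T, _ = Y.shape
    lagged = [Y[p - lag:T - lag] for lag in range(1, p + 1)]
    X = np.hstack([np.ones((T - p, 1))] + lagged)
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)
    return B  # shape (1 + k * p, k)


def forecast_one_step(Y, B, p=1):
    """Predict the next value of every series in the cluster from the last p rows."""
    x = np.concatenate([[1.0]] + [Y[-lag] for lag in range(1, p + 1)])
    return x @ B


# Synthetic example (assumed data): two distant pairs of sites, each pair
# driven by its own AR(1) signal plus small independent sensor noise.
rng = np.random.default_rng(0)
T = 200
drivers = np.zeros((T, 2))
for t in range(1, T):
    drivers[t] = 0.8 * drivers[t - 1] + rng.normal(size=2)
coords = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
series = drivers[:, [0, 0, 1, 1]] + 0.1 * rng.normal(size=(T, 4))

labels = cluster_sites(coords, series, radius=2.0, min_corr=0.5)
forecasts = {}
for c in np.unique(labels):
    Y = series[:, labels == c]  # all series of one cluster, modeled jointly
    forecasts[int(c)] = forecast_one_step(Y, fit_var(Y, p=1), p=1)
```

Each entry of `forecasts` holds the one-step-ahead prediction for every site in that cluster, so a site's forecast draws on its neighbours' histories as well as its own. In practice a dedicated VAR implementation with lag-order selection would replace the hand-rolled least-squares fit.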


62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62M20 Inference from stochastic processes and prediction
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P35 Applications of statistics to physics
86A32 Geostatistics


[1] Appice, A.; Ciampi, A.; Malerba, D., Summarizing numeric spatial data streams by trend cluster discovery, Data Min. Knowl. Discov., 29, 1, 84-136 (2015) · Zbl 1403.68179
[2] Appice, A.; Guccione, P.; Malerba, D.; Ciampi, A., Dealing with temporal and spatial correlations to classify outliers in geophysical data streams, Inform. Sci., 285, 162-180 (2014) · Zbl 1355.68222
[3] Appice, A.; Pravilovic, S.; Malerba, D.; Lanza, A., Enhancing regression models with spatio-temporal indicator additions, (Baldoni, M.; Baroglio, C.; Boella, G.; Micalizio, R., AI*IA 2013: Advances in Artificial Intelligence, vol. 8249 of Lecture Notes in Computer Science (2013), Springer International Publishing), 433-444
[4] Asteriou, D.; Hall, S., ARIMA models and the Box-Jenkins methodology, Applied Econometrics, 265-286 (2011), Palgrave Macmillan
[5] Barbosa, S. M.; Silva, M. E.; Fernandes, M. J., Multivariate autoregressive modelling of sea level time series from TOPEX/Poseidon satellite altimetry, Nonlinear Proc. Geoph., 13, 2, 177-184 (2006)
[6] Benjamini, Y.; Hochberg, Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. Roy. Stat. Soc. B, 57, 1, 289-300 (1995) · Zbl 0809.62014
[7] Birant, D.; Kut, A., ST-DBSCAN: an algorithm for clustering spatial-temporal data, Data Knowl. Eng., 60, 1, 208-221 (2007)
[8] Cheng, M.-Y.; Fan, J.; Spokoiny, V., Dynamic nonparametric filtering with application to volatility estimation, Recent advances and trends in nonparametric statistics, 315-333 (2003), Elsevier B. V.: Amsterdam
[9] Conover, W., Practical nonparametric statistics, Wiley Series in Probability and Statistics: Applied Probability and Statistics (1999), Wiley
[10] De Luna, X.; Genton, M. G., Predictive spatio-temporal models for spatially sparse environmental data, Stat. Sinica, 15, 2, 547-568 (2005) · Zbl 1070.62080
[11] Demšar, J., Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1-30 (2006) · Zbl 1222.68184
[12] Egrioglu, E.; Yolcu, U.; Aladag, C.; Bas, E., Recurrent multiplicative neuron model artificial neural network for non-linear time series forecasting, Neural Process. Lett., 41, 2, 249-258 (2015)
[13] Elsayed, T.; Lin, J. J.; Oard, D. W., Pairwise document similarity in large collections with MapReduce, ACL 2008, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 15-20, 2008, Columbus, Ohio, USA, Short Papers, The Association for Computer Linguistics, 265-268 (2008)
[14] Fan, J.; Han, F.; Liu, H., Challenges of big data analysis, NSR Natl. Sci. Rev., 1, 2, 293-314 (2014)
[15] Gelman, A.; Carlin, J.; Stern, H.; Dunson, D.; Vehtari, A.; Rubin, D., Bayesian data analysis (2013), Chapman & Hall/CRC Texts in Statistical Science, Taylor & Francis
[16] Golub, G. H.; Van Loan, C. F., Matrix Computations (2013), JHU Press · Zbl 1268.65037
[17] Golyandina, N.; Zhigljavsky, A., Singular spectrum analysis for time series (2013), SpringerBriefs in Statistics, Springer-Verlag Berlin Heidelberg · Zbl 1276.62053
[18] Guttorp, P.; Schmidt, A. M., Covariance structure of spatial and spatiotemporal processes, Wiley Interdiscip. Rev. Comput. Stat., 5, 4, 279-287 (2013)
[19] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 1, 193-218 (1985)
[20] Hubrich, K.; Lütkepohl, H.; Saikkonen, P., A review of systems cointegration tests, Econometric Rev., 20, 3, 247-318 (2001) · Zbl 1044.62120
[21] Hyndman, R. J.; Athanasopoulos, G., Forecasting: Principles and Practice (2013), OTexts
[22] Hyndman, R. J.; Khandakar, Y., Automatic time series forecasting: the forecast package for R, J. Stat. Software, 27, 3, 1-22 (2008)
[23] Hyndman, R. J.; Koehler, A. B., Another look at measures of forecast accuracy, Int. J. Forecasting, 22, 4, 679-688 (2006)
[24] Kamarianakis, Y.; Prastacos, P., Space-time modeling of traffic flow, Comput. Geosci., 31, 2, 119-133 (2005)
[25] Kisilevich, S.; Mansmann, F.; Nanni, M.; Rinzivillo, S., Spatio-temporal clustering, (Maimon, O.; Rokach, L., Data Mining and Knowledge Discovery Handbook (2010), Springer US), 855-874
[26] Li, L.; Noorian, F.; Moss, D. J.M.; Leong, P. H.W., Rolling window time series prediction using MapReduce, (Joshi, J.; Bertino, E.; Thuraisingham, B. M.; Liu, L., Proceedings of the 15th IEEE International Conference on Information Reuse and Integration, IRI 2014, Redwood City, CA, USA, August 13-15, 2014, IEEE (2014)), 757-764
[27] Liu, Q.; Deng, M.; Bi, J.; Yang, W., A novel method for discovering spatio-temporal clusters of different sizes, shapes, and densities in the presence of noise, Int. J. Digit. Earth, 7, 2, 138-157 (2014)
[28] Lütkepohl, H., New Introduction to Multiple Time Series Analysis (2005), Springer-Verlag Berlin Heidelberg · Zbl 1072.62075
[29] Lütkepohl, H.; Krätzig, M., Applied Time Series Econometrics, Themes in Modern Econometrics (2004), Cambridge University Press
[30] Martin, V.; Hurn, S.; Harris, D., Econometric Modelling with Time Series: Specification, Estimation and Testing (2012), Cambridge University Press
[31] Matteson, D. S.; Tsay, R. S., Dynamic orthogonal components for multivariate time series, J. Am. Stat. Assoc., 106, 496, 1450-1463 (2011) · Zbl 1323.62086
[32] Montero-Lorenzo, J.-M.; Fernández-Avilés, G.; Mondéjar-Jiménez, J.; Vargas-Vargas, M., A spatio-temporal geostatistical approach to predicting pollution levels: the case of mono-nitrogen oxides in Madrid, Comput. Environ. Urban Syst., 37, 95-106 (2013)
[33] Ohashi, O.; Torgo, L., Wind speed forecasting using spatio-temporal indicators, ECAI 2012 - 20th European Conference on Artificial Intelligence. Including Prestigious Applications of Artificial Intelligence (PAIS-2012) System Demonstrations Track, Montpellier, France, August 27-31, 2012, 975-980 (2012)
[34] Paulson, D., Applied Statistical Designs for the Researcher (2003), Chapman & Hall/CRC Biostatistics Series, Taylor & Francis · Zbl 1141.62033
[35] Pfaff, B., Analysis of Integrated and Cointegrated Time Series with R (2008), Springer: New York
[37] Pokrajac, D.; Obradovic, Z., Improved spatial-temporal forecasting through modelling of spatial residuals in recent history, Proceedings of the First SIAM International Conference on Data Mining, SDM 2001, Chicago, IL, USA, April 5-7, 2001, 1-17 (2001)
[38] Pravilovic, S.; Appice, A.; Malerba, D., An intelligent technique for forecasting spatially correlated time series, (Baldoni, M.; Baroglio, C.; Boella, G.; Micalizio, R., AI*IA 2013: Advances in Artificial Intelligence, vol. 8249 of Lecture Notes in Computer Science, Springer International Publishing (2013)), 457-468
[39] Pravilovic, S.; Appice, A.; Malerba, D., Integrating cluster analysis to the ARIMA model for forecasting geosensor data, (Andreasen, T.; Christiansen, H.; Cubero, J.-C.; Raś, Z., Foundations of Intelligent Systems, vol. 8502 of Lecture Notes in Computer Science (2014), Springer International Publishing), 234-243
[40] Qin, K.; Chen, Y.; Zhan, Y.; Cheng, F., Spatial clustering considering spatio-temporal correlation, Geoinformatics, 2011 19th International Conference on, 1-4 (2011)
[42] Refinetti, R.; Cornélissen, G.; Halberg, F., Procedures for numerical analysis of circadian rhythms, Biol. Rhythm. Res., 38, 4, 275-325 (2007)
[43] Reynolds, A.; Richards, G.; de la Iglesia, B.; Rayward-Smith, V., Clustering rules: a comparison of partitioning and hierarchical clustering algorithms, J. Math. Model. Algorithms, 5, 4, 475-504 (2006) · Zbl 1104.62073
[44] Rousseeuw, P. J., Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 20, 53-65 (1987) · Zbl 0636.62059
[45] Saengseedam, P.; Kantanantha, N., Spatial time series forecasts based on Bayesian linear mixed models for rice yields in Thailand, Proceedings of the International MultiConference of Engineers and Computer Scientists, IMECS 2014, vol. II, 1007-1012 (2014), Newswood Limited
[46] Schelter, S.; Boden, C.; Markl, V., Scalable similarity-based neighborhood methods with MapReduce, (Cunningham, P.; Hurley, N. J.; Guy, I.; Anand, S. S., Sixth ACM Conference on Recommender Systems, RecSys ’12, Dublin, Ireland, September 9-13, 2012 (2012), ACM), 163-170
[47] Sokolove, P. G.; Bushell, W. N., The chi square periodogram: its utility for analysis of circadian rhythms, J. Theor. Biol., 72, 1, 131-160 (1978)
[48] Struyf, A.; Hubert, M.; Rousseeuw, P., Clustering in an object-oriented environment, J. Stat. Software, 1, 4, 1-30 (1997)
[49] Tsay, R. S., Multivariate Time Series Analysis: With R and Financial Applications (2014), Wiley
[50] Wickham, H., Advanced R, Chapman & Hall/CRC The R Series (2014), Chapman and Hall/CRC
[51] Xianfeng, Y.; Liming, L., A new data mining algorithm based on MapReduce and Hadoop, Int. J. Signal Process. Image Process. Pattern Recognit., 7, 2, 131-142 (2014)
[52] Xu, K. S.; Kliger, M.; Hero, A. O., Adaptive evolutionary clustering, Data Min. Knowl. Discov., 28, 2, 304-336 (2014) · Zbl 1281.68200
[53] Zivot, E.; Wang, J., Modeling Financial Time Series with S-PLUS® (2006), Springer: New York, NY