
Time series outlier detection based on sliding window prediction. (English) Zbl 1407.62335

Summary: To detect outliers in hydrological time series data, and thereby improve both data quality and the quality of decisions related to the design, operation, and management of water resources, this research develops a time series outlier detection method for hydrologic data that identifies values deviating from historical patterns. The method first builds a forecasting model on historical data and then uses it to predict future values. An observation is considered anomalous if it falls outside a given prediction confidence interval (PCI), which is calculated from the predicted value and a confidence coefficient. Using the PCI as the threshold addresses the problem of selecting a suitable threshold, because the interval accounts for the uncertainty in the data series and in the parameters of the forecasting model. The method evaluates data quickly and incrementally as they become available, scales to large quantities of data, and requires no preclassification of anomalies. Experiments on several real-world hydrologic time series showed that the proposed method is fast, correctly identifies abnormal data, and can be used for hydrologic time series analysis.
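The summary does not specify the forecasting model or the exact PCI formula, so the following is only a minimal sketch of the sliding-window idea under stated assumptions, not the authors' implementation. It assumes (1) the one-step forecast x̂_t is the mean of the previous k values (the window), and (2) the PCI is the normal-theory prediction interval x̂_t ± z · s_t · √(1 + 1/k), where s_t is the window's sample standard deviation and z is the standard normal quantile for the confidence coefficient (used here in place of a Student's t quantile). The function name `detect_outliers` and the `replace_outliers` option are hypothetical.

```python
from collections import deque
from statistics import NormalDist, fmean, stdev


def detect_outliers(series, window=10, confidence=0.95, replace_outliers=True):
    """Flag points that fall outside a prediction confidence interval (PCI).

    Hypothetical sketch: the one-step forecast is the mean of the previous
    `window` values, and the PCI is mean +/- z * s * sqrt(1 + 1/window),
    with a normal quantile z standing in for Student's t.
    Returns a list of (index, value, lower, upper) for flagged points.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a spread")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided quantile
    history = deque(maxlen=window)  # sliding window of the most recent values
    flags = []
    for t, x in enumerate(series):
        if len(history) < window:
            # Not enough history yet: accept the value without testing it.
            history.append(x)
            continue
        pred = fmean(history)  # one-step forecast from the window
        half_width = z * stdev(history) * (1 + 1 / window) ** 0.5
        lower, upper = pred - half_width, pred + half_width
        if not lower <= x <= upper:
            flags.append((t, x, lower, upper))
            # Assumption: keep the anomaly out of the window so it does not
            # widen later intervals; the forecast takes its place.
            history.append(pred if replace_outliers else x)
        else:
            history.append(x)
    return flags


series = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 50, 11, 10]
for t, x, lower, upper in detect_outliers(series, window=5):
    print(f"t={t}: {x} outside [{lower:.2f}, {upper:.2f}]")
```

Each point is tested once, against a window that is updated in constant time, which mirrors the incremental, no-preclassification evaluation described in the summary. Replacing a flagged value with its forecast is one possible design choice; with `replace_outliers=False` the spike enters the window and inflates the next few intervals.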

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)

Software:

Rainbow
