
Time series clustering and classification by the autoregressive metric. (English) Zbl 1452.62624

Summary: The statistical properties of the autoregressive (AR) distance between ARIMA processes are investigated. In particular, the asymptotic distribution of the squared AR distance is derived, together with a computationally efficient approximation to it. The problem of time series clustering and classification is then discussed, and the performance of the AR distance is illustrated with empirical applications.
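
For orientation, the AR distance of Piccolo (reference [54] below) is commonly stated as the Euclidean distance between the AR(∞) coefficient sequences of two invertible processes. The following is a minimal sketch of that definition only, not the paper's code and not its asymptotic results. It assumes the Box-Jenkins sign convention φ(B) = 1 − φ₁B − … and θ(B) = 1 − θ₁B − …, stationary and invertible ARMA inputs (any differencing operator would have to be folded into the AR polynomial by the caller), and a fixed truncation point. The names `pi_weights`, `ar_distance` and `n_terms` are hypothetical.

```python
import numpy as np

def pi_weights(phi, theta, n_terms=50):
    """First n_terms AR(inf) weights pi_j of an ARMA model, where
    pi(B) = phi(B) / theta(B) = 1 - sum_j pi_j B^j (assumed convention)."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Coefficients of phi(B) and theta(B) as ordinary polynomials in B.
    a = np.zeros(n_terms + 1)
    a[0] = 1.0
    a[1:len(phi) + 1] = -phi          # assumes len(phi) <= n_terms
    b = np.concatenate(([1.0], -theta))
    # Power-series division: solve c(B) * theta(B) = phi(B) term by term.
    c = np.zeros(n_terms + 1)
    for j in range(n_terms + 1):
        acc = a[j]
        for i in range(1, min(j, len(theta)) + 1):
            acc -= b[i] * c[j - i]
        c[j] = acc
    return -c[1:]                      # pi_j = -c_j for j >= 1

def ar_distance(model_x, model_y, n_terms=50):
    """Truncated AR distance between two models given as (phi, theta) pairs."""
    px = pi_weights(*model_x, n_terms=n_terms)
    py = pi_weights(*model_y, n_terms=n_terms)
    return float(np.sqrt(np.sum((px - py) ** 2)))

# Two AR(1) models: only the first pi-weight differs.
d_ar = ar_distance(([0.5], []), ([0.2], []))
# MA(1) with theta = 0.5 against white noise.
d_ma = ar_distance(([], [0.5]), ([], []))
```

In practice the parameters would be estimates from fitted models, which is why the sampling distribution of the squared distance matters when the distance is used for clustering or classification.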

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-08 Computational methods for problems pertaining to statistics

Software:

AS 256; clusfind

References:

[1] Agrawal, R.; Imielinski, T.; Swami, A., Database mining: a performance perspective, IEEE Trans. Knowledge Data Eng., 5, 914-925 (1993)
[2] Agrawal, R., Faloutsos, C., Swami, A., 1994. Efficient similarity search in sequence databases. Fourth Proceedings of F.O.D.O. ’93, Lecture Notes in Computer Science, vol. 730, Springer, New York, pp. 69-84.
[3] Akaike, H., A new look at the statistical model identification, IEEE Trans. Automatic Control, AC 19, 203-217 (1974)
[4] Alagón, J., Spectral discrimination of two groups of time series, J. Time Ser. Anal., 10, 203-214 (1989)
[5] Alonso, A. M.; Berrendero, J. R.; Hernández, A.; Justel, B., Time series clustering based on forecast densities, Comput. Statist. Data Anal., 51, 762-776 (2006) · Zbl 1157.62484
[6] Ananthanarayana, V. S.; Murty, M. N.; Subramanian, D. K., Efficient clustering of large data set, Pattern Recognition, 34, 2561-2563 (2001) · Zbl 1012.68895
[7] Anderson, T. W., Goodness of fit tests for spectral distributions, The Ann. Statist., 21, 830-847 (1993) · Zbl 0779.62083
[8] Ansley, C. F.; Newbold, P., Finite sample properties for Autoregressive Moving Average models, J. Econo., 13, 159-183 (1980) · Zbl 0432.62063
[9] Arabie, P.; Hubert, L. J., The bond energy algorithm revisited, IEEE Trans. Systems Man Cybernet., 20, 268-274 (1990)
[10] Baragona, R.; Battaglia, F.; Cucina, D., Clustering of time series with genetic algorithms, Metron, 59, 113-130 (2001) · Zbl 1053.62535
[11] Basawa, I. V.; Billard, L.; Srinivasan, R., Large sample tests of homogeneity for time series, Biometrika, 71, 203-206 (1984) · Zbl 0532.62066
[12] Bohte, Z., Cepar, D., Kosmelj, K., 1980. Clustering of time series. Proceedings of COMPSTAT80, pp. 587-593.
[13] Box, G. E.P.; Jenkins, G. M., Time Series Analysis: Forecasting and Control (rev edition) (1976), Holden-Day: Holden-Day San Francisco · Zbl 0363.62069
[14] Brockwell, P. J.; Davis, R. A., Time Series: Theory and Methods (1991), Springer: Springer New York
[15] Caiado, J.; Crato, N.; Peña, D., A periodogram-based metric for time series classification, Comput. Statist. Data Anal, 50, 2668-2684 (2006) · Zbl 1445.62222
[16] Chaudury, G.; Borwarkar, J. D.; Rao, P. R.K., Bhattacharyya distance based linear discriminant function for stationary time series, Comm. Statist. Theory Methods, 20, 2195-2205 (1991) · Zbl 0900.62461
[17] Climer, S.; Zhang, W., Rearrangement clustering: pitfalls, remedies and applications, J. Mach. Learn. Res., 7, 919-943 (2006) · Zbl 1222.68172
[18] Corduas, M., Preliminary estimation of ARFIMA models, (Bethlehem, J. G.; van der Heijden, P. G.M., Proceedings in Computational Statistics (2000), Physica: Physica Heidelberg), 247-252 · Zbl 1455.62168
[19] Corduas, M., 2004. Time series discrimination using AR metric. Proceedings of XLII Riunione Scientifica SIS, CLEUP, Padova, pp. 143-146.
[20] Corduas, M., Piccolo, D., 1999. An application of the AR metric to seasonal adjustment. Bulletin of the International Statistical Institute, vol. LVIII, pp. 217-218.
[21] Dargahi-Noubary, G. R.; Laycock, P. J., Spectral ratio discriminants and information theory, J. Time Ser. Anal., 2, 71-86 (1981) · Zbl 0509.62052
[22] Farebrother, R. W., The distribution of a quadratic form in normal variables, Appl. Statist., 39, 294-309 (1990) · Zbl 0715.62096
[23] Galeano, P.; Peña, D., Multivariate analysis in vector time series, Resenhas, 4, 383-404 (2000) · Zbl 1098.62558
[24] Ge, D., Srinivasan, N., Krishnan, S.M., 2002. Cardiac arrhythmia classification using autoregressive modeling. Biomed. Eng. OnLine, http://www.biomedical-engineering-online.com
[25] Gersch, W.; Martinelli, F.; Yonemoto, J.; Low, M. D.; McEwan, J. A., Automatic classification of electroencephalograms: Kullback-Leibler nearest neighbor rules, Science, 205, 193-195 (1979)
[26] Gonzalo, J.; Lee, T. H., Relative power of t type tests for stationary and unit root processes, J. Time Ser. Anal., 17, 37-47 (1996) · Zbl 0835.62078
[27] Gray, A. H.; Markel, J. D., Distance measures for speech processing, IEEE Trans. Acoust., Speech and Signal Processing, ASSP-24, 380-391 (1976)
[28] Grimaldi, S., Linear parametric models applied on daily hydrological series, J. Hydrol. Eng., 9, 383-391 (2004)
[29] Hurvich, C. M.; Tsai, C. L., Regression and time series model selection in small samples, Biometrika, 76, 297-307 (1989) · Zbl 0669.62085
[30] Imhof, P. J., Computing the distribution of quadratic forms in Normal variables, Biometrika, 48, 419-426 (1961) · Zbl 0136.41103
[31] Ingrassia, S.; Cerioli, A.; Corbellini, A., Some issues on clustering of functional data, (Schader, M.; Gaul, W.; Vichi, M., Between Data Science and Applied Data Analysis (2003), Springer: Springer Berlin), 49-56 · Zbl 05280157
[32] Kailath, T., The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol., COM-15, 52-60 (1967)
[33] Kakizawa, Y.; Shumway, R. H.; Taniguchi, M., Discrimination and clustering for multivariate time series, J. Amer. Statist. Assoc., 93, 328-340 (1998) · Zbl 0906.62060
[34] Kalpakis, K., Gada, D., Puttagunta, V., 2001. Distance measures for effective clustering of ARIMA time series. Proc. IEEE Internat. Conf. Data Mining, 273-280.
[35] Kang, W.; Cheng, C.; Lai, J.; Tsao, H., The application of cepstral coefficients and maximum likelihood method in EGM pattern recognition, IEEE Trans. Biomed. Eng., 42, 777-785 (1995)
[36] Kaufman, L.; Rousseeuw, P. J., Finding Groups in Data: An Introduction to Cluster Analysis (1990), Wiley: Wiley New York · Zbl 1345.62009
[37] Kazakos, D.; Papantoni-Kazakos, P., Spectral distances between Gaussian processes, IEEE Trans. Automat. Control, AC-25, 950-959 (1980) · Zbl 0454.93040
[38] Keogh, E.; Kasetty, S., On the need for time series data mining benchmarks: a survey and empirical demonstration, Data Mining Knowledge Discovery, 7, 349-371 (2003)
[39] Kosěc, D., Parametric estimation of continuous non stationary spectrum and its dynamics in surface EMG studies, Internat. J. Med. Inform., 58-59, 59-69 (2000)
[40] Kovačić, Z.J., 1996. Classification of time series with application to the leading indicator selection. Proceedings of the Fifth Conference of IFCS, vol. 2, pp. 204-207.
[41] Liao, T. W., Clustering time series data—a survey, Pattern Recognition, 38, 1857-1874 (2005) · Zbl 1077.68803
[42] Maharaj, E. A., A significance test for classifying ARMA models, J. Statist. Comput. Simulation, 54, 305-331 (1996) · Zbl 0899.62116
[43] Maharaj, E. A., Comparison and classification of stationary multivariate time series, Pattern Recognition, 32, 1129-1138 (1999)
[44] Maharaj, E. A., Clusters of time series, J. Classification, 17, 297-314 (2000) · Zbl 1017.62079
[45] Mathai, A. M.; Provost, S. B., Quadratic Forms in Random Variables (1992), Marcel Dekker: Marcel Dekker New York · Zbl 0792.62045
[46] McCormick, W. T.; Schweitzer, P. J.; White, T. W., Problem decomposition and data reorganization by a clustering technique, Oper. Res., 20, 993-1009 (1972) · Zbl 0249.90046
[47] Mélard, G.; Roy, R., Sur un test d’égalité des autocovariances de deux séries chronologiques, Canad. J. Statist., 12, 333-342 (1984) · Zbl 0569.62078
[48] Ng, M. K.; Huang, Z., Data mining massive time series astronomical data: challenges, problems and solutions, Inform. Software Technol., 41, 545-556 (1999)
[49] Otranto, E.; Triacca, U., Measures to evaluate the discrepancy between direct and indirect model-based seasonal adjustment, J. Official Statist., 18, 511-530 (2002)
[50] Pattarin, F.; Paterlini, S.; Minerva, T., Clustering financial time series: an application to mutual funds style analysis, Comput. Statist. Data Anal., 47, 353-372 (2004) · Zbl 1429.62476
[51] Peña, D., Influential observation in time series, J. Business and Econom. Statist., 8, 235-242 (1990)
[52] Piccolo, D., 1984. Una topologia per la classe dei processi ARIMA. Statistica, XLIV, 47-59.
[53] Piccolo, D., 1989. On the measure of dissimilarity between ARIMA models. In: Proceedings of the A.S.A. Meetings, Business and Economic Statistics Section, Washington, DC, pp. 231-236.
[54] Piccolo, D., A distance measure for classifying ARIMA models, J. Time Ser. Anal., 11, 153-164 (1990) · Zbl 0691.62083
[55] Sarno, E., Testing information redundancy in environmental monitoring networks, Environmetrics, 16, 71-79 (2005)
[56] Sarno, E.; Zazzaro, A., An index of dissimilarity among time series: an application to the inflation rates of the EU countries, (Klinke, S.; Ahrend, P.; Richter, L., Proceedings of COMPSTAT 2002 (2002), Springer: Springer Berlin)
[57] Schwarz, G., Estimating the dimension of a model, Ann. Statist., 6, 461-464 (1978) · Zbl 0379.62005
[58] Shumway, R. H., Discriminant analysis for time series, (Krishnaiah, P. R.; Kanal, L. N., Handbook of Statistics, vol. 2 (1982), North Holland: North Holland Amsterdam), 1-46 · Zbl 0566.62058
[59] Shumway, R. H., Time-frequency clustering and discriminant analysis, Statist. Probab. Lett., 63, 307-314 (2003) · Zbl 1116.62364
[60] Shumway, R. H.; Unger, A. N., Linear discriminant functions for stationary time series, J. Amer. Statist. Assoc., 65, 1527-1546 (1974) · Zbl 0296.62085
[61] Struzik, Z. R.; Siebes, A., The Haar wavelet in the time series similarity paradigm, (Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery (1999), Springer: Springer Prague), 12-22
[62] Taniguchi, M.; Kakizawa, Y., Asymptotic Theory of Statistical Inference for Time Series (2000), Springer: Springer New York · Zbl 0955.62088
[63] Thomson, P. J.; De Souza, P., Speech recognition using LPC distance measures, (Hannan, E. J.; Krishnaiah, P. R.; Rao, M. M., Handbook of Statistics, vol. 5 (1985), North Holland: North Holland Amsterdam), 389-412
[64] Tong, H.; Dabas, P., Cluster of time series, J. Appl. Statist., 17, 187-198 (1990)
[65] Tran-Luu, T. D.; DeClaris, N., Visual heuristics for data clustering, IEEE Trans. Systems Man Cybernet., 1, 19-24 (1997)
[66] Zhang, G.; Taniguchi, M., Nonparametric approach for discriminant analysis in time series, Nonparametric Statist., 5, 91-101 (1995) · Zbl 0873.62036
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.