de Gregorio, Alessandro; Iacus, Stefano Maria
Clustering of discretely observed diffusion processes. (English) Zbl 1464.62056
Comput. Stat. Data Anal. 54, No. 2, 598-606 (2010).

Summary: A new distance to classify time series is proposed. The underlying generating process is assumed to be a diffusion process that solves a stochastic differential equation and is observed at discrete times. The mesh of observations is not required to shrink to zero. The new dissimilarity measure is based on the \(L^{1}\) distance between the Markov operators estimated on two observed paths. Simulation experiments are used to analyze the performance of the proposed distance under several conditions, including perturbation and misspecification. As an example, real financial data from NYSE/NASDAQ stocks are analyzed, and evidence is provided that the new distance appears to capture differences in both the drift and diffusion coefficients better than other commonly used non-parametric distances. Corresponding software is available in the add-on package sde for the R statistical environment.

Cited in 5 Documents

MSC:
62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M05 Markov processes: estimation; hidden Markov models
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62P05 Applications of statistics to actuarial sciences and financial mathematics

Software: sde; R; dtw; fda (R)

References:
[1] Aït-Sahalia, Y., Nonparametric pricing of interest rate derivative securities, Econometrica, 64, 527-560 (1996) · Zbl 0844.62094
[2] Alonso, A. M.; Berrendero, J. R.; Hernández, A.; Justel, A., Time series clustering based on forecast densities, Computational Statistics & Data Analysis, 51, 2, 762-776 (2006) · Zbl 1157.62484
[3] Bailey, N., The Mathematical Theory of Epidemics, Lecture Notes in Biomathematics (1957), Springer-Verlag: Griffin, London
[4] Banks, H., Modeling and Control in the Biological Sciences, Lecture Notes in Biomathematics, vol. 6 (1975), Springer-Verlag: Berlin · Zbl 0315.92001
[5] Bergstrom, A., Continuous Time Econometric Modeling (1990), Oxford University Press: Oxford
[6] Black, F.; Scholes, M., The pricing of options and corporate liabilities, The Journal of Political Economy, 81, 3, 637-654 (1973) · Zbl 1092.91524
[7] Caiado, J.; Crato, N.; Peña, D., A periodogram-based metric for time series classification, Computational Statistics & Data Analysis, 50, 10, 2668-2684 (2006) · Zbl 1445.62222
[9] Cobb, L., Stochastic differential equations for the social sciences, (Cobb, L.; Thrall, M., Mathematical Frontiers of the Social and Policy Sciences (1981), Westview Press), 1-26
[10] Corduas, M., Dissimilarity criteria for time series data mining, Quaderni di Statistica, 9, 107-129 (2007)
[11] Corduas, M.; Piccolo, D., Time series clustering and classification by the autoregressive metric, Computational Statistics & Data Analysis, 52, 1860-1872 (2008) · Zbl 1452.62624
[12] Ditlevsen, P.; Ditlevsen, S.; Andersen, K., The fast climate fluctuations during the stadial and interstadial climate states, Annals of Glaciology, 35, 457-462 (2002)
[13] Giorgino, T., Computing and visualizing dynamic time warping alignments in R: The dtw package, Journal of Statistical Software, 31, 1-24 (2009)
[14] Gobet, E.; Hoffmann, M.; Reiß, M., Nonparametric estimation of scalar diffusions based on low frequency data, The Annals of Statistics, 32, 2223-2253 (2004) · Zbl 1056.62091
[16] Hansen, L.; Scheinkman, J.; Touzi, N., Spectral methods for identifying scalar diffusions, Journal of Econometrics, 86, 1-32 (1998) · Zbl 0962.62094
[17] Hirukawa, J., Cluster analysis for non-Gaussian locally stationary processes, International Journal of Theoretical and Applied Finance, 9, 113-132 (2006) · Zbl 1137.91595
[18] Holden, A., Models for Stochastic Activity of Neurones (1976), Springer-Verlag: New York · Zbl 0353.92001
[19] Holland, C., On a formula in diffusion processes in population genetics, Proceedings of the American Mathematical Society, 54, 316-318 (1976) · Zbl 0326.60098
[20] Holmes, E., Beyond theory to application and evaluation: Diffusion approximations for population viability analysis, Ecological Applications, 14, 1272-1293 (2004)
[21] Iacus, S., Simulation and Inference for Stochastic Differential Equations. With R Examples (2008), Springer: New York · Zbl 1210.62112
[22] Kakizawa, Y.; Shumway, R. H.; Taniguchi, M., Discrimination and clustering for multivariate time series, Journal of the American Statistical Association, 93, 328-340 (1998) · Zbl 0906.62060
[23] Karatzas, I.; Shreve, S., Brownian Motion and Stochastic Calculus (1988), Springer-Verlag: New York
[24] Kessler, M.; Sørensen, M., Estimating equations based on eigenfunctions for a discretely observed diffusion process, Bernoulli, 5, 299-314 (1999) · Zbl 0980.62074
[25] Kloeden, P.; Platen, E.; Schurz, H., Numerical Solution of SDE through Computer Experiments (2000), Springer: Berlin
[26] Kushner, H., Stochastic Stability and Control (1967), Academic Press: New York · Zbl 0183.19401
[27] Liao, T., Clustering of time series data - A survey, Pattern Recognition, 38, 1857-1874 (2005) · Zbl 1077.68803
[28] Maharaj, E. A., Comparison and classification of stationary multivariate time series, Pattern Recognition, 32, 1129-1138 (1999)
[29] Merton, R., Theory of rational option pricing, Bell Journal of Economics and Management Science, 4, 141-183 (1973) · Zbl 1257.91043
[30] Möller-Levet, C.; Klawonn, F.; Cho, K.-H.; Wolkenhauer, O., Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 143-165 (1978)
[31] Otranto, E., Clustering heteroskedastic time series by model-based procedures, Computational Statistics & Data Analysis, 52, 4685-4698 (2008) · Zbl 1452.62784
[32] Papanicolaou, G., Diffusions in random media, (Keller, J. B.; McLaughlin, D.; Papanicolaou, G., Surveys in Applied Mathematics (1995)), 205-255
[33] Piccolo, D., A distance measure for classifying ARIMA models, Journal of Time Series Analysis, 11, 153-164 (1990) · Zbl 0691.62083
[35] Ramsay, J. O.; Silverman, B. W., Functional Data Analysis (2005), Springer: New York · Zbl 1079.62006
[37] Ricciardi, L., Diffusion Processes and Related Topics in Biology, Lecture Notes in Biomathematics (1977), Springer: New York · Zbl 0356.60023
[38] Sakoe, H.; Chiba, S., Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 143-165 (1978)
[39] Schuecker, P.; Böhringer, H.; Arzner, K.; Reiprich, T., Cosmic mass functions from Gaussian stochastic diffusion processes, Astronomy and Astrophysics, 370, 715-728 (2001) · Zbl 1066.85006
[40] Tuerlinckx, F.; Maris, E.; Ratcliff, R.; De Boeck, P., A comparison of four methods for simulating the diffusion process, Behavior Research Methods, Instruments, & Computers, 33, 443-456 (2001)
[41] Wang, K.; Gasser, T., Alignment of curves by dynamic time warping, The Annals of Statistics, 25, 1251-1276 (1997) · Zbl 0898.62051

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.