Clustering of discretely observed diffusion processes. (English) Zbl 1464.62056

Summary: A new distance for classifying time series is proposed. The underlying generating process is assumed to be a diffusion process that solves a stochastic differential equation and is observed at discrete times. The mesh of observations is not required to shrink to zero. The new dissimilarity measure is based on the \(L^{1}\) distance between the Markov operators estimated on two observed paths. Simulation experiments are used to analyze the performance of the proposed distance under several conditions, including perturbation and misspecification. As an example, real financial data from NYSE/NASDAQ stocks are analyzed, and evidence is provided that the new distance appears able to capture differences in both the drift and diffusion coefficients better than other commonly used nonparametric distances. Corresponding software is available in the add-on package sde for the R statistical environment.
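
To make the idea of an \(L^{1}\) distance between estimated Markov operators concrete, here is a minimal Python sketch. It is not the implementation in the sde package. It assumes a piecewise-linear "hat" basis on a common grid, a symmetrized empirical estimate of the operator matrix, and Ornstein-Uhlenbeck test paths simulated with the Euler-Maruyama scheme. All function names and parameter values below are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_ou(theta, mu, sigma, x0, n, dt, rng):
    """Euler-Maruyama path of dX = theta*(mu - X) dt + sigma dW (illustrative model)."""
    x = np.empty(n + 1)
    x[0] = x0
    noise = rng.standard_normal(n) * np.sqrt(dt)
    for i in range(n):
        x[i + 1] = x[i] + theta * (mu - x[i]) * dt + sigma * noise[i]
    return x


def hat_basis(x, knots):
    """Evaluate piecewise-linear hat functions centred at equally spaced knots."""
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)


def markov_operator_matrix(path, knots):
    """Symmetrized empirical matrix with entries
    (1 / 2N) * sum_i [phi_j(X_{i-1}) phi_k(X_i) + phi_k(X_{i-1}) phi_j(X_i)].
    This estimator is an assumption of the sketch."""
    basis = hat_basis(path, knots)
    lag, lead = basis[:-1], basis[1:]
    n = len(path) - 1
    return (lag.T @ lead + lead.T @ lag) / (2.0 * n)


def mo_distance_matrix(paths, n_basis=20):
    """Pairwise entrywise L1 distances between the estimated operator matrices."""
    lo = min(p.min() for p in paths)
    hi = max(p.max() for p in paths)
    knots = np.linspace(lo, hi, n_basis)  # common basis for all paths
    mats = [markov_operator_matrix(p, knots) for p in paths]
    k = len(paths)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = np.abs(mats[i] - mats[j]).sum()
    return dist


# Two paths from the same model and one with a larger diffusion coefficient.
rng = np.random.default_rng(0)
paths = [
    simulate_ou(theta=1.0, mu=0.0, sigma=0.5, x0=0.0, n=10000, dt=0.1, rng=rng),
    simulate_ou(theta=1.0, mu=0.0, sigma=0.5, x0=0.0, n=10000, dt=0.1, rng=rng),
    simulate_ou(theta=1.0, mu=0.0, sigma=2.0, x0=0.0, n=10000, dt=0.1, rng=rng),
]
D = mo_distance_matrix(paths)
print(np.round(D, 3))
```

The time step dt = 0.1 is held fixed rather than shrunk, in line with the summary's remark that the observation mesh need not go to zero. In this setup one would expect the two paths sharing a diffusion coefficient to be closer to each other than either is to the third; the resulting matrix D could then be passed to any standard hierarchical clustering routine.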


62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M05 Markov processes: estimation; hidden Markov models
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62P05 Applications of statistics to actuarial sciences and financial mathematics


sde; dtw; fda (R); R

