Functional data clustering: a survey. (English) Zbl 1414.62018

Summary: Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods working directly on the evaluation points of the curves. The second group consists of filtering methods, which first approximate the curves in a finite basis of functions and then cluster the basis expansion coefficients. The third group is composed of methods that perform dimensionality reduction of the curves and clustering simultaneously, leading to functional representations of the data that depend on the clusters. The last group consists of distance-based methods, which use clustering algorithms based on distances specific to functional data. A software review and an illustration of the application of these algorithms to real data are presented.


62-07 Data analysis (statistics) (MSC2010)
62M99 Inference from stochastic processes
62H30 Classification and discrimination; cluster analysis (statistical aspects)


[1] Abraham C, Cornillon PA, Matzner-Løber E, Molinari N (2003) Unsupervised curve clustering using B-splines. Scand J Stat Theory Appl 30(3):581-595. doi:10.1111/1467-9469.00350
[2] Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716-723 (system identification and time-series analysis) · Zbl 0314.62039
[3] Antoniadis, A.; Beder, JH, Joint estimation of the mean and the covariance of a Banach valued Gaussian vector, Statistics, 20, 77-93, (1989) · Zbl 0684.62065
[4] Banfield, J.; Raftery, A., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821, (1993) · Zbl 0794.62034
[5] Bergé, L.; Bouveyron, C.; Girard, S., HDclassif: an R package for model-based clustering and discriminant analysis of high-dimensional data, J Stat Softw, 42, 1-29, (2012)
[6] Besse P (1979) Etude descriptive d’un processus. Thèse de doctorat 3ème cycle Université Paul Sabatier, Toulouse
[7] Biernacki, C.; Celeux, G.; Govaert, G., Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Trans Pattern Anal Mach Intell, 22, 719-725, (2000)
[8] Bosq D (2000) Linear processes in function spaces, Lecture Notes in Statistics, vol 149. Springer, New York (theory and applications) · Zbl 0962.60004
[9] Boullé, M., Functional data clustering via piecewise constant nonparametric density estimation, Pattern Recognit, 45, 4389-4401, (2012) · Zbl 1248.68398
[10] Boumaza R (1980) Contribution à l’étude descriptive d’une fonction aléatoire qualitative. PhD thesis, Université Paul Sabatier, Toulouse, France
[11] Bouveyron C, Brunet C (2013) Model-based clustering of high-dimensional data: a review. Technical report · Zbl 1471.62032
[12] Bouveyron, C.; Jacques, J., Model-based clustering of time series in group-specific functional subspaces, Adv Data Anal Classif, 5, 281-300, (2011) · Zbl 1274.62416
[13] Bouveyron C, Girard S, Schmid C (2007) High dimensional data clustering. Comput Stat Data Anal 52:502-519 · Zbl 1452.62433
[14] Cardot, H.; Ferraty, F.; Sarda, P., Functional linear model, Stat Probab Lett, 45, 11-22, (1999) · Zbl 0962.62081
[15] Cattell, R., The scree test for the number of factors, Multivar Behav Res, 1, 245-276, (1966)
[16] Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, J Pattern Recognit Soc, 28, 781-793, (1995)
[17] Chiou, JM; Li, PL, Functional clustering and identifying substructures of longitudinal data, J R Stat Soc Ser B Stat Methodol, 69, 679-699, (2007)
[18] Coifman, R.; Wickerhauser, M., Entropy-based algorithms for best basis selection, IEEE Trans Inf Theory, 38, 713-718, (1992) · Zbl 0849.94005
[19] Cox T, Cox M (2001) Multidimensional scaling. Chapman and Hall, New York · Zbl 1004.91067
[20] Cuesta-Albertos, J.; Fraiman, R., Impartial trimmed k-means for functional data, Comput Stat Data Anal, 51, 4864-4877, (2007) · Zbl 1162.62377
[21] Dauxois, J.; Pousse, A.; Romain, Y., Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference, J Multivar Anal, 12, 136-154, (1982) · Zbl 0539.62064
[22] Delaigle, A.; Hall, P., Defining probability density for a distribution of random functions, Ann Stat, 38, 1171-1193, (2010) · Zbl 1183.62061
[23] Deville J (1974) Méthodes statistiques et numériques de l’analyse harmonique. Annales de l’INSEE 15:3-101
[24] Escabias, M.; Aguilera, A.; Valderrama, M., Modeling environmental data by functional principal component logistic regression, Environmetrics, 16, 95-107, (2005)
[25] Ferraty F, Vieu P (2006) Nonparametric functional data analysis, Springer Series in Statistics. Springer, New York · Zbl 1119.62046
[26] Gaffney S (2004) Probabilistic curve-aligned clustering and prediction with mixture models. PhD thesis, Department of Computer Science, University of California, Irvine, USA
[27] Giacofci M, Lambert-Lacroix S, Marot G, Picard F (2012) Wavelet-based clustering for mixed-effects functional models in high dimension. Biometrics (in press) · Zbl 1274.62774
[28] Guyon I, Von Luxburg U, Williamson R (2009) Clustering: science or art. In: NIPS 2009 workshop on clustering theory
[29] Hartigan, J.; Wong, M., Algorithm AS 136: a k-means clustering algorithm, Appl Stat, 28, 100-108, (1979) · Zbl 0447.62062
[30] Heard, N.; Holmes, C.; Stephens, D., A quantitative study of gene regulation involved in the immune response of anopheline mosquitoes: an application of Bayesian hierarchical clustering of curves, J Am Stat Assoc, 101, 18-29, (2006) · Zbl 1118.62368
[31] Hébrail, G.; Hugueney, B.; Lechevallier, Y.; Rossi, F., Exploratory analysis of functional data via clustering and optimal segmentation, Neurocomputing, 73, 1125-1141, (2010)
[32] Ieva F, Paganoni A, Pigoli D, Vitelli V (2012) Multivariate functional clustering for the analysis of ECG curves morphology. J R Stat Soc Ser C Appl Stat (in press)
[33] Jacques J, Preda C (2013a) Funclust: a curves clustering method using functional random variable density approximation. Neurocomputing. doi:10.1016/j.neucom.2012.11.042
[34] Jacques J, Preda C (2013b) Model-based clustering for multivariate functional data. Comput Stat Data Anal. doi:10.1016/j.csda.2012.12.004 · Zbl 1471.62096
[35] James, G.; Sugar, C., Clustering for sparsely sampled functional data, J Am Stat Assoc, 98, 397-408, (2003) · Zbl 1041.62052
[36] Karhunen, K., Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann Acad Sci Fennicae Ser A I Math-Phys, 1947, 79, (1947) · Zbl 0030.16502
[37] Kayano, M.; Dozono, K.; Konishi, S., Functional cluster analysis via orthonormalized Gaussian basis expansions and its application, J Classif, 27, 211-230, (2010) · Zbl 1337.62134
[38] Kohonen T (1995) Self-organizing maps. Springer, New York
[39] Lévéder C, Abraham P, Cornillon E, Matzner-Lober E, Molinari N (2004) Discrimination de courbes de prétrissage. In: Chimiométrie 2004, Paris, pp 37-43
[40] Liu, X.; Yang, M., Simultaneous curve registration and clustering for functional data, Comput Stat Data Anal, 53, 1361-1376, (2009) · Zbl 1452.62993
[41] Loève, M., Fonctions aléatoires de second ordre, C R Acad Sci Paris, 220, 469, (1945) · Zbl 0063.03612
[42] MATLAB (2010) version 7.10.0 (R2010a) The MathWorks Inc., Natick, Massachusetts · Zbl 1200.93001
[43] McLachlan G, Peel D (2000) Finite mixture models. Wiley Series in Probability and Statistics. Applied Probability and Statistics, Wiley-Interscience, New York. doi:10.1002/0471721182 · Zbl 0963.62061
[44] Olszewski R (2001) Generalized feature extraction for structural pattern recognition in time-series data. PhD thesis, Carnegie Mellon University, Pittsburgh, PA
[45] Peng, J.; Müller, HG, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, Ann Appl Stat, 2, 1056-1077, (2008) · Zbl 1149.62053
[46] Preda, C.; Saporta, G.; Lévéder, C., PLS classification of functional data, Comput Stat, 22, 223-235, (2007) · Zbl 1196.62086
[47] R Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/. ISBN: 3-900051-07-0
[48] Ramsay JO, Silverman BW (2002) Applied functional data analysis. Springer Series in Statistics. Springer, New York (methods and case studies)
[49] Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer Series in Statistics. Springer, New York
[50] Ray, S.; Mallick, B., Functional clustering by Bayesian wavelet methods, J R Stat Soc Ser B Stat Methodol, 68, 305-332, (2006) · Zbl 1100.62058
[51] Romano E, Giraldo R, Mateu J (2011) Recent advances in functional data analysis and related topics, Springer, chap clustering spatially correlated functional data
[52] Rossi F, Conan-Guez B, El Golli A (2004) Clustering functional data with the som algorithm. In: Proceedings of ESANN 2004. Bruges, Belgium, pp 305-312
[53] Saito, N.; Coifman, R., Local discriminant bases and their applications, J Math Imaging Vis, 5, 337-358, (1995) · Zbl 0863.94004
[54] Samé, A.; Chamroukhi, F.; Govaert, G.; Aknin, P., Model-based clustering and segmentation of time series with changes in regime, Adv Data Anal Classif, 5, 301-322, (2011) · Zbl 1274.62427
[55] Sangalli, L.; Secchi, P.; Vantini, S.; Vitelli, V., Functional clustering and alignment methods with applications, Commun App Ind Math, 1, 205-224, (2010) · Zbl 1329.62289
[56] Sangalli, L.; Secchi, P.; Vantini, S.; Vitelli, V., \(k\)-mean alignment for curve clustering, Comput Stat Data Anal, 54, 1219-1233, (2010) · Zbl 1464.62153
[57] Saporta G (1981) Méthodes exploratoires d’analyse de données temporelles. Cahiers du BURO 37-38
[58] Schwarz, G., Estimating the dimension of a model, Ann Stat, 6, 461-464, (1978) · Zbl 0379.62005
[59] Secchi P, Vantini S, Vitelli V (2011) Recent advances in functional data analysis and related topics, Springer, chap Spatial Clustering of Functional Data
[60] Serban, N.; Jiang, H., Multilevel functional clustering analysis, Biometrics, 68, 805-814, (2012) · Zbl 1272.62085
[61] Slaets, L.; Claeskens, G.; Hubert, M., Phase and amplitude-based clustering for functional data, Comput Stat Data Anal, 56, 2360-2374, (2012) · Zbl 1252.62066
[62] Sugar C, James G (2003) Finding the number of clusters in a dataset: an information-theoretic approach. J Am Stat Assoc 98(463):750-763 · Zbl 1046.62064
[63] Tarpey, T.; Kinateder, K., Clustering functional data, J Classif, 20, 93-114, (2003) · Zbl 1112.62327
[64] Tipping, ME; Bishop, C., Mixtures of principal component analyzers, Neural Comput, 11, 443-482, (1999)
[65] Tokushige, S.; Yadohisa, H.; Inada, K., Crisp and fuzzy k-means clustering algorithms for multivariate functional data, Comput Stat, 22, 1-16, (2007) · Zbl 1196.62089
[66] Tuddenham, R.; Snyder, M., Physical growth of California boys and girls from birth to eighteen years, Univ Calif Publ Child Dev, 1, 188-364, (1954)
[67] Wahba G (1990) Spline models for observational data. SIAM, Philadelphia · Zbl 0813.62001
[68] Ward, JH, Hierarchical grouping to optimize an objective function, J Am Stat Assoc, 58, 236-244, (1963)
[69] Yamamoto, M., Clustering of functional data in a low-dimensional subspace, Adv Data Anal Classif, 6, 219-247, (2012) · Zbl 1254.62077