Clustering of functional data in a low-dimensional subspace. (English) Zbl 1254.62077

Summary: Clusters of functional objects in a lower-dimensional subspace of the data are often sought with a sequential method called tandem analysis, which first reduces the dimension and then clusters the reduced data. This is problematic because the subspace is chosen without regard to the cluster structure. A new procedure is developed that finds optimal clusters of functional objects and an optimal subspace for clustering simultaneously. The method is based on the \(k\)-means criterion for functional data and seeks the subspace that is maximally informative about the clustering structure in the data. An efficient alternating least-squares algorithm is described, and the proposed method is extended to a regularized version. Analyses of artificial and real data examples demonstrate that the proposed method gives correct and interpretable results.
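For orientation, the sequential baseline that the summary calls tandem analysis can be sketched as follows. This is only a minimal illustration and not the procedure proposed in the paper. It assumes that each curve is sampled on a common grid, uses ordinary PCA in place of functional PCA, and uses a deterministic farthest-point initialization for \(k\)-means. The names `make_demo_curves` and `tandem_cluster` are hypothetical.

```python
import numpy as np


def make_demo_curves(n_per_group=20, n_grid=50, noise=0.1, seed=0):
    """Two groups of noisy curves (+sin and -sin) on a common grid (toy data)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    base = np.sin(2.0 * np.pi * t)
    group_a = base + noise * rng.standard_normal((n_per_group, n_grid))
    group_b = -base + noise * rng.standard_normal((n_per_group, n_grid))
    return np.vstack([group_a, group_b])


def tandem_cluster(curves, n_components, n_clusters, n_iter=100):
    """Tandem analysis sketch: PCA on discretized curves, then k-means on scores."""
    centered = curves - curves.mean(axis=0)

    # Step 1: dimension reduction, chosen without reference to any clustering.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T

    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [scores[0]]
    for _ in range(1, n_clusters):
        d = np.min([((scores - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(scores[d.argmax()])
    centers = np.array(centers)

    # Step 2: Lloyd's k-means iterations on the low-dimensional scores.
    for _ in range(n_iter):
        dists = ((scores[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            scores[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, scores


curves = make_demo_curves()
labels, scores = tandem_cluster(curves, n_components=2, n_clusters=2)
```

In this toy example the leading component happens to carry the group difference, so the two steps agree. When the cluster structure lies in low-variance directions, step 1 can discard it before step 2 runs. A simultaneous approach of the kind the summary describes would instead alternate updates of the subspace and of the partition under a single criterion.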


62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C60 Computational problems in statistics (MSC2010)


funHDDC; R; fda (R)

