A divisive clustering method for functional data with special consideration of outliers. (English) Zbl 1416.62343

Summary: This paper presents DivClusFD, a new divisive hierarchical method for the unsupervised classification of functional data. Data of this type have the peculiarity that the differences among clusters may be caused by changes in level as well as in shape. Different clusters can be separated in different subregions, and there may be no subregion in which all clusters are separated. In each division step, the DivClusFD method explores the functions and their derivatives at several fixed points, seeking the subregion in which the highest number of clusters can be separated. The number of clusters is estimated via the gap statistic. The functions are assigned to the new clusters by combining the \(k\)-means algorithm with the use of functional boxplots to identify functions that have been incorrectly classified because of their atypical local behavior. The DivClusFD method provides the number of clusters, the classification of the observed functions into the clusters, and guidelines that may be used for interpreting the clusters. A simulation study using synthetic data and tests of the performance of the DivClusFD method on real data sets indicate that this method is able to classify functions accurately.
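
Illustration (not the authors' implementation): the sketch below shows how a single division step of this kind might estimate the number of clusters. It evaluates each curve and its numerical derivative at one fixed point, then applies the gap statistic of Tibshirani, Walther and Hastie with a plain \(k\)-means. The synthetic curves, the evaluation point `t0`, the helper names `kmeans` and `gap_statistic`, and all parameter values are assumptions made for this example. The search over several points and the functional boxplot reassignment described in the summary are omitted.

```python
import numpy as np

def kmeans(X, k, n_init=10, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Keep the old center if a cluster happens to become empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        wss = d[np.arange(len(X)), labels].sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss

def gap_statistic(X, k_max=4, n_ref=20, seed=0):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}, using uniform reference data."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sds = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans(X, k, seed=seed)[1])
        # Reference sets are drawn uniformly over the bounding box of the data.
        ref = np.array([
            np.log(kmeans(rng.uniform(lo, hi, size=X.shape), k, seed=seed)[1])
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        sds.append(ref.std() * np.sqrt(1 + 1 / n_ref))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sds[k]:
            return k
    return k_max

# Synthetic curves (assumed for illustration): two groups that differ in level.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
truth = np.repeat([0, 1], 30)
a = 3.0 * truth + rng.normal(0.0, 0.1, size=60)   # level of each curve
b = 1.0 + rng.normal(0.0, 0.02, size=60)          # amplitude of each curve
curves = a[:, None] + b[:, None] * np.sin(2 * np.pi * t)[None, :]
derivs = np.gradient(curves, t, axis=1)

# Features at one fixed point: function value and first derivative.
t0 = 50  # grid index of t = 0.5 (hypothetical choice)
X = np.column_stack([curves[:, t0], derivs[:, t0]])

k_hat = gap_statistic(X)
labels, _ = kmeans(X, k_hat)
print("estimated number of clusters:", k_hat)
```

In this toy setting the two groups are separated in level at the chosen point, so the pair (value, derivative) is enough to distinguish them. A point where the groups overlap would give a smaller estimate, which is why the method described above compares several points before dividing.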

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

fda.usc; fda (R); funHDDC

References:

[1] Abraham, C.; Cornillon, PA; Matzner-Løber, E.; Molinari, N., Unsupervised curves clustering using B-splines, Scand J Stat, 30, 581-595, (2003) · Zbl 1039.91067
[2] Alonso, AM; Casado, D.; Romo, J., Supervised classification for functional data: a weighted distance approach, Comput Stat Data Anal, 56, 2334-2346, (2012) · Zbl 1252.62061
[3] Berrendero, JR; Justel, A.; Svarc, M., Principal components for multivariate functional data, Comput Stat Data Anal, 55, 2619-2634, (2011) · Zbl 1464.62025
[4] Bouveyron, C.; Jacques, J., Model-based clustering of time series in group-specific functional subspaces, Adv Data Anal Classif, 5, 281-300, (2011) · Zbl 1274.62416
[5] Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. http://www.cs.ucr.edu/~eamonn/time_series_data/
[6] Chiou, JM; Li, PL, Functional clustering and identifying substructures of longitudinal data, J R Stat Soc B, 69, 679-699, (2007)
[7] Febrero-Bande, M.; Oviedo de la Fuente, M., Statistical computing in functional data analysis: the R package fda.usc, J Stat Softw, 51, 1-28, (2012)
[8] Fraiman, R.; Justel, A.; Svarc, M., Selection of variables for cluster analysis and classification rules, J Am Stat Assoc, 103, 1294-1303, (2008) · Zbl 1205.62077
[9] Ieva, F.; Paganoni, AM; Pigoli, D.; Vitelli, V., Multivariate functional clustering for the morphological analysis of electrocardiograph curves, J R Stat Soc Ser C (Appl Stat), 62, 401-418, (2013)
[10] Jacques, J.; Preda, C., Funclust: a curves clustering method using functional random variable density approximation, Neurocomputing, 112, 164-171, (2013)
[11] Jacques, J.; Preda, C., Model-based clustering for multivariate functional data, Comput Stat Data Anal, 71, 92-106, (2014) · Zbl 1471.62096
[12] Jacques, J.; Preda, C., Functional data clustering: a survey, Adv Data Anal Classif, 8, 231-255, (2014)
[13] James, G.; Sugar, C., Clustering for sparsely sampled functional data, J Am Stat Assoc, 98, 397-408, (2003) · Zbl 1041.62052
[14] Kate, RJ, Using dynamic time warping distances as features for improved time series classification, Data Min Knowl Discov, 30, 283-312, (2015)
[15] López-Pintado, S.; Romo, J., On the concept of depth for functional data, J Am Stat Assoc, 104, 718-734, (2009) · Zbl 1388.62139
[16] López-Pintado, S.; Sun, Y.; Lin, JK; Genton, MG, Simplicial band depth for multivariate functional data, Adv Data Anal Classif, 8, 321-338, (2014)
[17] Mosler, K., Depth statistics, 17-34, Berlin, Heidelberg, (2013)
[18] Ray, S.; Mallick, B., Functional clustering by Bayesian wavelet methods, J R Stat Soc Ser B Stat Methodol, 68, 305-332, (2006) · Zbl 1100.62058
[19] Ramsay, J.; Hooker, G.; Graves, S., Functional data analysis with R and Matlab, Springer, Berlin, (2009) · Zbl 1179.62006
[20] Billheimer, D., Functional data analysis, 2nd edition edited by J. O. Ramsay and B. W. Silverman, Biometrics, 63, 300-301, (2007)
[21] Sangalli, LM; Secchi, P.; Vantini, S.; Vitelli, V., \(k\)-means alignment for curve clustering, Comput Stat Data Anal, 54, 1219-1233, (2010) · Zbl 1464.62153
[22] Sangalli, LM; Secchi, P.; Vantini, S.; Vitelli, V., Functional clustering and alignment methods with applications, Commun Appl Ind Math, 1, 205-224, (2010) · Zbl 1329.62289
[23] Serban, N.; Wasserman, L., CATS: cluster analysis by transformation and smoothing, J Am Stat Assoc, 100, 990-999, (2005) · Zbl 1117.62422
[24] Sun, Y.; Genton, MG, Functional boxplots, J Comput Graph Stat, 20, 316-334, (2011)
[25] Tarpey, T.; Kinateder, KKJ, Clustering functional data, J Classif, 20, 93-114, (2003)
[26] Tibshirani, R.; Walther, G.; Hastie, T., Estimating the number of data clusters via the gap statistic, J R Stat Soc B, 63, 411-423, (2001) · Zbl 0979.62046
[27] Tokushige, S.; Yadohisa, H.; Inada, K., Crisp and fuzzy \(k\)-means clustering algorithms for multivariate functional data, Comput Stat, 21, 1-16, (2007) · Zbl 1196.62089
[28] Tuddenham RD, Snyder MM (1954) Physical growth of California boys and girls from birth to eighteen years. Tech. Rep. 1, University of California Publications in Child Development
[29] Tukey, J., Exploratory data analysis, Addison-Wesley, Boston, (1977) · Zbl 0409.62003
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.