Benchmarking different clustering algorithms on functional data.

*(English)* Zbl 1414.62289

Summary: Theoretical knowledge of clustering functions is still scarce, and only a few models are available in the form of applicable code. In the literature, most methods are based on projecting the functions onto a basis and building fixed or random effects models of the basis coefficients. They involve various parameters, among them the number of basis functions, the projection dimension, and the number of iterations. They usually work well on the data presented in the articles, but in most cases their performance has not been tested objectively on other data sets or against each other. The purpose of this paper is to give an overview of several existing methods for clustering functional data. An outline of their theoretical concepts is given, and the meaning of their hyperparameters is explained. A simulation study was set up to analyze the parameters’ efficiency and sensitivity on different types of data sets that were registered on regular and on irregular grids. For each method, a linear model of the clustering results was evaluated with different parameter levels as predictors. The methods’ performances were then compared with each other using a visualization tool to identify which method works best on a specific kind of data.
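As a rough illustration of the general idea described in the summary (project each curve onto a basis, then cluster the basis coefficients), here is a minimal sketch. It is not one of the methods compared in the paper. The Fourier basis, the plain k-means step standing in for a fixed or random effects model, the adjusted Rand index as agreement measure, the simulated sine and cosine curves, and all function names are assumptions made for this example only.

```python
import math
import random
from collections import Counter


def fourier_coefficients(y, n_harmonics):
    """Project a curve sampled on a regular grid over [0, 1) onto a Fourier basis."""
    n = len(y)
    coefs = [sum(y) / n]  # constant term
    for k in range(1, n_harmonics + 1):
        angles = [2 * math.pi * k * i / n for i in range(n)]
        coefs.append(2 / n * sum(v * math.cos(a) for v, a in zip(y, angles)))
        coefs.append(2 / n * sum(v * math.sin(a) for v, a in zip(y, angles)))
    return coefs


def dist2(p, q):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster runs empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two partitions of the same objects."""
    def pairs(m):
        return m * (m - 1) / 2

    sum_ij = sum(pairs(v) for v in Counter(zip(a, b)).values())
    sum_a = sum(pairs(v) for v in Counter(a).values())
    sum_b = sum(pairs(v) for v in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Simulated data (assumption): two groups of noisy curves on a regular grid.
rng = random.Random(0)
grid = [i / 50 for i in range(50)]
curves, truth = [], []
for group, shape in enumerate((math.sin, math.cos)):
    for _ in range(10):
        curves.append([shape(2 * math.pi * t) + rng.gauss(0, 0.1) for t in grid])
        truth.append(group)

# The number of harmonics plays the role of the "number of basis functions".
coefs = [fourier_coefficients(y, n_harmonics=2) for y in curves]
labels = kmeans(coefs, k=2)
ari = adjusted_rand_index(truth, labels)
print("adjusted Rand index:", ari)
```

Rerunning the last four lines with other values of `n_harmonics` or a larger noise level gives a feel for the kind of parameter sensitivity the simulation study investigates, although this toy setup is far simpler than the designs used in the paper.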

##### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects)

\textit{C. Yassouridis} and \textit{F. Leisch}, Adv. Data Anal. Classif., ADAC 11, No. 3, 467--492 (2017; Zbl 1414.62289)
