Schmutz, Amandine; Jacques, Julien; Bouveyron, Charles; Chèze, Laurence; Martin, Pauline
Clustering multivariate functional data in group-specific functional subspaces. (English) Zbl 1505.62360
Comput. Stat. 35, No. 3, 1101-1131 (2020).

Summary: With the emergence of digital sensors in many aspects of everyday life, there is an increasing need to analyze multivariate functional data. This work focuses on clustering such data in order to ease their modeling and understanding. To this end, a novel clustering technique for multivariate functional data is presented. The method is based on a functional latent mixture model that fits the data into group-specific functional subspaces through a multivariate functional principal component analysis. A family of parsimonious models is obtained by constraining model parameters within and between groups. An Expectation-Maximization (EM) algorithm is proposed for model inference, and the hyper-parameters are chosen through model selection. Numerical experiments on simulated datasets highlight the good performance of the proposed methodology compared with existing methods. The algorithm is then applied to the analysis of pollution in French cities over one year.

Cited in 12 Documents

MSC:
62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis
62R10 Functional data analysis

Keywords: multivariate functional curves; multivariate functional principal component analysis; model-based clustering; EM algorithm
Software: MBCbook; R; funHDDC; fda (R)

\textit{A. Schmutz} et al., Comput. Stat. 35, No.
3, 1101--1131 (2020; Zbl 1505.62360)
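The summary describes the method only at a high level. The Python/NumPy sketch below illustrates the general idea of clustering curves in group-specific subspaces with an EM algorithm; it is not the authors' implementation (the Software line lists the R package funHDDC). It rests on several assumptions that are not taken from the paper. Each multivariate curve is assumed to be already expanded on an orthonormal basis, with the coefficients of all its components concatenated into one row of `C`, so that the multivariate functional PCA reduces to an ordinary PCA of those coefficients. A single intrinsic dimension `d` is shared by all groups, and each group covariance is taken as Q_k diag(a_k) Q_k^T + b_k (I - Q_k Q_k^T), which is only one member of a parsimonious family. Initialization uses k-means, and BIC serves as an example model-selection criterion. The helper names `kmeans_init` and `subspace_em` and the simulated data are hypothetical.

```python
import numpy as np


def kmeans_init(C, K, rng, n_iter=10):
    """Hypothetical helper: farthest-point seeding plus a few Lloyd iterations (hard labels)."""
    centers = [C[rng.integers(len(C))]]
    for _ in range(1, K):
        dist = np.min([((C - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(C[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((C[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([C[labels == k].mean(axis=0) for k in range(K)])
    return labels


def subspace_em(C, K, d, n_iter=200, tol=1e-8, seed=0):
    """Hypothetical sketch: EM for a Gaussian mixture on basis coefficients C (n x m)
    whose k-th covariance is Q_k diag(a_k) Q_k^T + b_k (I - Q_k Q_k^T).
    ASSUMPTION: orthonormal basis, so PCA of C stands in for the functional PCA."""
    n, m = C.shape
    rng = np.random.default_rng(seed)
    labels = kmeans_init(C, K, rng)
    T = 0.9 * np.eye(K)[labels] + 0.1 / K  # softened initial responsibilities
    old = -np.inf
    for _ in range(n_iter):
        # M-step: proportions, means and a group-specific PCA of the weighted covariance
        nk = T.sum(axis=0)
        logdens = np.empty((n, K))
        for k in range(K):
            mu = T[:, k] @ C / nk[k]
            X = C - mu
            S = (T[:, k, None] * X).T @ X / nk[k]
            evals, evecs = np.linalg.eigh(S)  # eigenvalues in ascending order
            evals, evecs = evals[::-1], evecs[:, ::-1]
            a = np.maximum(evals[:d], 1e-10)  # variances inside the group subspace
            b = max(evals[d:].mean(), 1e-10)  # single noise variance outside it
            proj = X @ evecs[:, :d]
            resid = (X ** 2).sum(axis=1) - (proj ** 2).sum(axis=1)
            maha = (proj ** 2 / a).sum(axis=1) + resid / b
            logdet = np.log(a).sum() + (m - d) * np.log(b)
            logdens[:, k] = np.log(nk[k] / n) - 0.5 * (maha + logdet + m * np.log(2 * np.pi))
        # E-step: posterior membership probabilities via log-sum-exp
        top = logdens.max(axis=1, keepdims=True)
        row_ll = top[:, 0] + np.log(np.exp(logdens - top).sum(axis=1))
        T = np.exp(logdens - row_ll[:, None])
        loglik = row_ll.sum()
        if abs(loglik - old) < tol * abs(loglik):
            break
        old = loglik
    # BIC = loglik - nu/2 * log(n); nu counts proportions, means, orientations Q_k, a_k, b_k
    nu = (K - 1) + K * m + K * (d * m - d * (d + 1) // 2) + K * d + K
    return T.argmax(axis=1), loglik - 0.5 * nu * np.log(n)


# Hypothetical simulated coefficients: two groups, each near its own 2-D subspace
rng = np.random.default_rng(1)
m, n_per = 10, 100


def simulate_group(mean):
    basis = np.linalg.qr(rng.normal(size=(m, 2)))[0]  # group-specific orthonormal basis
    scores = rng.normal(scale=[2.0, 1.0], size=(n_per, 2))
    return mean + scores @ basis.T + 0.2 * rng.normal(size=(n_per, m))


C = np.vstack([simulate_group(np.zeros(m)), simulate_group(np.full(m, 5.0))])
truth = np.repeat([0, 1], n_per)
labels, bic = subspace_em(C, K=2, d=2)
agreement = max((labels == truth).mean(), (labels != truth).mean())  # robust to label switching
```

Calling `subspace_em` for several values of `d` (or `K`) and keeping the largest BIC is one way to mimic the hyper-parameter selection step mentioned in the summary. The criteria the paper actually uses, and its handling of non-orthonormal bases, should be checked in the full text.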