Nonparametric clustering of functional data using pseudo-densities. (English) Zbl 1357.62162

Summary: We study nonparametric clustering of smooth random curves on the basis of the \(L^{2}\) gradient flow associated with a pseudo-density functional, and we discuss the conditions under which the clustering is well-defined at both the population and the sample level. We provide an algorithm to identify significant local modes of the estimated pseudo-density, which are associated with informative sample clusters, and we prove its consistency and other statistical properties. Our theory is developed under weak assumptions, which essentially reduce to the integrability of the random curves. If the underlying probability distribution is supported on a finite-dimensional subspace, we show that the proposed pseudo-density functional and the expectation of a kernel density estimator induce the same gradient flow, and hence the same population clustering. Although our theory is developed for smooth curves that belong to a potentially infinite-dimensional functional space, we provide consistent procedures that can be used with real functional data (discretized and noisy curves). We illustrate these procedures by means of applications to both simulated and real datasets.
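
The idea of clustering curves by following a gradient flow toward local modes of a pseudo-density can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes curves observed on a common equispaced grid, a Gaussian kernel with a hand-picked bandwidth h, a Riemann-sum approximation of the \(L^{2}\) norm, and a mean-shift fixed-point iteration as a stand-in for the gradient flow. The function names (l2_sq_dist, mean_shift_curves, label_modes) and all parameter values are hypothetical, and the significance test for modes described in the summary is not reproduced.

```python
import numpy as np


def l2_sq_dist(x, curves, dt):
    """Riemann-sum approximation of the squared L2 distance from x to each curve."""
    return np.sum((curves - x) ** 2, axis=1) * dt


def mean_shift_curves(curves, dt, h, n_iter=200, tol=1e-8):
    """Move every curve uphill on a Gaussian-kernel pseudo-density (assumed form).

    Each update replaces the current curve by a kernel-weighted average of the
    sample curves, which is an ascent step for the kernel pseudo-density.
    """
    modes = np.empty_like(curves)
    for k in range(len(curves)):
        x = curves[k].copy()
        for _ in range(n_iter):
            w = np.exp(-l2_sq_dist(x, curves, dt) / (2.0 * h ** 2))
            x_new = w @ curves / w.sum()
            step = np.sqrt(np.sum((x_new - x) ** 2) * dt)
            x = x_new
            if step < tol:
                break
        modes[k] = x
    return modes


def label_modes(modes, dt, merge_tol):
    """Give the same label to curves whose limit points are within merge_tol in L2."""
    labels = np.full(len(modes), -1, dtype=int)
    reps = []  # one representative limit curve per cluster
    for k, m in enumerate(modes):
        for j, r in enumerate(reps):
            if np.sqrt(np.sum((m - r) ** 2) * dt) < merge_tol:
                labels[k] = j
                break
        else:
            reps.append(m)
            labels[k] = len(reps) - 1
    return labels


# Simulated example (assumed setup): two groups of smooth curves a * sin(2*pi*t),
# with amplitudes scattered around +1 and -1.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]
amps = np.concatenate([rng.normal(1.0, 0.1, 20), rng.normal(-1.0, 0.1, 20)])
curves = amps[:, None] * np.sin(2.0 * np.pi * t)[None, :]

modes = mean_shift_curves(curves, dt, h=0.3)
labels = label_modes(modes, dt, merge_tol=0.1)
print(labels)
```

In this toy setting the two amplitude groups are well separated in \(L^{2}\) distance relative to the bandwidth, so each curve is drawn to the mode of its own group. With real data the bandwidth and the merge tolerance would need to be chosen with care, and discretized noisy curves would typically be smoothed first.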

MSC:
62G07 Density estimation
62G99 Nonparametric inference
62-07 Data analysis (statistics) (MSC2010)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Full Text: DOI arXiv Euclid