Data-driven penalty calibration: a case study for Gaussian mixture model selection. (English) Zbl 1395.62163
Summary: In the companion paper [the authors, ibid. 15, 41–68 (2011; Zbl 1395.62162)], a penalized likelihood criterion is proposed to select a Gaussian mixture model from a specific model collection. This criterion depends on unknown constants that have to be calibrated in practice. A “slope heuristics” method is described and tested to address this calibration problem. In a model-based clustering context, the specific form of the Gaussian mixtures considered makes it possible to detect noisy variables, which improves both the clustering of the data and its interpretation. The behavior of the data-driven criterion is illustrated on simulated datasets, a curve clustering example and a genomics application.
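
The summary does not spell out the calibration procedure, so the following is only a generic sketch of a slope-heuristics step, not the authors' implementation. It assumes that the penalty shape is proportional to the model dimension, that the negative log-likelihood decreases roughly linearly in the dimension for the most complex models, and that the final penalty is twice the estimated minimal penalty (the usual slope-heuristics convention). The function name `slope_heuristic_select` and the parameter `frac_largest` are hypothetical.

```python
import numpy as np

def slope_heuristic_select(dims, neg_loglik, frac_largest=0.5):
    """Pick a model index by a generic slope heuristic (illustrative sketch).

    dims         : model dimensions D_m (penalty shape ASSUMED proportional to D_m)
    neg_loglik   : minimized negative log-likelihood of each fitted model
    frac_largest : fraction of the most complex models used to fit the slope
                   (hypothetical tuning parameter)
    """
    dims = np.asarray(dims, dtype=float)
    neg_loglik = np.asarray(neg_loglik, dtype=float)

    # Keep the most complex models, where the linear regime is assumed to hold.
    order = np.argsort(dims)
    n_fit = max(2, int(np.ceil(frac_largest * len(dims))))
    idx = order[-n_fit:]

    # Least-squares line through (D_m, -loglik_m); polyfit returns [slope, intercept].
    slope, _ = np.polyfit(dims[idx], neg_loglik[idx], deg=1)

    # The negated slope estimates the minimal penalty constant.
    kappa_min = -slope

    # Penalized criterion with penalty = 2 * minimal penalty.
    crit = neg_loglik + 2.0 * kappa_min * dims
    return int(np.argmin(crit)), float(kappa_min)
```

In practice, the stability of the selected model with respect to `frac_largest` should be checked, since the fitted slope depends on which models are treated as "complex enough".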

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation