Tibshirani, Robert; Walther, Guenther; Hastie, Trevor
Estimating the number of clusters in a data set via the gap statistic. (English) Zbl 0979.62046
J. R. Stat. Soc., Ser. B, Stat. Methodol. 63, No. 2, 411-423 (2001).

Summary: We propose a method (the ‘gap statistic’) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g., \(K\)-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.

Cited in 2 Reviews
Cited in 196 Documents
MSC: 62H30 Classification and discrimination; cluster analysis (statistical aspects)
Keywords: hierarchy; K-means; uniform distribution; groups; clustering
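
Illustration: the comparison described in the summary (within-cluster dispersion on the data versus its expectation under a reference null distribution) might be sketched roughly as follows. This is a minimal sketch and not the authors' code. It rests on several assumptions that this record does not state: the reference data are drawn uniformly over the bounding box of the observations, the clustering algorithm is a plain Lloyd-style K-means, the dispersion W_k is the summed squared distance to cluster means, and the number of clusters is picked by a one-standard-error rule. All function names and parameters (`kmeans`, `within_dispersion`, `log_wk`, `gap_statistic`, `n_refs`, `n_starts`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # requires k <= len(points)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return labels


def within_dispersion(points, labels, k):
    """W_k: total squared distance of every point to its cluster mean."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            c = centroid(members)
            total += sum(sq_dist(p, c) for p in members)
    return total


def log_wk(points, k, rng, n_starts=5):
    """log(W_k), taking the best of a few random restarts.

    Assumes W_k > 0, i.e. distinct points and k smaller than their number.
    """
    return math.log(min(within_dispersion(points, kmeans(points, k, rng), k)
                        for _ in range(n_starts)))


def gap_statistic(points, k_max=6, n_refs=10, seed=0):
    """Return (estimated number of clusters, list of Gap(1..k_max))."""
    rng = random.Random(seed)
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    # Assumed null model: uniform draws over the data's bounding box.
    refs = [[tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
             for _ in points] for _ in range(n_refs)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = [log_wk(ref, k, rng) for ref in refs]
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_wk(points, k, rng))  # Gap(k)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))     # s_k
    # Assumed rule: smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Three synthetic, well-separated 2-D groups (illustrative data only).
    data = [(demo_rng.gauss(cx, 0.3), demo_rng.gauss(cy, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
    k_hat, gap_values = gap_statistic(data, k_max=5, n_refs=5)
    print("estimated number of clusters:", k_hat)
    print("gap values:", [round(g, 3) for g in gap_values])
```

The sketch uses only the Python standard library so that it stays self-contained. Any other clustering routine could be substituted for `kmeans`, since the summary notes that the technique works with the output of any clustering algorithm.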