OCLUS: an analytic method for generating clusters with known overlap. (English) Zbl 1336.62191

Summary: The primary means of validating cluster analysis techniques is Monte Carlo simulation, which relies on generating data with known cluster structure [G. W. Milligan, in: Clustering and classification. Singapore: World Scientific. 341–375 (1996; Zbl 0895.62069)]. This paper defines two kinds of data generation mechanisms with cluster overlap, marginal and joint, and frames current cluster generation methods within these definitions. An algorithm that generates overlapping clusters from densities shared among several different multivariate distributions is proposed and shown to lead to an easily understandable notion of cluster overlap. Besides outlining the advantages of generating clusters within this framework, the paper discusses how the proposed data generation technique can be used to augment research into current classification techniques such as finite mixture modeling, classification algorithm robustness, and latent profile analysis.
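The summary does not spell out the OCLUS construction itself. As a generic illustration of what "known overlap" can mean in the simplest marginal case, the sketch below uses a different, textbook quantity: the overlapping coefficient of two equal-variance univariate normal densities, OVL = 2Φ(−d/(2σ)), where d is the distance between the means. Inverting this formula places two cluster centres so that their densities share a prescribed area. This is not the OCLUS algorithm, and the function names (`separation_for_overlap`, `overlap_coefficient`, `generate_two_clusters`) are hypothetical.

```python
import random
from statistics import NormalDist

# Illustrative sketch only: NOT the OCLUS algorithm. Two univariate normal
# clusters with equal standard deviation sigma whose densities share a
# prescribed area (the overlapping coefficient, OVL).


def separation_for_overlap(target_ovl: float, sigma: float = 1.0) -> float:
    """Distance between the two means giving overlapping coefficient target_ovl."""
    if not 0.0 < target_ovl < 1.0:
        raise ValueError("target_ovl must lie strictly between 0 and 1")
    # The densities cross at the midpoint, so OVL = 2 * Phi(-d / (2 * sigma)).
    # Solving for d gives d = -2 * sigma * Phi^{-1}(OVL / 2).
    return -2.0 * sigma * NormalDist().inv_cdf(target_ovl / 2.0)


def overlap_coefficient(mu1: float, mu2: float, sigma: float = 1.0) -> float:
    """Analytic shared area under two equal-variance normal densities."""
    return 2.0 * NormalDist().cdf(-abs(mu1 - mu2) / (2.0 * sigma))


def generate_two_clusters(target_ovl: float, n_per_cluster: int,
                          sigma: float = 1.0, seed: int = 0):
    """Draw labelled points from two normal clusters with the given overlap."""
    rng = random.Random(seed)
    d = separation_for_overlap(target_ovl, sigma)
    mus = (0.0, d)
    data = []
    for label, mu in enumerate(mus):
        # Each point is stored as (value, true cluster label).
        data.extend((rng.gauss(mu, sigma), label) for _ in range(n_per_cluster))
    return mus, data


if __name__ == "__main__":
    mus, data = generate_two_clusters(target_ovl=0.2, n_per_cluster=5000)
    print(mus, overlap_coefficient(*mus), len(data))
```

Because the overlap is fixed analytically before any data are drawn, a clustering method's recovery can be compared against a known ground truth. With equal variances, about OVL/2 of the points fall on the "wrong" side of the midpoint between the means.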


62H30 Classification and discrimination; cluster analysis (statistical aspects)


Zbl 0895.62069