MDCGen

swMATH ID: 32196
Software Authors: Iglesias, Félix; Zseby, Tanja; Ferreira, Daniel; Zimek, Arthur
Description: MDCGen: multidimensional dataset generator for clustering. We present a tool for generating multidimensional synthetic datasets for testing, evaluating, and benchmarking unsupervised classification algorithms. Our proposal fills a gap observed in previous approaches with regard to underlying distributions for the creation of multidimensional clusters. As a novelty, normal and non-normal distributions can be combined for either independently defining values feature by feature (i.e., multivariate distributions) or establishing overall intra-cluster distances. Being highly flexible, parameterizable, and randomizable, MDCGen also implements classic pursued features: (a) customization of cluster-separation, (b) overlap control, (c) addition of outliers and noise, (d) definition of correlated variables and rotations, (e) flexibility for allowing or avoiding isolation constraints per dimension, (f) creation of subspace clusters and subspace outliers, (g) importing arbitrary distributions for the value generation, and (h) dataset quality evaluations, among others. As a result, the proposed tool offers an improved range of potential datasets to perform a more comprehensive testing of clustering algorithms.
Homepage: https://ww2.mathworks.cn/matlabcentral/fileexchange/71871-mdcgen-v2
Keywords: clustering; dataset generator; synthetic data
Related Software: Silhouettes
Cited in: 1 Publication

Standard Articles
1 Publication describing the Software, including 1 Publication in zbMATH
MDCGen: multidimensional dataset generator for clustering. Zbl 1436.62262. Iglesias, Félix; Zseby, Tanja; Ferreira, Daniel; Zimek, Arthur (2019)

Cited by 3 Authors
1 Iglesias, Félix
1 Zimek, Arthur
1 Zseby, Tanja

Cited in 1 Serial
1 Journal of Classification

Cited in 2 Fields
1 Statistics (62-XX)
1 Computer science (68-XX)
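
Illustration: the Description mentions combining normal and non-normal distributions feature by feature and adding outliers and noise. The sketch below shows that general idea in Python, using only the standard library. It is not MDCGen's code or API (the Homepage points to a MATLAB File Exchange entry); the function name generate_dataset, its parameters, the three samplers, and the label -1 for noise points are all hypothetical choices made for this example.

```python
import random

# Hypothetical per-feature samplers. Each returns an offset centred on 0
# whose spread is controlled by `scale`. Illustrative only, not MDCGen's.
SAMPLERS = {
    "gaussian": lambda rng, scale: rng.gauss(0.0, scale),
    "uniform": lambda rng, scale: rng.uniform(-scale, scale),
    "triangular": lambda rng, scale: rng.triangular(-scale, scale, 0.0),
}


def generate_dataset(n_clusters=3, n_dims=2, points_per_cluster=100,
                     n_outliers=10, scale=0.05, seed=0):
    """Return (points, labels); noise points carry the label -1."""
    rng = random.Random(seed)
    names = sorted(SAMPLERS)  # fixed order keeps runs reproducible
    points, labels = [], []
    for k in range(n_clusters):
        # Assumption: cluster centres are drawn uniformly from the unit hypercube.
        center = [rng.random() for _ in range(n_dims)]
        # Pick a distribution independently for every feature of this cluster,
        # so normal and non-normal features can be mixed within one cluster.
        dists = [rng.choice(names) for _ in range(n_dims)]
        for _ in range(points_per_cluster):
            points.append([c + SAMPLERS[d](rng, scale)
                           for c, d in zip(center, dists)])
            labels.append(k)
    # Background noise: uniform over the unit hypercube, ignoring the clusters.
    for _ in range(n_outliers):
        points.append([rng.random() for _ in range(n_dims)])
        labels.append(-1)
    return points, labels


if __name__ == "__main__":
    pts, lbls = generate_dataset(n_clusters=4, n_dims=3, seed=42)
    print(len(pts), "points,", len(set(lbls)), "distinct labels")
```

This sketch does not attempt the other listed features (separation and overlap control, correlations and rotations, subspace clusters, quality evaluation). Fixing the seed makes a generated dataset reproducible, which is convenient when comparing clustering algorithms on the same data.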