Azzalini, Adelchi; Menardi, Giovanna
Density-based clustering with non-continuous data. (English) Zbl 1342.65017
Comput. Stat. 31, No. 2, 771-798 (2016).

Summary: Density-based clustering relies on the idea of associating groups with regions of the sample space where the probability distribution underlying the observations has high density. While this approach to cluster analysis exhibits some desirable properties, its use is necessarily limited to continuous data. The present contribution proposes a simple yet workable way to circumvent this problem, based on the identification of continuous components underlying the non-continuous variables. The basic idea is explored in a number of variants applied to simulated data, confirming the practical effectiveness of the technique and leading to recommendations for its use. Some illustrations using real data are also presented.

MSC:
62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation

Keywords: density estimation; mixed variables; modal clustering; model-based clustering; multidimensional scaling

Software: pdfCluster; clusfind; mclust; flexmix; R; MULTIMIX; UCI-ml; MASS (R); cluster (R)

References:
[1] Anderlucci L, Hennig C (2014) Clustering of categorical data: a comparison of a model-based and a distance-based approach. Commun Stat Theory Methods 43(4):704-721 · Zbl 1287.62010
[2] Arabie P, Hubert L (1994) Cluster analysis in marketing research. In: Bagozzi R (ed). Oxford
[3] Asuncion A, Newman D (2010) UCI machine learning repository. School of Information and Computer Sciences, University of California, Irvine
[4] Azzalini A, Menardi G (2014) Clustering via nonparametric density estimation: the R package pdfCluster. J Stat Softw 57(11):1-26 · Zbl 1322.62175
[5] Azzalini A, Torelli N (2007) Clustering via nonparametric density estimation. Stat Comput 17:71-80
[6] Bartholomew DJ (1980) Factor analysis for categorical data. J R Stat Soc Series B 42:293-321 · Zbl 0471.62054
[7] Bartholomew DJ, Knott M (1999) Latent variable models and factor analysis, 2nd edn. Arnold Publisher, London · Zbl 1066.62528
[8] Browne RP, McNicholas PD (2012) Model-based clustering, classification, and discriminant analysis of data with mixed type. J Stat Plan Inference 142:2976-2984 · Zbl 1335.62093
[9] Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578-588 · Zbl 0920.68038
[10] Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis and density estimation. J Am Stat Assoc 97:611-631 · Zbl 1073.62545
[11] Fraley C, Raftery AE, Murphy B, Scrucca L (2012) Mclust version 4 for R: normal mixture modeling and model-based clustering, classification, and density estimation. Technical Report 597, Department of Statistics, University of Washington · Zbl 1520.62002
[12] Fukunaga K, Hostetler LD (1975) The estimation of the gradient of a density function, with application in pattern recognition. IEEE Trans Inf Theory 21:32-40 · Zbl 0297.62025
[13] Goodman LA (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61:215-231 · Zbl 0281.62057
[14] Gruen B, Leisch F (2008) FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. J Stat Softw 28(4):1-35. http://www.jstatsoft.org/v28/i04/ · Zbl 1073.62545
[15] Hartigan JA (1975) Clustering algorithms. Wiley, New York · Zbl 0372.62040
[16] Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193-218 · Zbl 0587.62128
[17] Hunt L, Jorgensen M (2003) Mixture model clustering for mixed data with missing information. Comput Stat Data Anal 41:429-440 · Zbl 1256.62037
[18] Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York · Zbl 1345.62009
[19] Leisch F (2004) FlexMix: a general framework for finite mixture models and latent class regression in R. J Stat Softw 11(8):1-18. http://www.jstatsoft.org/v11/i08/
[20] Lin TI (2010) Robust mixture modeling using multivariate skew t distributions. Stat Comput 20(3):343-356
[21] Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2013) Cluster: cluster analysis basics and extensions. R package version 1.14.4
[22] Marbac M, Biernacki C, Vandewalle V (2015) Model-based clustering for conditionally correlated categorical data. J Classif 32(2):145-175 · Zbl 1335.62103
[23] Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, Cambridge · Zbl 0432.62029
[24] Menardi G, Azzalini A (2014) An advancement in clustering via nonparametric density estimation. Stat Comput 24:753-767 · Zbl 1322.62175
[25] Oh M, Raftery AE (2007) Model-based clustering with dissimilarities: a Bayesian approach. J Comput Graph Stat 16:559-585
[26] R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0
[27] Stuetzle W (2003) Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J Classif 20:25-47 · Zbl 1055.62075
[28] Stuetzle W, Nugent R (2010) A generalized single linkage method for estimating the cluster tree of a density. J Comput Graph Stat 19:397-418
[29] Tzeng J, Lu HH, Li WH (2008) Multidimensional scaling for large genomic data sets. BMC Bioinformatics 9(1):179
[30] Venables WN, Ripley BD (2002) Modern applied statistics with S. Springer, New York. http://www.stats.ox.ac.uk/pub/MASS4 · Zbl 1006.62003
[31] Vermunt JK, Magidson J (2002) Latent class cluster analysis. In: Hagenaars JA, McCutcheon AL (eds), pp 89-106. Cambridge
[32] Wishart D (1969) Mode analysis: a generalization of nearest neighbor which reduces chaining effects. In: Cole AJ (ed), pp 282-308. Cambridge
[33] Wolfe JH (1970) Pattern clustering by multivariate mixture analysis. Multivar Behav Res 5:329-350

This reference list is based on information provided by the publisher or from digital mathematics libraries; its items are heuristically matched to zbMATH identifiers and may contain data conversion errors.
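The summary describes the approach only at a high level: recover continuous components behind the non-continuous variables, then associate clusters with high-density regions. The sketch below illustrates one possible variant of that idea under assumptions of our own: a Gower-style dissimilarity for the mixed variables, classical multidimensional scaling (one of the listed keywords) to obtain continuous coordinates, and Gaussian-kernel mean shift (cf. Fukunaga and Hostetler [12]) as a simple modal-clustering step. It is not the authors' implementation; the pdfCluster package follows a level-set approach rather than mean shift, and the function names, the bandwidth `h`, the number of dimensions `k` and the merge tolerance are hypothetical choices for a toy example.

```python
import numpy as np


def gower_dissimilarity(num, cat):
    """Gower-style dissimilarity for mixed data (an assumed choice, not taken from the paper).

    num: (n, p) numeric columns; cat: (n, q) integer category codes.
    Numeric columns contribute |x_i - x_j| / range, categorical columns a 0/1 mismatch;
    the total is averaged over the p + q variables.
    """
    n = num.shape[0]
    d = np.zeros((n, n))
    for col in num.T:
        spread = col.max() - col.min()
        if spread > 0:
            d += np.abs(col[:, None] - col[None, :]) / spread
    for col in cat.T:
        d += (col[:, None] != col[None, :]).astype(float)
    return d / (num.shape[1] + cat.shape[1])


def classical_mds(d, k):
    """Classical (Torgerson) scaling: k continuous coordinates from a dissimilarity matrix."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # indices of the k largest eigenvalues
    vals = np.clip(vals[idx], 0.0, None)     # negative eigenvalues signal non-Euclidean d
    return vecs[:, idx] * np.sqrt(vals)


def mean_shift(x, h, n_iter=200, tol=1e-6):
    """Move every point uphill on a Gaussian kernel density estimate with bandwidth h."""
    z = x.copy()
    for _ in range(n_iter):
        sq = ((z[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)  # squared distances to samples
        w = np.exp(-0.5 * sq / h ** 2)                            # Gaussian kernel weights
        z_new = (w @ x) / w.sum(axis=1, keepdims=True)            # weighted-mean update
        converged = np.max(np.abs(z_new - z)) < tol
        z = z_new
        if converged:
            break
    return z


def label_by_mode(z, merge_tol):
    """Give points whose converged positions lie within merge_tol the same label."""
    labels = np.empty(len(z), dtype=int)
    modes = []
    for i, p in enumerate(z):
        for c, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_tol:
                labels[i] = c
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return labels


# Toy mixed data: one numeric and one binary variable, two groups of 20 observations.
rng = np.random.default_rng(0)
num = np.concatenate([rng.normal(0.0, 0.3, 20), rng.normal(5.0, 0.3, 20)])[:, None]
cat = np.repeat([0, 1], 20)[:, None]

coords = classical_mds(gower_dissimilarity(num, cat), k=2)
labels = label_by_mode(mean_shift(coords, h=0.15), merge_tol=0.1)
print(labels)
```

With these toy data the two groups map to well-separated regions of the scaled coordinates, so mean shift ends with one mode per group. In practice the bandwidth and the number of retained dimensions matter, which is in the spirit of the several variants the summary says were compared on simulated data.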