zbMATH — the first resource for mathematics

A parametric version of probabilistic distance clustering. (English) Zbl 1436.62281
Greselin, Francesca (ed.) et al., Statistical learning of complex data. Selected papers of the 11th scientific meeting of the Classification and Data Analysis Group of the Italian Statistical Society (CLADAG 2017), Milan, Italy, September 13–15, 2017. Cham: Springer. Stud. Classification Data Anal. Knowl. Organ., 33-43 (2019).
Summary: The probabilistic distance (PD) clustering method rests on the basic assumption that, for each statistical unit, the product of the probability that the unit belongs to a cluster and the distance between the unit and the cluster center is constant. This constant is a measure of the classificability of the point, and the sum of these constants over all units is referred to as the joint distance function (JDF). The parameters that minimize the JDF maximize the classificability of the units. The goal of this paper is to introduce a new distance measure based on a probability density function; specifically, we use the multivariate Gaussian and Student-$t$ distributions. Using two simulated data sets, we show that a distance based on these two density functions improves the performance of PD clustering.
For the entire collection see [Zbl 1427.62004].
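The constant-product assumption stated in the summary determines the membership probabilities in closed form: if $p_k(x)\,d_k(x) = F(x)$ for every cluster $k$ and the probabilities sum to one, then $p_k(x) = (1/d_k(x)) / \sum_j (1/d_j(x))$ and $F(x) = 1 / \sum_j (1/d_j(x))$. The following is a minimal Python sketch of that relation and of the JDF, assuming plain Euclidean distances and fixed, given centers. It does not implement the density-based distance proposed in the paper, and the function names `pd_memberships` and `joint_distance_function` are hypothetical, chosen for illustration only.

```python
import numpy as np


def pd_memberships(X, centers, eps=1e-12):
    """Membership probabilities under the PD-clustering principle.

    Assumption: Euclidean distance to fixed centers (illustrative only;
    not the density-based distance introduced in the paper).
    Returns (P, F, D): probabilities (n, K), per-unit constants (n,),
    and distances (n, K).
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Distance from every unit to every center, shape (n, K).
    D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Guard against division by zero when a unit coincides with a center.
    D = np.maximum(D, eps)
    inv = 1.0 / D
    # F(x) = 1 / sum_j 1/d_j(x): the constant value of p_k(x) * d_k(x).
    F = 1.0 / inv.sum(axis=1)
    # p_k(x) = F(x) / d_k(x); each row sums to one by construction.
    P = inv * F[:, None]
    return P, F, D


def joint_distance_function(F):
    """JDF: the sum of the per-unit constants F(x) over all units."""
    return float(np.sum(F))
```

In this sketch a unit at distances 1 and 2 from two centers receives probabilities 2/3 and 1/3, and its constant is 2/3. A full PD-clustering procedure would additionally update the centers to minimize the JDF, which is omitted here.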
MSC:
 62H30 Classification and discrimination; cluster analysis (statistical aspects)
 62F03 Parametric hypothesis testing
 62H10 Multivariate distribution of statistics
Software:
FPDclustering; mixture; mvtnorm; R; teigen
References:
 [1] Andrews, J.L., Wickins, J.R., Boers, N.M., McNicholas, P.D.: teigen: an R package for model-based clustering and classification via the multivariate t distribution. J. Stat. Softw. 83, 1-32 (2017)
 [2] Ben-Israel, A., Iyigun, C.: Probabilistic d-clustering. J. Classif. 25, 5-26 (2008) · Zbl 1260.62039
 [3] Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10, 191-203 (1984)
 [4] Browne, R.P., ElSherbiny, A., McNicholas, P.D.: mixture: Mixture Models for Clustering and Classification. R package version 1.4 (2015). https://cran.r-project.org/web/packages/mixture/index.html
 [5] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1-38 (1977) · Zbl 0364.62022
 [6] Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Cluster Analysis. Wiley Series in Probability and Statistics. Wiley, New York (2011) · Zbl 1274.62003
 [7] Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: multivariate normal and t distributions. R package version 1.0-7 (2009). https://cran.r-project.org/web/packages/mvtnorm/index.html
 [8] Gordon, A.D.: Classification, 2nd edn. Chapman and Hall/CRC, Boca Raton (1999)
 [9] Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193-218 (1985) · Zbl 0587.62128
 [10] Iyigun, C.: Probabilistic distance clustering. Ph.D. thesis, State University of New Jersey (2007)
 [11] Iyigun, C., Ben-Israel, A.: Probabilistic distance clustering adjusted for cluster size. Probab. Eng. Inform. Sci. 22, 68-125 (2008) · Zbl 1152.68500
 [12] MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium, vol. 1, pp. 281-297 (1967) · Zbl 0214.46201
 [13] McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Interscience, New York (2000) · Zbl 0963.62061
 [14] R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2016)
 [15] Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846-850 (1971)
 [16] Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 2nd edn. Academic Press, New York (2003) · Zbl 1093.68103
 [17] Tortora, C., McNicholas, P.D.: FPDclustering: PD-clustering and factor PD-clustering. R package version 1.1 (2016). https://cran.r-project.org/web/packages/FPDclustering/index.html
 [18] Tortora, C. · Zbl 1414.62279
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.