Evaluating the numerical instability in fuzzy clustering validation of high-dimensional data. (English) Zbl 1443.62181

Summary: Fuzzy clustering validation of high-dimensional datasets is only possible with a reliable cluster validity index (CVI). A good CVI must correctly recognize the structure of the data, and its assessments must be independent of the parameters of the clustering algorithm and of the properties of the data. However, some classical fuzzy CVIs, such as the Partition Coefficient (PC), Partition Entropy (PE) and Fukuyama-Sugeno (FS) indices, tend to vary monotonically with the number of clusters. Although this tendency has been investigated extensively in the literature, those studies used low-dimensional data, for which dimensionality does not affect the clustering behavior. To investigate how these aspects affect fuzzy clustering results on high-dimensional data, in this work we clustered the objects of thirteen real datasets using the Fuzzy c-Means algorithm. The resulting fuzzy partitions were validated by PC, PE, FS and several improvements proposed to deal with their monotonic tendency, for a total of eight fuzzy CVIs. In addition to analyzing the number of clusters selected by each CVI, we applied the Mann-Kendall test to check statistically for a monotonic trend in the CVI values. According to both analyses, the Modified Partition Coefficient and Scaled Partition Entropy indices successfully improve on the PC and PE indices, respectively.
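To make the indices named in the summary concrete, here is a minimal stdlib-Python sketch, not the authors' code, of Fuzzy c-Means together with PC, PE and the Modified Partition Coefficient (MPC). It uses the standard textbook forms PC = (1/n) Σᵢ Σⱼ uᵢⱼ², PE = -(1/n) Σᵢ Σⱼ uᵢⱼ ln uᵢⱼ (natural log assumed; other bases > 1 also appear in the literature) and MPC = 1 - c/(c-1) (1 - PC). The function names, the fuzzifier default m = 2, the random initialization and the toy data are illustrative assumptions; FS and the Scaled Partition Entropy are left out to keep the sketch short.

```python
import math
import random


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal Fuzzy c-Means with alternating prototype/membership updates.

    X: list of points (lists of floats); c: number of clusters (>= 2);
    m: fuzzifier (> 1). Returns (U, V): U[i][j] is the membership of point j
    in cluster i (each column sums to 1), V holds the cluster prototypes.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, normalized so each point's column sums to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col
    V = []
    for _ in range(max_iter):
        # Prototype update: mean of the data weighted by u_ij^m.
        V = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            tw = sum(w)
            V.append([sum(w[j] * X[j][k] for j in range(n)) / tw for k in range(d)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        new_U = [[0.0] * n for _ in range(c)]
        for j in range(n):
            dist = [math.dist(X[j], V[i]) for i in range(c)]
            zeros = [i for i in range(c) if dist[i] == 0.0]
            if zeros:  # point lies exactly on a prototype: split membership there
                for i in zeros:
                    new_U[i][j] = 1.0 / len(zeros)
                continue
            for i in range(c):
                new_U[i][j] = 1.0 / sum((dist[i] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
        delta = max(abs(new_U[i][j] - U[i][j]) for i in range(c) for j in range(n))
        U = new_U
        if delta < tol:
            break
    return U, V


def partition_coefficient(U):
    """PC = (1/n) * sum_i sum_j u_ij^2; lies in [1/c, 1]."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


def partition_entropy(U):
    """PE = -(1/n) * sum_i sum_j u_ij * ln(u_ij), taking 0 * ln 0 as 0."""
    n = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n


def modified_partition_coefficient(U):
    """MPC = 1 - c/(c-1) * (1 - PC); rescales PC from [1/c, 1] to [0, 1]."""
    c = len(U)
    if c < 2:
        raise ValueError("MPC requires at least two clusters")
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(U))


if __name__ == "__main__":
    # Hypothetical toy data: three tight, well-separated 2-D blobs.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
         [10.0, 0.0], [10.1, 0.0], [10.0, 0.1]]
    for c in range(2, 6):
        U, _ = fuzzy_c_means(X, c)
        print(c, round(partition_coefficient(U), 3),
              round(partition_entropy(U), 3),
              round(modified_partition_coefficient(U), 3))
```

Because PC has a floor of 1/c, it can drift downward as c grows even when the data carry no cluster structure; MPC removes that floor by rescaling to [0, 1], which is one reason it is proposed as a correction for the monotonic tendency.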

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H86 Multivariate analysis and fuzziness
62-08 Computational methods for problems pertaining to statistics

Software:

trend
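The Mann-Kendall test mentioned in the summary checks whether a sequence, here a CVI evaluated for increasing numbers of clusters c, follows a monotonic trend. The trend R package listed under Software implements non-parametric trend tests of this kind; the sketch below is an independent stdlib-Python version of the classical test (statistic S, tie-corrected variance, continuity-corrected normal approximation, two-sided p-value), not a port of that package, so its output may differ in details. The function name and the example values are illustrative assumptions.

```python
import math
from collections import Counter


def mann_kendall(x):
    """Two-sided Mann-Kendall trend test.

    Returns (S, Z, p): S = sum over k < l of sign(x_l - x_k), Z is the
    continuity-corrected normal score, p the two-sided p-value.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    s = 0
    for k in range(n - 1):
        for l in range(k + 1, n):
            diff = x[l] - x[k]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    # Variance of S under H0 (no trend), corrected for groups of tied values.
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Standard normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p


if __name__ == "__main__":
    # Hypothetical PC values for c = 2..9 (not taken from the paper).
    pc_values = [0.91, 0.84, 0.80, 0.77, 0.74, 0.73, 0.71, 0.70]
    print(mann_kendall(pc_values))
```

A significantly negative Z for PC (or positive Z for PE) across c signals the monotonic tendency the study tests for; a corrected index should ideally show no significant trend.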
