×

zbMATH — the first resource for mathematics

A review and proposal of (fuzzy) clustering for nonlinearly separable data. (English) Zbl 07174630
Summary: In many practical situations data may be characterized by nonlinearly separable clusters. Classical (hard or fuzzy) clustering algorithms produce a partition of objects by computing the Euclidean distance. As such, they are based on the linearity assumption and, therefore, do not identify properly clusters characterized by nonlinear structures. To overcome this limitation, several approaches can be followed: density-, kernel-, graph- or manifold-based clustering. A review of these approaches is offered and some new fuzzy manifold-based clustering algorithms, involving the so-called geodesic distance, are proposed. The effectiveness of such algorithms is shown by synthetic, benchmark and real data.

MSC:
68T37 Reasoning under uncertainty in the context of artificial intelligence
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] MacQueen, J. B., Some methods for classification and analysis of multivariate observations, (Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (1967), University of California Press: University of California Press Berkeley), 281-297
[2] Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithm (1981), Plenum Press: Plenum Press New York · Zbl 0503.68069
[3] Gustafson, D. E.; Kessel, W. C., Fuzzy clustering with a fuzzy covariance matrix, (Proceedings of the 1978 IEEE Conference on Decision and Control Including the 17th Symposium on Adaptive Processes (1979)), 761-766
[4] Gath, I.; Geva, A. B., Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell., 7, 773-781 (1989) · Zbl 0709.62592
[5] Ester, M.; Kriegel, H. P.; Jörg, S.; Xu, X., A density-based algorithm for discovering clusters in large spatial databases with noise, (Proceedings of the Second International Conference on Knowledge Discovery and Data Mining KDD’96 (1996), AAAI Press: AAAI Press Portland), 226-231
[6] Shawe-Taylor, J.; Cristianini, N., Kernel Methods for Pattern Analysis (2004), Cambridge University Press: Cambridge University Press Cambridge
[7] Dhillon, I. S.; Guan, Y.; Kulis, B., Kernel k-means: spectral clustering and normalized cuts, (Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD’04 (2004), ACM: ACM New York), 551-556
[8] Ng, A.; Jordan, M.; Weiss, Y., On spectral clustering: analysis and an algorithm, (Dietterich, T.; Becker, S.; Ghahramani, Z., Advances in Neural Information Processing Systems, vol. 14 (2002), MIT Press: MIT Press Massachusetts), 849-856
[9] Asgharbeygi, N.; Maleki, A., Geodesic k-means clustering, (Proceedings of the 19th International Conference on Pattern Recognition ICPR 2008, Tampa, Florida (2008)), 1-4
[10] Stanford, D. C.; Raftery, A. E., Finding curvilinear features in spatial point patterns: principal curve clustering with noise, IEEE Trans. Pattern Anal. Mach. Intell., 22, 601-609 (2000)
[11] Campello, R. J.G. B.; Moulavi, D.; Zimek, A.; Sander, J., Hierarchical density estimates for data clustering, visualization, and outlier detection, ACM Trans. Knowl. Discov. Data, 10, 1-51 (2015)
[12] Tran, T. N.; Drab, K.; Daszykowski, M., Revised DBSCAN algorithm to cluster data with dense adjacent clusters, Chemom. Intell. Lab. Syst., 120, 92-96 (2013)
[13] Han, D.; Agrawal, A.; Liao, W. K.; Choudhary, A., A fast DBSCAN algorithm with spark implementation, (Roy, S. S.; Samui, P.; Deo, R.; Ntalampiras, S., Big Data in Engineering Applications (2018), Springer: Springer Singapore), 173-192
[14] Zhou, A.; Zhou, S.; Cao, J.; Fan, Y.; Hu, Y., Approaches for scaling DBSCAN algorithm to large spatial databases, J. Comput. Sci. Technol., 15, 509-526 (2000) · Zbl 0970.68583
[15] Ankerst, M.; Breunig, M. M.; Kriegel, H. P.; Sander, J., OPTICS: Ordering points to identify the clustering structure, (Proceedings of the ACM SIGMOD International Conference on Management of Data SIGMOD’99 (1999), ACM: ACM New York), 49-60
[16] Hahsler, M.; Piekenbrock, M.; Arya, S.; Mount, D., dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and related algorithms. R package version 1.1-3 (2018)
[17] Hennig, C., fpc: flexible procedures for clustering. R package version 2.1-11.1 (2018)
[18] Kriegel, H. P.; Kröger, P.; Sander, J.; Zimek, A., Density-based clustering, Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 1, 3, 231-240 (2011)
[19] Sander, J., Density-based clustering, (Sammut, C.; Webb, G. I., Encyclopedia of Machine Learning and Data Mining. (2016), Springer: Springer Boston), 1-5
[20] Sander, J.; Ester, M.; Kriegel, H. P.; Xu, X., Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications, Data Min. Knowl. Discov., 2, 169-194 (1998)
[21] Schubert, E.; Sander, J.; Ester, M.; Kriegel, H. P.; Xu, X., DBSCAN revisited, revisited: why and how you should (still) use DBSCAN, ACM Trans. Database Syst., 42, 19 (2017)
[22] Nasibova, E. N.; Ulutagay, G., Robustness of density-based clustering methods with various neighbourhood relations, Fuzzy Sets Syst., 160, 3601-3615 (2009) · Zbl 1185.68555
[23] Bordogna, G.; Ienco, D., Fuzzy core dbscan clustering algorithm, (Laurent, A.; Strauss, O.; Bouchon-Meunier, B.; Yager, R. R., Information Processing and Management of Uncertainty in Knowledge-Based Systems. Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2014. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2014, Communications in Computer and Information Science, vol. 442 (2014), Springer: Springer Cham), 100-109
[24] Ienco, D.; Bordogna, G., Fuzzy extensions of the DBScan clustering algorithm, Soft Comput., 22, 1719-1730 (2018) · Zbl 1398.62165
[25] Ulutagaya, G.; Nasibov, E., Fuzzy and crisp clustering methods based on the neighbourhood concept: a comprehensive review, J. Intell. Fuzzy Syst., 23, 1-11 (2012)
[26] Cristianini, N.; Shawe-Taylor, J., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (2000), Cambridge University Press: Cambridge University Press Cambridge
[27] Karatzoglou, A.; Smola, A.; Hornik, K.; Zeileis, A., kernlab: an S4 package for kernel methods in R, J. Stat. Softw., 11, 9, 1-20 (2004)
[28] Zhang, D. Q.; Chen, S. C., Clustering incomplete data using kernel-based fuzzy c-means algorithm, Neural Process. Lett., 18, 155-162 (2003)
[29] Zhang, D. Q.; Chen, S. C., A novel kernelized fuzzy C-means algorithm with application in medical image segmentation, Artif. Intell. Med., 32, 37-50 (2004)
[30] Ding, Y.; Fu, X., Kernel-based fuzzy c-means clustering algorithm based on genetic algorithm, Neurocomputing, 188, 233-238 (2016)
[31] Memon, K. H.; Lee, D. H., Generalised kernel weighted fuzzy C-means clustering algorithm with local information, Fuzzy Sets Syst., 340, 91-108 (2018) · Zbl 1397.62226
[32] Schaeffer, S. E., Graph clustering, Comput. Sci. Rev., 1, 27-64 (2007) · Zbl 1302.68237
[33] Luxburg, U., A tutorial on spectral clustering, Stat. Comput., 17, 395-416 (2007)
[34] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 22, 888-905 (2000)
[35] Zahn, C. T., Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput., 20, 68-86 (1971) · Zbl 0264.68040
[36] Oksanen, J.; Blanchet, F. G.; Friendly, M.; Kindt, R.; Legendre, P.; McGlinn, D.; Minchin, P. R.; O’Hara, R. B.; Simpson, G. L.; Solymos, P.; Stevens, M. H.H.; Szoecs, E.; Wagner, H., vegan: community ecology package. R package version 2.5-4 (2019)
[37] Sharan, R.; Shamir, R., CLICK: a clustering algorithm with applications to gene expression analysis, (Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology ISMB ’00 (2000), AAAI Press: AAAI Press Menlo Park), 307-316
[38] Dijkstra, E. W., A note on two problems in connexion with graphs, Numer. Math., 1, 269-271 (1959) · Zbl 0092.16002
[39] Floyd, R. W., Algorithm 97: shortest path, Commun. ACM, 5, 345 (1962)
[40] Warshall, S., A theorem on Boolean matrices, J. ACM, 9, 11-12 (1962) · Zbl 0118.33104
[41] Tenenbaum, J. B.; de Silva, V.; Langford, J. C., A global geometric framework for nonlinear dimensionality reduction, Science, 290, 2319-2323 (2000)
[42] Lee, J. A.; Verleysen, M., Nonlinear Dimensionality Reduction (2007), Springer-Verlag: Springer-Verlag New York · Zbl 1128.68024
[43] Asgharbeygi, N.; Maleki, A., Geodesic k-means clustering, (Proceedings of the 19th International Conference on Pattern Recognition ICPR 2008 (2008)), 1-4
[44] Kaufman, L.; Rousseeuw, P. J., Clustering by means of medoids, (Dodge, Y., Statistical Data Analysis Based on the L1-Norm and Related Methods (1987), North Holland: North Holland Amsterdam), 405-416
[45] Kaufman, L.; Rousseeuw, P. J., Finding Groups in Data: an Introduction to Cluster Analysis (1990), John Wiley and Sons: John Wiley and Sons Hoboken · Zbl 1345.62009
[46] Wu, A. Y.; Garland, M.; Han, J., Mining scale-free networks using geodesic clustering, (Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD’04 (2004), ACM: ACM New York), 719-724
[47] Ray, S.; Pakhira, M. K., Clustering of scale free networks using a k-medoid framework, (Proceedings of the 2nd International Conference on Computer and Communication Technology ICCCT-2011 (2011)), 94-99
[48] Karygianni, S.; Frossard, P., Tangent-based manifold approximation with locally linear models, Signal Process., 104, 232-247 (2014)
[49] Babaeian, A.; Babaee, M.; Bayestehtashk, A.; Bandarabadi, M., Nonlinear subspace clustering using curvature constrained distances, Pattern Recognit. Lett., 68, 118-125 (2015)
[50] Feil, B.; Abonyi, J., Geodesic distance based fuzzy clustering, (Saad, A.; Dahal, K.; Sarfraz, M.; Roy, R., Soft Computing in Industrial Applications. Advances in Soft Computing, vol. 39 (2007), Springer: Springer Heidelberg), 50-59 · Zbl 05218151
[51] Borg, I.; Groenen, P., Modern Multidimensional Scaling: Theory and Applications (2005), Springer-Verlag: Springer-Verlag New York · Zbl 1085.62079
[52] Balasubramanian, M.; Schwartz, E. L.; Tenenbaum, J. B.; de Silva, V.; Langford, J. C., The Isomap algorithm and topological stability, Science, 295, 5552, 7 (2002)
[53] Krishnapuram, R.; Joshi, A.; Yi, L., A fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering, (Proceedings of the Fuzzy Systems Conference FUZZ-IEEE ’99 (1999)), 1281-1286
[54] Király, A.; Vathy-Fogarassy, Á.; Abonyi, J., Geodesic distance based fuzzy c-medoid clustering - searching for central points in graphs and high dimensional data, Fuzzy Sets Syst., 286, 157-172 (2016)
[55] Kim, J.; Shim, K. H.; Choi, S., Soft geodesic kernel k-means, (Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP ’07 (2007)), 429-432
[56] Runkler, T. A., Relational fuzzy clustering, (de Oliveira, J. V.; Pedrycz, W., Advances in Fuzzy Clustering and Its Applications (2007), Wiley: Wiley Chichester), 31-51
[57] Giordani, P.; Ramos-Guajardo, A. B., A fuzzy clustering procedure for random fuzzy sets, Fuzzy Sets Syst., 305, 54-69 (2016) · Zbl 1368.62175
[58] Banerjee, A.; Davé, R. N., Robust clustering, Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 2, 29-59 (2011)
[59] Davé, R. N., Characterization and detection of noise in clustering, Pattern Recognit. Lett., 12, 657-664 (1991)
[60] Davé, R. N.; Sen, S., Robust fuzzy clustering of relational data, IEEE Trans. Fuzzy Syst., 10, 713-727 (2002)
[61] Ferraro, M. B.; Giordani, P., A toolbox for fuzzy clustering using the R programming language, Fuzzy Sets Syst., 279, 1-16 (2015)
[62] Ferraro, M. B.; Giordani, P.; Serafini, A., fclust: an R package for fuzzy clustering, R J., 9 (2019)
[63] Dua, D.; Graff, C., UCI Machine Learning Repository (2019), University of California, School of Information and Computer Science: University of California, School of Information and Computer Science Irvine
[64] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 193-218 (1985)
[65] (2017), (in Italian)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.