
A survey on feature weighting based K-means algorithms. (English) Zbl 1349.62291

Summary: In a real-world data set, there is always the possibility, rather high in our opinion, that different features may have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process.
With over 50 years of history, K-means is arguably the most popular partitional clustering algorithm there is. The first K-means based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since, but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analyzing some of the most popular, or innovative, feature weighting mechanisms based on K-means.
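To make the surveyed idea concrete, the sketch below, written against NumPy, illustrates a generic feature-weighted K-means in the spirit of the automated variable weighting scheme of Huang et al. (2005), one of the methods the survey covers: each feature carries a weight, distances use the weights raised to an exponent beta, and the weights are re-estimated from the per-feature within-cluster dispersions. It is a minimal illustration under those assumptions, not the survey's own algorithm; the function name, the default beta, and the fixed iteration count are illustrative choices.

import numpy as np

def weighted_kmeans(X, k, beta=2.0, n_iter=50, seed=0):
    """Feature-weighted K-means sketch; X is an (n, m) NumPy array."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full(m, 1.0 / m)  # uniform initial feature weights
    for _ in range(n_iter):
        # Assignment step: weighted squared Euclidean distance to each centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2 * weights ** beta).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Centroid update: within-cluster means (empty clusters keep their centroid).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
        # Weight update: features with small within-cluster dispersion gain weight.
        D = np.zeros(m)
        for j in range(k):
            D += ((X[labels == j] - centroids[j]) ** 2).sum(axis=0)
        D = np.maximum(D, 1e-12)  # guard against zero dispersion
        weights = 1.0 / ((D[:, None] / D[None, :]) ** (1.0 / (beta - 1.0))).sum(axis=1)
    return labels, centroids, weights

When run on data in which one feature is pure noise, the weight returned for that feature shrinks toward zero, which is exactly the behavior the surveyed feature weighting mechanisms aim for.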

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition
91C20 Clustering in the social and behavioral sciences

Software:

Vlfeat; OVWTRE; UCI-ml

References:

[1] ALOISE, D., DESHPANDE, A., HANSEN, P., and POPAT, P. (2009), “NP-Hardness of Euclidean Sum-of-Squares Clustering”, Machine Learning, 75(2), 245-248. · Zbl 1378.68047 · doi:10.1007/s10994-009-5103-0
[2] ARBELAITZ, O., GURRUTXAGA, I., MUGUERZA, J., PÉREZ, J.M., and PERONA, I. (2013), “An Extensive Comparative Study of Cluster Validity Indices”, Pattern Recognition, 46(1), 243-256.
[3] BALL, G.H., and HALL, D.J. (1967), “A Clustering Technique for Summarizing Multivariate Data”, Behavioral Science, 12(2), 153-155. · doi:10.1002/bs.3830120210
[4] BELLMAN, R. (1957), Dynamic Programming, Princeton, NJ: Princeton University Press. · Zbl 0077.13605
[5] BEYER, K., GOLDSTEIN, J., RAMAKRISHNAN, R., and SHAFT, U. (1999), “When is Nearest Neighbor Meaningful?”, in Proceedings of the 7th International Conference on Database Theory, Vol. 1540, Springer, pp. 217-235.
[6] BEZDEK, J.C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms, Norwell, MA: Kluwer Academic Publishers. · Zbl 0503.68069 · doi:10.1007/978-1-4757-0450-1
[7] BLUM, A.L., and RIVEST, R.L. (1992), “Training a 3-Node Neural Network Is NP-Complete”, Neural Networks, 5(1), 117-127. · doi:10.1016/S0893-6080(05)80010-3
[8] CHAN, E.Y., CHING, W.K., NG, M.K., and HUANG, J.Z. (2004), “An Optimization Algorithm for Clustering Using Weighted Dissimilarity Measures”, Pattern Recognition, 37(5), 943-952. · Zbl 1072.68549
[9] CHATZIS, S.P. (2011), “A Fuzzy C-Means-Type Algorithm for Clustering of Data with Mixed Numeric and Categorical Attributes Employing a Probabilistic Dissimilarity Functional”, Expert Systems with Applications, 38(7), 8684-8689. · doi:10.1016/j.eswa.2011.01.074
[10] CHEN, X., YE, Y., XU, X., and HUANG, J.Z. (2012), “A Feature Group Weighting Method for Subspace Clustering of High-Dimensional Data”, Pattern Recognition, 45(1), 434-446. · Zbl 1225.68162
[11] DE AMORIM, R.C., and HENNIG, C. (2015), “Recovering the Number of Clusters in Data Sets with Noise Features Using Feature Rescaling Factors”, Information Sciences, 324, 126-145. · doi:10.1016/j.ins.2015.06.039
[12] DE AMORIM, R.C., and MAKARENKOV, V. (to appear), “Applying Subclustering and Lp Distance in Weighted K-Means with Distributed Centroid”, Neurocomputing, doi:10.1016/j.neucom.2015.08.018.
[13] DE AMORIM, R.C., and MIRKIN, B. (2012), “Minkowski Metric, Feature Weighting and Anomalous Cluster Initializing in K-Means Clustering”, Pattern Recognition, 45(3), 1061-1075. · doi:10.1016/j.patcog.2011.08.012
[14] DE AMORIM, R.C., and MIRKIN, B. (2014), “Selecting the Minkowski Exponent for Intelligent K-Means with Feature Weighting”, in Clusters, Orders, Trees: Methods and Applications, Optimization and Its Applications, eds, F. Aleskerov, B. Goldengorin, and P. Pardalos, Berlin: Springer, pp. 103-117. · Zbl 1365.62257
[15] DE SOETE, G. (1988), “OVWTRE: A Program for Optimal Variable Weighting for Ultrametric and Additive Tree Fitting”, Journal of Classification, 5(1), 101-104. · doi:10.1007/BF01901677
[16] DE SOETE, G. (1986), “Optimal Variable Weighting for Ultrametric and Additive Tree Clustering”, Quality and Quantity, 20(2-3), 169-180. · doi:10.1007/BF00227423
[17] DEMPSTER, A.P., LAIRD, N.M., and RUBIN, D.B. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, Series B, 39(1), 1-38. · Zbl 0364.62022
[18] DESARBO, W.S., and CRON, W.L. (1988), “A Maximum Likelihood Methodology for Clusterwise Linear Regression”, Journal of Classification, 5(2), 249-282. · Zbl 0692.62052 · doi:10.1007/BF01897167
[19] DESARBO, W.S., and MAHAJAN, V. (1984), “Constrained Classification: The Use of A Priori Information in Cluster Analysis”, Psychometrika, 49(2), 187-215. · Zbl 0554.62050 · doi:10.1007/BF02294172
[20] DESARBO, W.S., CARROLL, J.D., CLARK, L.A., and GREEN, P.E. (1984), “Synthesized Clustering: A Method for Amalgamating Alternative Clustering Bases with Differential Weighting of Variables”, Psychometrika, 49(1), 57-78. · Zbl 0594.62067 · doi:10.1007/BF02294206
[21] DEVANEY, M., and RAM, A. (1997), “Efficient Feature Selection in Conceptual Clustering”, in Proceedings of the 14th ACM International Conference in Machine Learning, Nashville, TN, pp. 92-97. · Zbl 0825.62540
[22] DING, C., and HE, X. (2004), “K-means Clustering via Principal Component Analysis”, in Proceedings of the Twenty-First ACM International Conference on Machine Learning, pp. 29.
[23] DOMENICONI, C., GUNOPULOS, D., MA, S., YAN, B., AL-RAZGAN, M., and PAPADOPOULOS, D. (2007), “Locally Adaptive Metrics for Clustering High Dimensional Data”, Data Mining and Knowledge Discovery, 14(1), 63-97.
[24] DRINEAS, P., FRIEZE, A., KANNAN, R., VEMPALA, S., and VINAY, V. (2004), “Clustering Large Graphs via the Singular Value Decomposition”, Machine Learning, 56(1-3), 9-33. · Zbl 1089.68090 · doi:10.1023/B:MACH.0000033113.59016.96
[25] DY, J.G. (2008), “Unsupervised Feature Selection”, in Computational Methods of Feature Selection, Data Mining & Knowledge Discovery, eds. H. Liu and H. Motoda, Chapman & Hall/CRC, pp. 19-30. · Zbl 1089.68090
[26] GASCH, A.P., and EISEN, M.B. (2002), “Exploring the Conditional Coregulation of Yeast Gene Expression Through Fuzzy K-Means Clustering”, Genome Biology, 3(11), 1-22. · doi:10.1186/gb-2002-3-11-research0059
[27] GNANADESIKAN, R., KETTENRING, J.R., and TSAO, S.L. (1995), “Weighting and Selection of Variables for Cluster Analysis”, Journal of Classification, 12(1), 113-136. · Zbl 0825.62540 · doi:10.1007/BF01202271
[28] GODER, A., and FILKOV, V. (2008), “Consensus Clustering Algorithms: Comparison and Refinement”, in Proceedings of the 10th ALENEX SIAM, Vol. 8, pp. 109-117.
[29] GREEN, P.E., KIM, J., and CARMONE, F.J. (1990), “A Preliminary Study of Optimal Variable Weighting in K-Means Clustering”, Journal of Classification, 7(2), 271-285. · doi:10.1007/BF01908720
[30] GUYON, I., and ELISSEEFF, A. (2003), “An Introduction to Variable and Feature Selection”, The Journal of Machine Learning Research, 3, 1157-1182. · Zbl 1102.68556
[31] HUANG, J.Z., NG, M.K., RONG, H., and LI, Z. (2005), “Automated Variable Weighting in K-Means Type Clustering”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 657-668.
[32] HUANG, J.Z., XU, J., NG, M., and YE, Y. (2008), “Weighting Method for Feature Selection in K-Means”, in Computational Methods of Feature Selection, Data Mining & Knowledge Discovery, eds. H. Liu and H. Motoda, Chapman & Hall/CRC, pp. 193-209. · Zbl 1089.68090
[33] HUANG, Z. (1998), “Extensions to the K-Means Algorithm for Clustering Large Data Sets with Categorical Values”, Data Mining and Knowledge Discovery, 2(3), 283-304. · doi:10.1023/A:1009769707641
[34] HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2(1), 193-218. · Zbl 0587.62128 · doi:10.1007/BF01908075
[35] JAIN, A.K. (2010), “Data Clustering: 50 Years Beyond K-Means”, Pattern Recognition Letters, 31(8), 651-666. · doi:10.1016/j.patrec.2009.09.011
[36] JI, J., PANG, W., ZHOU, C., HAN, X., and WANG, Z. (2012), “A Fuzzy K-Prototype Clustering Algorithm for Mixed Numeric and Categorical Data”, Knowledge-Based Systems, 30, 129-135. · Zbl 0692.62052
[37] JI, J., BAI, T., ZHOU, C., MA, C., and WANG, Z. (2013), “An Improved K-Prototypes Clustering Algorithm for Mixed Numeric and Categorical Data”, Neurocomputing, 120, 590-596.
[38] JING, L., NG, M.K., and HUANG, J.Z. (2007), “An Entropy Weighting K-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data”, IEEE Transactions on Knowledge and Data Engineering, 19(8), 1026-1041.
[39] KOHAVI, R., and JOHN, G.H. (1997), “Wrappers for Feature Subset Selection”, Artificial Intelligence, 97(1), 273-324. · Zbl 0904.68143 · doi:10.1016/S0004-3702(97)00043-X
[40] LI, C., and YU, J. (2006), “A Novel Fuzzy C-Means Clustering Algorithm”, in Proceedings of the First International Conference on Rough Sets and Knowledge Technology, Berlin, Heidelberg: Springer, pp. 510-515. · Zbl 1196.68196
[41] LICHMAN, M. (2013), “UCI Machine Learning Repository”, University of California, Irvine, School of Information and Computer Sciences, http://archive.ics.uci.edu/ml. · Zbl 0139.24606
[42] LIU, H., and YU, L. (2005), “Toward Integrating Feature Selection Algorithms for Classification and Clustering”, IEEE Transactions on Knowledge and Data Engineering, 17(4), 491-502.
[43] MACQUEEN, J. (1967), “Some Methods for Classification and Analysis of Multivariate Observations”, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1(14), Berkeley, CA: University of California Press, pp. 281-297. · Zbl 0214.46201
[44] MAKARENKOV, V., and LEGENDRE, P. (2001), “Optimal Variable Weighting for Ultrametric and Additive Trees and K-Means Partitioning: Methods and Software”, Journal of Classification, 18(2), 245-271. · Zbl 1040.91087
[45] MIRKIN, B. (2012), Clustering: A Data Recovery Approach, Boca Raton, FL: Chapman and Hall/CRC. · Zbl 1297.68003
[46] MIRKIN, B. (1998), Mathematical Classification and Clustering: From How to What and Why, Dordrecht: Springer. · Zbl 0949.91030 · doi:10.1007/978-3-642-72087-1_20
[47] MITRA, P., MURTHY, C.A., and PAL, S.K. (2002), “Unsupervised Feature Selection Using Feature Similarity”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 301-312. · doi:10.1109/34.990133
[48] MODHA, D.S., and SPANGLER, W.S. (2003), “Feature Weighting in K-Means Clustering”, Machine Learning, 52(3), 217-237. · Zbl 1039.68111 · doi:10.1023/A:1024016609528
[49] MURTAGH, F. (1984), “Complexities of Hierarchic Clustering Algorithms: State of the Art”, Computational Statistics Quarterly, 1(2), 101-113. · Zbl 0614.62076
[50] MURTAGH, F., and CONTRERAS, P. (2011), “Methods of Hierarchical Clustering”, arXiv preprint arXiv:1105.0121. · Zbl 1404.68041
[51] NG, M.K., and WONG, J.C. (2002), “Clustering Categorical Data Sets Using Tabu Search Techniques”, Pattern Recognition, 35(12), 2783-2790. · Zbl 1010.68149
[52] POLAK, E. (1971), Computational Methods in Optimization: A Unified Approach, New York: Academic Press. · Zbl 0301.90040
[53] SNEATH, P.H.A., and SOKAL, R.R. (1973), Numerical Taxonomy. The Principles and Practice of Numerical Classification, San Francisco, CA: W.H. Freeman & Co Ltd. · Zbl 0285.92001
[54] STEINHAUS, H. (1956), “Sur la Division des Corps Matériels en Parties”, Bulletin of the Polish Academy of Sciences, 1, 801-804. · Zbl 0079.16403
[55] STEINLEY, D. (2006), “K-Means Clustering: A Half-Century Synthesis”, British Journal of Mathematical and Statistical Psychology, 59(1), 1-34. · doi:10.1348/000711005X48266
[56] STEINLEY, D., and BRUSCO, M.J. (2008a), “Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures”, Psychometrika, 73(1), 125-144. · Zbl 1143.62327 · doi:10.1007/s11336-007-9019-y
[57] STEINLEY, D., and BRUSCO, M.J. (2008b), “A New Variable Weighting and Selection Procedure for K-Means Cluster Analysis”, Multivariate Behavioral Research, 43(1), 77-108. · doi:10.1080/00273170701836695
[58] STURN, A., QUACKENBUSH, J., and TRAJANOSKI, Z. (2002), “Genesis: Cluster Analysis of Microarray Data”, Bioinformatics, 18(1), 207-208. · doi:10.1093/bioinformatics/18.1.207
[59] TALAVERA, L. (1999), “Feature Selection as a Preprocessing Step for Hierarchical Clustering”, in Proceedings of the 16th International Conference in Machine Learning, Bled, Slovenia, pp. 389-397. · Zbl 1040.91087
[60] TSAI, C.Y., and CHIU, C.C. (2008), “Developing a Feature Weight Self-Adjustment Mechanism for a K-Means Clustering Algorithm”, Computational Statistics & Data Analysis, 52(10), 4658-4672. · Zbl 1452.62471 · doi:10.1016/j.csda.2008.03.002
[61] VEDALDI, A., and FULKERSON, B. (2010), “VLFeat: An Open and Portable Library of Computer Vision Algorithms”, in Proceedings of the ACM International Conference on Multimedia, 1469-1472.
[62] WETTSCHERECK, D., AHA, D.W., and MOHRI, T. (1997), “A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms”, Artificial Intelligence Review, 11(1-5), 273-314. · doi:10.1023/A:1006593614256
[63] ZADEH, L.A. (1965), “Fuzzy Sets”, Information and Control, 8(3), 338-353. · Zbl 0139.24606 · doi:10.1016/S0019-9958(65)90241-X
[64] ZHA, H., HE, X., DING, C., GU, M., and SIMON, H.D. (2001), “Spectral Relaxation for K-Means Clustering”, Advances in Neural Information Processing Systems, 1057-1064.