## A modified \(k\)-means clustering procedure for obtaining a cardinality-constrained centroid matrix.
*(English)*
Zbl 07223613

Summary: \(k\)-means clustering is a well-known procedure for classifying multivariate observations. The resulting centroid matrix of clusters by variables is used to interpret which variables characterize the clusters. However, between-cluster differences are not always clearly captured in the centroid matrix. We address this problem by proposing a new procedure for obtaining a centroid matrix that contains a number of elements that are exactly zero. This allows easy interpretation of the matrix, as we may focus on only the nonzero centroids. The development of an iterative algorithm for the constrained minimization is described. A cardinality selection procedure for identifying the optimal cardinality is presented, as well as a modified version of the proposed procedure, in which some restrictions are imposed on the positions of the nonzero elements. The behavior of the proposed procedure was evaluated in simulation studies and is illustrated with three real data examples, which demonstrate that the performance of the procedure is promising.
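
The summary does not spell out the iterative algorithm, so the following is only an illustrative sketch of how a cardinality-constrained centroid update could be alternated with the usual \(k\)-means assignment step; it should not be read as the authors' procedure. It assumes the least-squares loss \(\|X - MC\|_F^2\), with \(M\) the binary membership matrix and \(C\) the centroid matrix, and it uses the fact that, for a fixed partition, setting entry \((g, j)\) of \(C\) to zero instead of the cluster mean raises this loss by the cluster size times the squared mean. The function name `sparse_centroid_kmeans` and the parameters `card`, `n_iter`, and `seed` are hypothetical.

```python
import numpy as np


def sparse_centroid_kmeans(X, k, card, n_iter=50, seed=0):
    """Illustrative sketch (not the published algorithm).

    Alternates a k-means assignment step with a centroid update that
    keeps at most `card` nonzero entries in the k x p centroid matrix.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(0, k, size=n)  # random initial partition
    C = np.zeros((k, p))
    for _ in range(n_iter):
        # Unconstrained cluster means and sizes for the current partition.
        sizes = np.bincount(labels, minlength=k).astype(float)
        means = np.zeros((k, p))
        for g in range(k):
            if sizes[g] > 0:
                means[g] = X[labels == g].mean(axis=0)
        # Zeroing entry (g, j) raises the loss by sizes[g] * means[g, j]**2,
        # so keep the `card` entries for which that increase is largest.
        cost = sizes[:, None] * means ** 2
        keep = np.argsort(cost, axis=None)[::-1][:card]
        rows, cols = np.unravel_index(keep, cost.shape)
        C = np.zeros((k, p))
        C[rows, cols] = means[rows, cols]
        # Reassign each observation to its nearest constrained centroid.
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable, so C matches the final labels
        labels = new_labels
    return labels, C
```

Calling `sparse_centroid_kmeans(X, k=3, card=5)` on an \(n \times p\) data array returns the cluster labels and a \(3 \times p\) centroid matrix with at most five nonzero entries. As with ordinary \(k\)-means, the result depends on the random start, so several seeds would normally be tried; the cardinality selection procedure mentioned in the summary is not covered by this sketch.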

### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |


\textit{N. Yamashita} and \textit{K. Adachi}, J. Classif. 37, No. 2, 509--525 (2020; Zbl 07223613)

