## Factorial $$k$$-means analysis for two-way data. (English) Zbl 1051.62056

A discrete clustering model and a continuous factorial model are fitted simultaneously to two-way data. The aim is to identify the best partition of the objects, described by the best orthogonal linear combinations of the variables (factors) according to the least-squares criterion. The methodology, named factorial $$k$$-means analysis after these features, has a wide range of applications because it pursues two objectives at once: data reduction and synthesis, simultaneously in the direction of objects and of variables; and variable selection in cluster analysis, identifying the variables that contribute most to the classification of the objects.
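One way to write the least-squares criterion in matrix form is shown below. The notation is introduced here for illustration and is not quoted from the paper: $$X$$ is the $$n \times J$$ column-centred data matrix, $$U$$ the $$n \times K$$ binary membership matrix (discrete part), $$A$$ the $$J \times Q$$ matrix of orthonormal factor loadings (continuous part) and $$\bar{Y}$$ the $$K \times Q$$ matrix of centroids in the factor space.

$$
\min_{A,\,U,\,\bar{Y}} \; \lVert XA - U\bar{Y} \rVert^{2}
\quad \text{s.t.} \quad A^{\top}A = I_Q,\quad u_{ik} \in \{0,1\},\quad \sum_{k=1}^{K} u_{ik} = 1 .
$$

For fixed $$U$$ and $$A$$ the optimal centroids are $$\bar{Y} = (U^{\top}U)^{-1}U^{\top}XA$$, and substituting them gives

$$
\lVert XA - U\bar{Y} \rVert^{2} = \operatorname{tr}\!\left(A^{\top}X^{\top}\bigl(I_n - U(U^{\top}U)^{-1}U^{\top}\bigr)XA\right),
$$

which, under this formulation, is minimized over columnwise-orthonormal $$A$$ by the $$Q$$ eigenvectors of $$X^{\top}\bigl(U(U^{\top}U)^{-1}U^{\top} - I_n\bigr)X$$ with the largest eigenvalues.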
The least-squares fitting problem is formalized as a quadratic constrained minimization problem with mixed (discrete and continuous) variables. To solve it, an iterative alternating least-squares algorithm with two main steps is proposed. First, starting from the cluster centroids, the algorithm finds the subspace projection that yields the smallest distances between the object points and the centroids. Second, after the centroids are updated, the partition is determined by assigning each object to its closest centroid. Each step decreases the least-squares criterion, so the algorithm converges to a (possibly local) optimum. Two data sets are analyzed to illustrate the features of the factorial $$k$$-means model. Because the algorithm is fast, the technique can also be applied to large data sets.
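The two alternating steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the criterion is taken to be $$\lVert XA - U\bar{Y} \rVert^{2}$$ with $$A^{\top}A = I_Q$$; the data are column-centred; a single random initial partition is used; and an empty cluster simply gets a zero centroid. The function name `factorial_kmeans` and its parameters are hypothetical. The subspace step takes the leading eigenvectors of $$X^{\top}(U(U^{\top}U)^{-1}U^{\top} - I)X$$, computed from the cluster means so that no $$n \times n$$ matrix is formed.

```python
import numpy as np


def factorial_kmeans(X, n_clusters, n_factors, max_iter=100, tol=1e-10, seed=0):
    """Hypothetical ALS sketch of factorial k-means (assumed formulation).

    Minimises ||X A - U Ybar||^2 over a binary membership matrix U (n x K),
    loadings A (J x Q) with A'A = I and centroids Ybar (K x Q).
    Returns (labels, A, Ybar, losses); losses are non-increasing.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # column-centre the data (assumption)
    n = X.shape[0]
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    losses = []
    for _ in range(max_iter):
        U = np.eye(n_clusters)[labels]          # n x K binary membership matrix
        sizes = U.sum(axis=0)
        inv = np.divide(1.0, sizes, out=np.zeros_like(sizes), where=sizes > 0)
        C = inv[:, None] * (U.T @ X)            # cluster means in variable space (K x J)
        # Step 1 (subspace): X'(U(U'U)^-1 U' - I)X equals between-cluster
        # scatter minus total scatter; keep its Q leading eigenvectors.
        M = C.T @ (sizes[:, None] * C) - X.T @ X
        _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
        A = vecs[:, ::-1][:, :n_factors]
        Y = X @ A                               # object scores in the subspace
        Ybar = C @ A                            # centroids projected onto the subspace
        # Step 2 (partition): assign every object to its closest centroid.
        d2 = ((Y[:, None, :] - Ybar[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Update the centroids for the new partition and record the criterion.
        U = np.eye(n_clusters)[new_labels]
        sizes = U.sum(axis=0)
        inv = np.divide(1.0, sizes, out=np.zeros_like(sizes), where=sizes > 0)
        Ybar = inv[:, None] * (U.T @ Y)
        losses.append(float(((Y - U @ Ybar) ** 2).sum()))
        converged = np.array_equal(new_labels, labels) or (
            len(losses) > 1 and losses[-2] - losses[-1] < tol
        )
        labels = new_labels
        if converged:
            break
    return labels, A, Ybar, losses
```

Because the criterion has local minima, in practice one would run the sketch from several random starts (different `seed` values) and keep the solution with the smallest final loss.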

### MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

62H25 Factor analysis and principal components; correspondence analysis
