A new approach to clustering. (English) Zbl 0192.57101

A general formulation of data reduction and clustering processes is proposed. These procedures are regarded as mappings or transformations of the original space onto a “representation” or “code” space subjected to some constraints. The constraints that the grouping transformation satisfies are stated in terms of properties that pair of points in the original space should preserve after the mapping. Relaxing of conditions through minimization of quadratic functionals is introduced as a possible mean to obtain approximate groupings when the constraints are so restrictive that no solution exists. Current clustering techniques, as well as factor analysis and related techniques, the information theoretical correlation method of Watanabe and the information measure method of Wallace and Boulton are specified within the framework of this formulation. Watanabe’s method provides an example on how internal structure of a class determines the grouping. Wallace and Boulton’s method provides an example of a nontrivial code space.
Finally, a new approach is presented. It is based on the idea of fuzzy sets, introduced by Zadeh in 1965. The representation space is the set of \(N\)-tuples with positive elements that add up to 1. The number of clusters \(N\) is assumed fixed and known beforehand. While capable of conventional classifications, fuzzy clustering allows representation of “bridges” and stray points that are classified as undetermined points with a degree of indeterminacy proportional to their similarity to “core” points. The clustering process is shown to be equivalent to the decomposition of the sample density function \(P(x)\) into the weighted sum of the component cluster densities P(x/S_j)\) with weights \(P(S_j)\). Consideration of the degrees of belongingness as probabilities is useful for pattern recognition purposes after the fuzzy sets have been established. As one of the possible ways to insure that the density functions \(P(x/S_j)\) really represent clusters, optimality conditions are imposed. Three such conditions are presented. The first two deal with average properties of the clusters. They try, for example, to minimize the average mean density of a point with respect to the cluster to which it most likely belongs. The third formula is based on a function mapping the distance structure of the original space into a suitable distance structure defined on the representation space. Due to the sophistication of the numerical procedures used to optimize such functionals, its description and application are the object of a subsequent paper. Some thoughts on the problem of the number of clusters follow.
Reviewer: Enrique H. Ruspini


62H30 Classification and discrimination; cluster analysis (statistical aspects)
68P05 Data structures
68T10 Pattern recognition, speech recognition
Full Text: DOI