Cluster dissection and analysis. Theory, FORTRAN programs, examples. Transl. from the German by Johannes Goldschmidt. (English) Zbl 0584.62094

Ellis Horwood Series in Computers and Their Applications. Chichester: Ellis Horwood Limited; Chichester etc.: Halsted Press, a Division of John Wiley & Sons. 226 p. £25.00 (1985).
[For a review of the original 1983 German edition see Zbl 0536.62048.]
There can be no question that this book on clustering techniques is the result of the author’s long personal reflection and teaching experience. Indeed, it presents a well-balanced approach to the subject, starting with a theoretical part, which is implemented by a series of Fortran programs (part 2), and ending with many well-chosen examples (part 3).
At first sight this is a very large scope for a book of only 226 pages. Fortunately, the set of clustering techniques considered does not embrace the complete range of known methods. Instead, and unlike what one might expect from the very general title, the author has limited himself to the rather small class of so-called optimization techniques, which seek to optimize an objective function over direct partitions of the objects into a fixed number of clusters.
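To make this class of methods concrete, here is a minimal sketch in Python. It is not taken from the book, whose programs are in Fortran, and all function names are hypothetical. It assumes the sum-of-squares criterion as the objective function and uses a simple minimum-distance reassignment step to improve an initial partition into a fixed number k of clusters.

```python
def centroids_of(points, labels, k):
    """Mean of each cluster; an empty cluster gets None (assumption of this sketch)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for j, x in enumerate(p):
            sums[lab][j] += x
    return [[s / counts[i] for s in sums[i]] if counts[i] else None
            for i in range(k)]


def sum_of_squares(points, labels, k):
    """Objective function: total squared distance of each point to its centroid."""
    cents = centroids_of(points, labels, k)
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, cents[lab]))
        for p, lab in zip(points, labels)
    )


def minimum_distance_partition(points, labels, k, max_iter=100):
    """Reassign every point to its nearest centroid until the partition is stable."""
    labels = list(labels)
    for _ in range(max_iter):
        cents = centroids_of(points, labels, k)
        new_labels = [
            min(
                (i for i in range(k) if cents[i] is not None),
                key=lambda i: sum((x - c) ** 2 for x, c in zip(p, cents[i])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster: a local optimum is reached
        labels = new_labels
    return labels


# Illustrative data only: two well-separated groups and a deliberately poor start.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = [0, 1, 0, 1, 0, 1]
final = minimum_distance_partition(points, start, k=2)
print(final, sum_of_squares(points, start, 2), sum_of_squares(points, final, 2))
```

Such a procedure only guarantees a local optimum of the objective function, so the result generally depends on the initial partition.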
However much we may regret that nothing is to be found about hierarchical methods or about fuzzy membership functions, we must recognize that, by choosing to concentrate on the objective optimization technique, the author was able to collect and present a coherent set of theorems and lemmas providing a strong basis for the different algorithms that are developed.
The theoretical developments, mostly based on matrix algebra, are powerful, efficient and compact, with only a few obscure spots. A general drawback, however, is the absence of a deeper unifying principle behind the different methods proposed. Nevertheless, this first part will undoubtedly be of major interest, not only to the pure mathematician but also to the practitioner seeking the best method for his problem.
The implementation of the algorithms, directly derived from the problem-solving methods of the first part, takes the form of some 30 pages of Fortran listings, sandwiched between a few pages of explanations and comments. The programs are subdivided into modular subroutines, allowing a program to be assembled “à la carte” according to one’s specific needs. However, some reservations must be made about the presentation of these subroutines, which somewhat limits their applicability.
Not only is the typeface rather small and hard to read, but the absence of any comments in the listings, either on the operations performed or on the definitions of the variables, makes them difficult to follow. This shortcoming is not entirely compensated for by the explanations in the text, certainly not for someone going through a subroutine with specific questions. Further queries arise about the storage requirements, the kinds of errors that may occur, and the way some constants must be adapted: these remain unanswered.
As for the sample programs, the examples in the third part of the book, they certainly look appealing in their 2-dimensional graphic representation; however, they are even less readable than the program listings of the second part. Not only do the numerous printout pages with tables of figures lack the most elementary titles and legends, but even the text is sparing with its interpretation.
My conclusion is that, in the first part of his book, the author presents a very interesting set of theoretical developments on a rather limited part of cluster analysis. The program listings and examples displayed in the remainder of the book will convince any critical reader that the proposed algorithms work correctly and produce good results. For actual implementation of these programs, however, it would be advisable to contact the author directly.
Reviewer: E. Trauwaert


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-04 Software, source code, etc. for problems pertaining to statistics
62-02 Research exposition (monographs, survey articles) pertaining to statistics
65C99 Probabilistic methods, stochastic differential equations
68T10 Pattern recognition, speech recognition
62-07 Data analysis (statistics) (MSC2010)


Zbl 0536.62048