## Systems of sets such that each set properly intersects at most one other set – application to cluster analysis. (English) Zbl 1295.05247

Summary: A set $$X$$ is said to properly intersect a set $$Y$$ if none of the sets $$X\cap Y$$, $$X\setminus Y$$ and $$Y\setminus X$$ is empty. In this paper, we consider collections of subsets such that each member of the collection properly intersects at most one other member. Such collections are hereafter called paired hierarchical collections. The following two combinatorial properties are investigated. First, any paired hierarchical collection is a set of intervals of at least one linear order defined on the ground set. Next, the maximum size of a paired hierarchical collection defined on an $$n$$-element set is $$\lfloor \frac{5}{2}(n-1)\rfloor$$. The properties of these collections are also investigated from the cluster analysis point of view. In the framework of the general bijection defined by A. Batbedat [Metron 46, No. 1–4, 47–59 (1988; Zbl 0724.05055)] and the author [Eur. J. Comb. 21, No. 6, 727–743 (2000; Zbl 0957.05103)], we characterize the dissimilarities that are induced by weakly indexed paired hierarchical collections. Finally, we propose a proof of the so-called agglomerative paired hierarchical clustering (APHC) algorithm, which extends the well-known AHC algorithm so that some clusters may be merged twice. An implementation and some illustrations of this algorithm and of a variant of it were presented by S. Chelcea et al. [in: Innovations in classification, data science, and information systems. Proceedings of the 27th annual conference of the Gesellschaft für Klassifikation e.V., Cottbus, Germany, March 12–14, 2003. Berlin: Springer. 3–10 (2005; Zbl 1296.62129); “Un nouvel algorithme de classification ascendante 2–3 hiérarchique”, in: Reconnaissance des formes et intelligence artificielle (RFIA 2004), Vol. 3, Toulouse, France, 2004. 1471–1480 (2004), http://www.laas.fr/rfia2004/actes/ARTICLES/388.pdf].
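The definitions in the summary can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `properly_intersects`, `is_paired_hierarchical` and `are_intervals` are hypothetical, and the code only checks the stated definitions on small explicit collections. It does not implement the APHC algorithm or verify the size bound.

```python
from itertools import combinations


def properly_intersects(x, y):
    """True if X∩Y, X\\Y and Y\\X are all non-empty."""
    x, y = set(x), set(y)
    return bool(x & y) and bool(x - y) and bool(y - x)


def is_paired_hierarchical(collection):
    """True if each member properly intersects at most one other member."""
    sets = [frozenset(s) for s in collection]
    counts = [0] * len(sets)
    for i, j in combinations(range(len(sets)), 2):
        if properly_intersects(sets[i], sets[j]):
            counts[i] += 1
            counts[j] += 1
    return all(c <= 1 for c in counts)


def are_intervals(collection, order):
    """True if every member is a set of consecutive elements of `order`.

    `order` is a list giving a linear order on the ground set (illustrative
    representation, assumed here rather than taken from the paper).
    """
    pos = {e: i for i, e in enumerate(order)}
    for s in collection:
        idx = sorted(pos[e] for e in s)
        # A non-empty set is an interval iff its positions are contiguous.
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True


# {1,2} and {2,3} properly intersect each other and nothing else.
paired = [{1, 2}, {2, 3}, {1, 2, 3}, {1, 2, 3, 4}]
# {2,3} properly intersects both {1,2} and {3,4}.
not_paired = [{1, 2}, {2, 3}, {3, 4}]
```

Here `is_paired_hierarchical(paired)` holds and every member of `paired` is an interval of the order `[1, 2, 3, 4]`, in line with the first property stated in the summary, whereas `not_paired` fails the definition because `{2, 3}` properly intersects two other members.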

### MSC:

- 05D05 Extremal set theory
- 62H30 Classification and discrimination; cluster analysis (statistical aspects)

Zbl 0724.05055; Zbl 0957.05103; Zbl 1296.62129
