## Partition of interval-valued observations using regression. (English) Zbl 07512353

Summary: Both regression modeling and clustering methodologies have been extensively studied as separate techniques. There has been some activity in using regression-based algorithms to partition a data set into clusters for classical data; we propose one such algorithm to cluster interval-valued data. The new algorithm is based on the $$k$$-means algorithm of J. MacQueen [in: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. Vol. 1. Berkeley, CA: University of California Press. 281–297 (1967; Zbl 0214.46201)] and the dynamical partitioning method of E. Diday and J. C. Simon [in: Digital pattern recognition. Berlin, Heidelberg, New York: Springer. 47–94 (1976; Zbl 0331.62043)], with the partitioning criteria being based on establishing regression models for each sub-cluster. The partitioning also depends on distance measures between the underlying regression models for each sub-cluster. Several types of simulated data sets are generated for several different data structures. The proposed $$k$$-regressions algorithm consistently outperforms the $$k$$-means algorithm. Elbow plots are used to identify the total number of clusters $$K$$ in the partition. The new method is also applied to real data.
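The general idea of a $$k$$-means-style "$$k$$-regressions" loop can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that each interval is summarized by its midpoint (a "center method" simplification), that each cluster is described by an ordinary least-squares line $$y = a + bx$$, and that observations are reassigned to the line giving the smallest squared residual. The distance measures between regression models used in the paper are not reproduced. The function names `fit_line`, `center` and `k_regressions` are hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; points is a list of (x, y)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0  # flat line if all x values coincide
    return my - b * mx, b


def center(interval):
    """Midpoint of an interval given as (lower, upper); an assumed summary."""
    lo, hi = interval
    return (lo + hi) / 2.0


def k_regressions(data, k, n_iter=50, seed=0):
    """Hypothetical k-regressions sketch on interval midpoints.

    data: list of (x_interval, y_interval), each interval a (lower, upper) pair.
    Returns (labels, lines, total_rss), where lines[j] = (a_j, b_j).
    """
    pts = [(center(xi), center(yi)) for xi, yi in data]
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in pts]  # random initial partition
    lines = []
    for _ in range(n_iter):
        # Update step: fit one regression line per cluster; a cluster with
        # fewer than two members is refit on two randomly chosen points.
        lines = []
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if len(members) < 2:
                members = rng.sample(pts, 2)
            lines.append(fit_line(members))
        # Assignment step: move each observation to the line with the
        # smallest squared residual (the analogue of the nearest centroid).
        new_labels = [
            min(range(k), key=lambda j: (y - lines[j][0] - lines[j][1] * x) ** 2)
            for x, y in pts
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
    total_rss = sum(
        (y - lines[lab][0] - lines[lab][1] * x) ** 2
        for (x, y), lab in zip(pts, labels)
    )
    return labels, lines, total_rss
```

Because the returned `total_rss` plays the role of the within-cluster sum of squares in $$k$$-means, plotting it against $$K = 1, 2, 3, \dots$$ gives an elbow plot of the kind mentioned above. As with $$k$$-means, the result depends on the random initial partition, so several seeds would normally be tried.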

### MSC:

 62H30 Classification and discrimination; cluster analysis (statistical aspects)

### Citations:

Zbl 0214.46201; Zbl 0331.62043

### Software:

Algorithm 39