Estimation and selection in regression clustering.

*(English)* Zbl 1389.62102

Summary: Regression clustering is an important model-based clustering tool with applications in a variety of disciplines. It discovers and reconstructs the hidden structure of a data set that is a random sample from a population comprising a fixed but unknown number of sub-populations, each characterized by a class-specific regression hyperplane. In most clustering techniques, including regression clustering, an essential objective and a preliminary step is to determine the underlying number of clusters in the data. In this paper, we briefly review regression clustering methods and discuss how to determine the underlying number of clusters using model selection techniques, in particular the information-based technique. A computing algorithm is developed for estimating the number of clusters and the other parameters in regression clustering. Simulation studies are also provided to show the performance of the algorithm.
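The summary does not spell out the algorithm, so the following Python sketch only illustrates the general idea of estimating the number of clusters with an information criterion; it is not the procedure of Qian and Wu. It swaps in a standard technique: an EM fit of a Gaussian mixture of simple linear regressions (one predictor, so each "hyperplane" is a line), with the number of clusters k chosen by minimizing the Bayesian information criterion (BIC). The function names `fit_mixture` and `select_k`, the residual-quantile initialization and the variance floor are all illustrative assumptions.

```python
import math


def _wls(xs, ys, w, var_floor):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, sigma2, total_weight)."""
    W = sum(w) + 1e-300  # tiny constant avoids 0/0 if a component loses all its points
    mx = sum(wi * x for wi, x in zip(w, xs)) / W
    my = sum(wi * y for wi, y in zip(w, ys)) / W
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    s2 = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(w, xs, ys)) / W
    return a, b, max(s2, var_floor), W


def fit_mixture(xs, ys, k, n_iter=200, tol=1e-8):
    """Hypothetical helper: EM for a k-component Gaussian mixture of lines
    y = a_j + b_j*x + e, e ~ N(0, s2_j). Returns ([(a_j, b_j, s2_j), ...], log-likelihood)."""
    n = len(xs)
    mean_y = sum(ys) / n
    # Assumed guard against degenerate (zero-variance) components.
    var_floor = 1e-3 * sum((y - mean_y) ** 2 for y in ys) / n + 1e-12
    # Assumed initialization: sort points by residual from one global line, cut into k groups.
    a0, b0, _, _ = _wls(xs, ys, [1.0] * n, var_floor)
    order = sorted(range(n), key=lambda i: ys[i] - a0 - b0 * xs[i])
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0
    prev, params, loglik = -math.inf, [], -math.inf
    for _ in range(n_iter):
        # M-step: weighted least squares per component; mixing weight = share of responsibility.
        params, log_pi = [], []
        for j in range(k):
            a, b, s2, W = _wls(xs, ys, [r[j] for r in resp], var_floor)
            params.append((a, b, s2))
            log_pi.append(math.log(max(W / n, 1e-12)))
        # E-step: posterior membership probabilities, computed with log-sum-exp for stability.
        loglik = 0.0
        for i in range(n):
            logs = [lp - 0.5 * math.log(2 * math.pi * s2) - (ys[i] - a - b * xs[i]) ** 2 / (2 * s2)
                    for lp, (a, b, s2) in zip(log_pi, params)]
            m = max(logs)
            total = m + math.log(sum(math.exp(v - m) for v in logs))
            resp[i] = [math.exp(v - total) for v in logs]
            loglik += total
        if loglik - prev < tol:  # stop once the log-likelihood no longer increases
            break
        prev = loglik
    return params, loglik


def select_k(xs, ys, k_max=4):
    """Hypothetical helper: choose k by minimizing BIC = -2 log L + p log n,
    with p = 4k - 1 (intercept, slope, variance per component plus k - 1 weights)."""
    n = len(xs)
    scores = {}
    for k in range(1, k_max + 1):
        _, loglik = fit_mixture(xs, ys, k)
        scores[k] = -2 * loglik + (4 * k - 1) * math.log(n)
    return min(scores, key=scores.get), scores
```

For example, with `xs` and `ys` holding data drawn from two well-separated lines, `select_k(xs, ys)` returns the chosen k together with the BIC score of every candidate. The mixture log-likelihood is used instead of the residual sum of squares of a hard partition because the latter keeps decreasing as k grows, so it cannot select k on its own; in practice one would also run several initializations per k, since EM only finds a local maximum.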

##### MSC:

| Code | Subject |
|---|---|
| 62H30 | Classification and discrimination; cluster analysis (statistical aspects) |
| 68T10 | Pattern recognition, speech recognition |
| 91C20 | Clustering in the social and behavioral sciences |


*G. Qian* and *Y. Wu*, Eur. J. Pure Appl. Math. 4, No. 4, 455–466 (2011; Zbl 1389.62102)

