
A weighted least-squares approach to clusterwise regression. (English) Zbl 1477.62163

Summary: Clusterwise regression partitions a data set into clusters, each characterized by its own regression coefficients in a linear regression model. In this paper, we propose a method for determining a partition that builds on an idea from robust regression. We start from random weights to obtain an initial partition and then proceed in the spirit of M-estimators. The residuals from all regressions are used to assign the observations to the different groups. As the target function we use the coefficient of determination \(R^2_w\) for the overall model. This coefficient is suitably defined for weighted regression.
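The summary does not state the paper's exact formula for \(R^2_w\). One common definition of a weighted coefficient of determination, given here only as an assumption, uses nonnegative observation weights \(w_i\), fitted values \(\hat{y}_i\) taken from the regression of the cluster to which observation \(i\) is assigned, and the weighted mean \(\bar{y}_w\) of the responses:

\[
\bar{y}_w = \frac{\sum_{i=1}^{n} w_i\, y_i}{\sum_{i=1}^{n} w_i},
\qquad
R^2_w = 1 - \frac{\sum_{i=1}^{n} w_i\,(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} w_i\,(y_i - \bar{y}_w)^2}.
\]

With all \(w_i = 1\) this reduces to the ordinary \(R^2\) of the overall model.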
Target functions for the clusterwise regression problem may have a large number of local optima, which derivative-based optimization methods cannot handle. The approach commonly employed to overcome this problem is to start several times from random partitions and then to improve each resulting partition. Because our procedure is very fast, it can be used with many random starts. Finally, the solution with the highest coefficient of determination \(R^2_w\) for the overall model is chosen. The performance of the method is investigated with the help of Monte Carlo simulations. It is also compared to the finite-mixture approach to clusterwise regression. A sequence of bootstrap tests is proposed to determine the number of clusters.
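The multi-start idea described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hard 0/1 reweighting after each assignment step is a simplification standing in for the paper's M-estimator-style weights, and the function names (`weighted_r2`, `fit_wls`, `clusterwise_once`, `clusterwise_regression`) as well as the parameters `n_starts` and `max_iter` are hypothetical.

```python
import numpy as np

def weighted_r2(y, fitted, w):
    """Weighted R^2 (an assumed, commonly used definition)."""
    ybar = np.average(y, weights=w)
    ss_res = np.sum(w * (y - fitted) ** 2)
    ss_tot = np.sum(w * (y - ybar) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_wls(X, y, w):
    """Weighted least squares via rescaling rows by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def clusterwise_once(X, y, k, rng, max_iter=50):
    """One run from a random weighting; returns (labels, betas, r2)."""
    n = len(y)
    W = rng.random((n, k))          # random start weights, one column per cluster
    labels = None
    for _ in range(max_iter):
        # One weighted regression per cluster; betas has shape (p, k).
        betas = np.column_stack([fit_wls(X, y, W[:, j]) for j in range(k)])
        # Assign each observation to the regression with the smallest residual.
        new_labels = ((y[:, None] - X @ betas) ** 2).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                   # partition is stable
        labels = new_labels
        # Simplification (assumption): hard 0/1 weights from the partition.
        W = np.zeros((n, k))
        W[np.arange(n), labels] = 1.0
    fitted = (X @ betas)[np.arange(n), labels]
    return labels, betas, weighted_r2(y, fitted, np.ones(n))

def clusterwise_regression(X, y, k, n_starts=50, seed=0):
    """Repeat from many random starts and keep the highest overall R^2."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        result = clusterwise_once(X, y, k, rng)
        if best is None or result[2] > best[2]:
            best = result
    return best
```

Here `X` is expected to be a design matrix that already contains an intercept column, for example `np.column_stack([np.ones_like(x), x])`, and the call `clusterwise_regression(X, y, k=2)` returns the labels, the coefficient matrix, and the overall \(R^2\) of the best run. Because each run is cheap, a large `n_starts` is affordable, which is the point the summary makes about local optima.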

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models
