×

zbMATH — the first resource for mathematics

Globally optimal clusterwise regression by column generation enhanced with heuristics, sequencing and ending subset optimization. (English) Zbl 1360.62318
Summary: A column generation based approach is proposed for solving the clusterwise regression problem. The proposed strategy relies firstly on several efficient heuristic strategies to insert columns into the restricted master problem. If these heuristics fail to identify an improving column, an exhaustive search is performed starting with incrementally larger ending subsets, all the while iteratively performing heuristic optimization to ensure a proper balance of exact and heuristic optimization. Additionally, observations are sequenced by their dual variables and by their inclusion in joint pair branching rules. The proposed strategy is shown to outperform the best known alternative (BBHSE) when the number of clusters is greater than three. Additionally, the current work further demonstrates and expands the successful use of the new paradigm of using incrementally larger ending subsets to strengthen the lower bounds of a branch and bound search as pioneered by Brusco’s Repetitive Branch and Bound Algorithm (RBBA).
MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65F20 Numerical solutions to overdetermined systems, pseudoinverses
90C90 Applications of mathematical programming
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] ALOISE, D; CAFIERI, S; CAPOROSSI, G; HANSEN, P; PERRON, S; LIBERTI, L, Column generation algorithms for exact modularity maximization in networks, Physical Review E, 82, 046112, (2010)
[2] ALOISE, D., HANSEN, P., and LIBERTI, L. (2010b), “An Improved Column Generation Algorithm for Minimum Sum-of-Squares Clustering”, Mathematical Programming, 1-26. · Zbl 1236.90095
[3] AURIFEILLE, JM, A bio-mimetic approach to marketing segmentation: principles and comparative analysis, European Journal of Economic and Social Systems, 14, 93-108, (2000) · Zbl 0960.91064
[4] AURIFEILLE, JM; MEDLIN, C, A dyadic segmentation approach to business partnerships, European Journal of Economic and Social Systems, 15, 3-16, (2001) · Zbl 0996.91549
[5] AURIFEILLE, JM; QUESTER, PG, Predicting business ethical tolerance in international markets: A concomitant clusterwise regression analysis, International Business Review, 12, 253-272, (2003)
[6] BARNHART, C; JOHNSON, EL; NEMHAUSER, GL; SAVELSBERGH, MWP; VANCE, PH, Branch-and-price: column generation for solving huge integer programs, Operations Research, 46, 316-329, (1998) · Zbl 0979.90092
[7] BERKELAAR, M., EIKLAND, K., and NOTEBAERT, P. (2010), “Lp_Solve: Open Source (Mixed-Integer) Linear Programming System,” http://lpsolve.sourceforge.net/5.5/ · Zbl 0387.65028
[8] BRUSCO, MJ, A repetitive branch-and-bound procedure for minimum within-cluster sums of squares partitioning, Psychometrika, 71, 347-363, (2006) · Zbl 1306.62387
[9] BRUSCO, MJ; CRADIT, JD; STEINLEY, D; FOX, GL, Cautionary remarks on the use of clusterwise regression, Multivariate Behavioral Research, 43, 29-49, (2008)
[10] BRUSCO, M.J., and STAHL, S. (2005), Branch-and-Bound Applications in Combinatorial Data Analysis, New York: Springer Verlag. · Zbl 1093.62006
[11] CAPOROSSI, G., and HANSEN, P. (2005), “Variable Neighborhood Search for Least Squares Clusterwise Regression,” Les Cahiers du GERAD, G-2005-61. · Zbl 0945.90031
[12] CARBONNEAU, R.A., CAPOROSSI, G., and HANSEN, P. (2011a), “Extensions to the Repetitive Branch and Bound Algorithm for Globally Optimal Clusterwise Regression,” GERAD, G-2011-10. · Zbl 1251.90313
[13] CARBONNEAU, RA; CAPOROSSI, G; HANSEN, P, Globally optimal clusterwise regression by mixed logical-quadratic programming, European Journal of Operational Research, 212, 213-222, (2011)
[14] CHARLES, C. (1977), “Régression Typologique Et Reconnaissance Des Formes,” Thèse de doctorat 3ième cycle, Université de Paris IX.
[15] CHVÁTAL, V. (1983), Linear Programming, New York: WH Freeman. · Zbl 0537.90067
[16] CIAMPI, A; RICH, B; DYACHENKO, A; ANTONIANO, I; MURIE, C; NADON, R; Brito, P (ed.); Cucumel, G (ed.); Bertrand, P (ed.); Carvalho, F (ed.), Locally linear regression and the calibration problem for micro-array analysis, 549-555, (2007), Berlin · Zbl 1181.68111
[17] DANTZIG, GB; WOLFE, P, Decomposition principle for linear programs, Operations Research, 8, 101-111, (1960) · Zbl 0093.32806
[18] DESARBO, W, A maximum likelihood methodology for clusterwise linear regression, Journal of Classification, 5, 249-282, (1988) · Zbl 0692.62052
[19] DESARBO, W; OLIVER, R; RANGASWAMY, A, A simulated annealing methodology for clusterwise linear regression, Psychometrika, 54, 707-736, (1989)
[20] DESAULNIERS, G., DESROSIERS, J., and SOLOMON, M.M. (eds.) (2005), Column Generation (Vol. 5), Berlin: Springer Verlag. · Zbl 1084.90002
[21] DIDAY, E. (1979), Optimization En Classification Automatique, Le Chesnay: INRIA. · Zbl 0471.62056
[22] GENTLEMAN, WM, Least squares computations by givens transformations without square roots, IMA Journal of Applied Mathematics, 12, 329-336, (1973) · Zbl 0289.65020
[23] HANSEN, P; MEYER, C, A new column generation algorithm for logical analysis of data, Annals of Operations Research, 188, 215-249, (2011) · Zbl 1225.90175
[24] HENNIG, C, Identifiablity of models for clusterwise linear regression, Journal of Classification, 17, 273-296, (2000) · Zbl 1017.62058
[25] HOOKER, JN, Logic, optimization, and constraint programming, INFORMS Journal on Computing, 14, 295-321, (2002) · Zbl 1238.90002
[26] HOOKER, J.N. (2007), Integrated Methods for Optimization, New York: Springer. · Zbl 1122.90002
[27] HOOKER, JN; OSORIO, MA, Mixed logical-linear programming, Discrete Applied Mathematics, 96, 395-442, (1999) · Zbl 0945.90031
[28] HOOKER, JN; OTTOSSON, G; THORSTEINSSON, ES; KIM, HJ, A scheme for unifying optimization and constraint satisfaction methods, The Knowledge Engineering Review, 15, 11-30, (2000)
[29] IBM (2009a), Ibm Ilog Cplex 12.1.0, Armonk, NY: IBM.
[30] IBM (2009b), Ibm Ilog Opl 6.3, Armonk, NY: IBM.
[31] KNUTH, D.E. (1997), “Art of Computer Programming, Volume 2: Seminumerical Algorithms” (3\^{rd} ed.), Addison-Wesley Professional. · Zbl 0883.68015
[32] LAU, KN; LEUNG, PL; TSE, KK, A mathematical programming approach to clusterwise regression model and its extensions, European Journal of Operational Research, 116, 640-652, (1999) · Zbl 0993.62052
[33] LÜBBECKE, ME; DESROSIERS, J, Selected topics in column generation, Operations Research, 53, 1007-1023, (2005) · Zbl 1165.90578
[34] MIRKIN, B. (2005), Clustering for Data Mining, London: Chapman & Hall/CRC. · Zbl 1083.68099
[35] RYAN, DM; FOSTER, BA, An integer programming approach to scheduling, 269-280, (1981), Amsterdam
[36] SPÄTH, H, Algorithm 39 clusterwise linear regression, Computing, 22, 367-373, (1979) · Zbl 0387.65028
[37] SPÄTH, H, Correction to algorithm 39 clusterwise linear regression, Computing, 26, 275-275, (1981) · Zbl 0444.65020
[38] SPÄTH, H, A fast algorithm for clusterwise linear regression, Computing, 29, 175-181, (1982) · Zbl 0485.65030
[39] VAN HENTENRYCK, P., LUSTIG, I., MICHEL, L., and PUGET, J.F. (1999), The Opl Optimization Programming Language, Cambridge: MIT Press.
[40] WEDEL, M. (1990), “Clusterwise Regression and Market Segmentation. Developments and Applications,” doctoral thesis, Landbouwuniversiteit Wageningen. · Zbl 0289.65020
[41] WEDEL, M. (1998), Glimmix: Simultaneous Estimation of Latent Classes and Generalized Models within Each Latent Class, User’s Manual, Version 1.0, Groningen: ProGAMMA.
[42] WEDEL, M; DESARBO, WS, A mixture likelihood approach for generalized linear models, Journal of Classification, 12, 21-55, (1995) · Zbl 0825.62611
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.