×

Exact and heuristic algorithms for the maximum weighted submatrix coverage problem. (English) Zbl 1490.90257

Summary: The maximum weighted submatrix coverage problem is a recently introduced problem with applications in data mining. It is concerned with selecting \(K\) submatrices of a given numerical matrix such that the sum of the matrix-entries, which occur in at least one of the selected submatrices, is maximized.
In the paper introducing the problem, a problem-specific constraint programming approach was developed and embedded in a large neighborhood-search to obtain a heuristic. A compact integer linear programming formulation was also presented, but deemed inefficient due to its size. In this paper, we introduce new integer linear programming formulations for the problem, one of them is based on Benders decomposition. The obtained Benders decomposition-based formulation has a nice combinatorial structure, i.e., there is no need to solve linear programs to separate Benders cuts. We present preprocessing procedures and valid inequalities for all formulations. We also develop a greedy randomized adaptive search procedure for the problem, which is enhanced with a local search.
A computational study using the instances from literature is done to evaluate the effectiveness of our new approaches. Our algorithms manage to find improved primal solutions for ten out of 17 real-world instances, and optimality is proven for two real-world instances. Moreover, for over 700 of 1617 large-scale synthetic instances, our algorithms find improved primal solutions compared to the heuristics from the literature.

MSC:

90C27 Combinatorial optimization
90C10 Integer programming
90C57 Polyhedral combinatorics, branch-and-bound, branch-and-cut
90C59 Approximation methods and heuristics in mathematical programming
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Billionnet, A., Different formulations for solving the heaviest k-subgraph problem, INFOR: Information Systems and Operational Research, 43, 3, 171-186 (2005) · Zbl 07682442
[2] Branders, V.; Schaus, P.; Dupont, P., Combinatorial optimization algorithms to mine a sub-matrix of maximal sum, International workshop on new frontiers in mining complex patterns, 65-79 (2017), Springer
[3] Busygin, S.; Prokopyev, O.; Pardalos, P. M., Biclustering in data mining, Computers and Operations Research, 35, 9, 2964-2987 (2008) · Zbl 1144.68309
[4] Conforti, M.; Cornuéjols, G.; Zambelli, G., Integer programming, 271 (2014), Springer · Zbl 1307.90001
[5] Dao, T., Docquier, F., Maurel, M., & Schaus, P. (2018). Global migration in the 20th and 21st centuries: The unstoppable force of demography. Working Papers.
[6] Derval, G., Branders, V., Dupont, P., & Schaus, P. (2018). Synthetic dataset used in “the maximum weighted submatrix coverage problem: A CP approach”. Nov.10.5281/zenodo.3549866 · Zbl 1525.90351
[7] Derval, G.; Branders, V.; Dupont, P.; Schaus, P., The maximum weighted submatrix coverage problem: A CP approach, International conference on integration of constraint programming, artificial intelligence, and operations research, 258-274 (2019), Springer · Zbl 1525.90351
[8] Derval, G., Branders, V., Dupont, P., & Schaus, P. (2019b). “The maximum weighted submatrix coverage problem: A CP approach”: Experiment results. Nov.10.5281/zenodo.3549851 · Zbl 1525.90351
[9] Feo, T. A.; Resende, M. G., Greedy randomized adaptive search procedures, Journal of Global Optimization, 6, 2, 109-133 (1995) · Zbl 0822.90110
[10] IOC Research and Reference Service The guardian: Olympic sports and medals, 1896-2014. URL https://www.kaggle.com/the-guardian/olympic-games.
[11] Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N., Cost-effective outbreak detection in networks, Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, 420-429 (2007)
[12] Letchford, A. N.; Sørensen, M. M., A new separation algorithm for the boolean quadric and cut polytopes, Discrete Optimization, 14, 61-71 (2014) · Zbl 1308.90209
[13] Madeira, S. C.; Oliveira, A. L., Biclustering algorithms for biological data analysis: a survey, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1, 1, 24-45 (2004)
[14] Padberg, M., The boolean quadric polytope: Some characteristics, facets and relatives, Mathematical Programming, 45, 1-3, 139-172 (1989) · Zbl 0675.90056
[15] Resende, M. G.; Ribeiro, C. C., Greedy randomized adaptive search procedures, Handbook of metaheuristics, 219-249 (2003), Springer · Zbl 1102.90384
[16] de Souto, M. C.; Costa, I. G.; de Araujo, D. S.; Ludermir, T. B.; Schliep, A., Clustering cancer gene expression data: A comparative study, BMC Bioinformatics, 9, 1, 497 (2008)
[17] Tanay, A.; Sharan, R.; Shamir, R., Biclustering algorithms: A survey, Handbook of Computational Molecular Biology, 9, 1-20, 122-124 (2005)
[18] Wolsey, L. A.; Nemhauser, G. L., Integer and combinatorial optimization, 55 (1999), John Wiley & Sons · Zbl 0944.90001
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.