×

Large neighborhood search applied to the software module clustering problem. (English) Zbl 1391.90526

Summary: The software module clustering problem seeks to distribute the modules comprising a software system into clusters to maximize cluster cohesion and minimize coupling between clusters. Metaheuristics based on local search have been successfully applied to find good solutions for this problem, outperforming more-complex, population-based heuristics. In this paper, we present a heuristic based on large neighborhood search to address the software module clustering problem. We also perform what is to our knowledge the largest experimental study addressing this problem to date, involving 124 instances of varying size and complexity and comparing our proposed algorithm to the heuristic that has found the best results for the problem so far. Our proposed algorithm outperformed the state-of-the-art heuristic on 93 out of 124 instances with 95% confidence level. We also report new upper bounds on the MQ value for 44 instances and evaluated the relative goodness of the solutions obtained by our proposed algorithm for 89 instances. Considering the 77 instances for which optimal solutions are proven, the proposed algorithm found the optimal solution for 30 instances (39%). Additionally, thirteen developers participate in a study focused on the distribution of the modules comprising a software project into clusters. We use JodaMoney as our study object and compare the characteristics of the solutions generated by its authors, the best solution generated by LNS_SMC and the solution generated by each of the 13 subjects. LNS_SMC solution performs better than inexperienced subjects ones in issue resolution prediction but performs worse than the solutions proposed by experienced subjects and also the author’s solution. For the prediction of concomitant changes, the LNS_SMC solution was outperformed by both subjects and authors solutions.

MSC:

90C27 Combinatorial optimization
68M14 Distributed systems
05C90 Applications of graph theory
68W40 Analysis of algorithms
90C59 Approximation methods and heuristics in mathematical programming

Software:

Ant; Bunch
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Ahuja, R. K.; Ergun, O.; Orlin, J. B.; Punnen, A. P., A survey of very large-scale neighborhood search techniques, Discrete Appl. Math., 123, 1-3, 75-102, (2002) · Zbl 1014.68052
[2] Barros, M. O., An analysis of the effects of composite objectives in multiobjective software module clustering, Proceedings of the 2012 Conference on Genetic and Evolutionary Computation, 1205-1212, (2012), ACM Philadelphia
[3] Barros, M. O., An experimental study on incremental search-based software engineering, Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE) 2013, St. Petersburg, Russia, August 24-26, 2013., 34-49, (2013)
[4] Bavota, G.; Carnevale, F.; De Lucia, A.; Di Penta, M.; Oliveto, R., Putting the developer in-the-loop: an interactive GA for software re-modularization, 75-89, (2012), Springer Berlin Heidelberg Berlin, Heidelberg
[5] Briand, L. C., Morasca, S., Basili, V. R., 1999. Defining and validating measures for object-based high-level design.; Briand, L. C., Morasca, S., Basili, V. R., 1999. Defining and validating measures for object-based high-level design.
[6] Deb, K.; Jain, H., An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part i: solving problems with box constraints, IEEE Trans. Evol. Comput., 18, 4, 577-601, (2014)
[7] Doval, D.; Mancoridis, S.; Mitchell, B. S., Automatic clustering of software systems using a genetic algorithm, Proceedings of the Software Technology and Engineering Practice, 1999 (STEP ’99), 73-81, (1999)
[8] Feltovich, N., Nonparametric tests of differences in medians: comparison of the Wilcoxon-Mann-Whitney and robust rank-order tests, Exp. Econ., 6, 3, 273-297, (2003) · Zbl 1031.62035
[9] Garey, M. R.; Johnson, D. S., Computers and intractability: A guide to the theory of NP-completeness (series of books in the mathematical sciences), (1979), W. H. Freeman
[10] Gibbs, S.; Casais, E.; Nierstrasz, O.; Pintado, X.; Tsichritzis, D., Class management for software communities, Commun. ACM, 33, 9, 90-103, (1990)
[11] Glorie, M.; Zaidman, A.; van Deursen, A.; Hofland, L., Splitting a large software repository for easing future software evolution—an industrial experience report, J. Softw. Maint. Evol., 21, 2, 113-141, (2009)
[12] Hall, M.; McMinn, P.; Walkinshaw, N., Supervised software modularisation, Proceedings of the 2012 IEEE International Conference on Software Maintenance (ICSM), 472-481, (2012), IEEE Computer Society Washington, DC, USA
[13] Hansen, P.; Jaumard, B., Cluster analysis and mathematical programming, Math. Program., 79, 1-3, 191-215, (1997) · Zbl 0887.90182
[14] Köhler, V.; Fampa, M.; Araújo, O., Mixed-integer linear programming formulations for the software clustering problem, Comput. Optim. Appl., 55, 1, 113-135, (2013) · Zbl 1273.90128
[15] Kramer, H. H., 2017. Private Communication. June.; Kramer, H. H., 2017. Private Communication. June.
[16] Kramer, H. H.; Uchoa, E.; Fampa, M.; Köhler, V.; Vanderbeck, F., Column generation approaches for the software clustering problem, Comput. Optim. Appl., 64, 3, 843-864, (2016) · Zbl 1352.90066
[17] Kumari, A. C.; Srinivas, K., Software module clustering using a fast multi-objective hyper-heuristic evolutionary algorithm, Int. J. Appl. Inf.Syst., 5, 6, 12-18, (2013)
[18] Lanza, M.; Marinescu, R., Object-oriented metrics in practice: using software metrics to characterize, evaluate, and improve the design of object-oriented systems, (2010), Springer-Verlag Berlin
[19] Larman, C., Applying UML and patterns: an introduction to object-oriented analysis and design and the unified process, (2002), Prentice Hall PTR
[20] Li, H.-L., A global approach for general 0-1 fractional programming, Eur. J. Oper. Res., 73, 3, 590-596, (1994) · Zbl 0806.90119
[21] Lourenço, H. R.; Martin, O. C.; Stützle, T., Iterated local search: framework and applications, (Gendreau, M.; Potvin, J.-Y., Handbook of Metaheuristics, 146, (2010), Springer US New York, USA), 363-397
[22] Mahdavi, K.; Harman, M.; Hierons, R. M., A multiple Hill climbing approach to software module clustering, Proceedings of the International Conference on Software Maintenance, 2003 (ICSM 2003), 315-324, (2003), IEEE Computer Society Amsterdam
[23] Mancoridis, S.; Mitchell, B. S.; Chen, Y.; Gansner, E. R., Bunch: a clustering tool for the recovery and maintenance of software system structures, Proceedings of the International Conference on Software Maintenance, 1999 (ICSM ’99), 50-59, (1999)
[24] Mancoridis, S.; Mitchell, B. S.; Rorres, C.; Chen, Y.; Gansner, E. R., Using automatic clustering to produce high-level system organizations of source code, Proceedings of the 6th International Workshop on Program Comprehension, 1998 (IWPC ’98), Ischia, 45-52, (1998)
[25] McConnell, S., Code complete, Developer Best Practices, (2004), Pearson Education
[26] Mitchell, B. S., A Heuristic Search Approach to Solving the Software Clustering Problem, (2002), Drexel University Philadelphia, PA, USA, Ph.d. thesis
[27] Mkaouer, W.; Kessentini, M.; BECHIKH, S.; DEB, K.; OUNI, A., Many-objective software remodularization using NSGA-III, ACM Trans. Softw. Eng. Method, (2014)
[28] Monçores, M. C.; Alvim, A. C.F.; Barros, M. O., Large neighborhood search for the software module clustering problem, Proceedings of the 11th Metaheuristics International Conference, (2015)
[29] de Oliveira Barros, M.; de Almeida Farzat, F.; Travassos, G. H., Learning from optimization: a case study with apache ant, Inf. Softw. Technol., 57, 684-704, (2015)
[30] Pinto, A.; Alvim, A. C.F.; Barros, M. O., ILS for the software module clustering problem, Simpósio Brasileiro de Pesquisa Operacional, 1972-1983, (2014)
[31] Pinto, A. F., Uma Heurística Baseada em Busca Local Iterada Para o Problema de Clusterização de Módulos de Software,, (2014), PPGI/UNIRIO Rio de Janeiro, RJ, Brasil, Master’s thesis
[32] Pisinger, D.; Ropke, S., Large neighborhood search, (Gendreau, M.; Potvin, J.-Y., Handbook of Metaheuristics, (2010), Springer US New York, USA), 399-419
[33] Praditwong, K., Solving software module clustering problem by evolutionary algorithms, 2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE), Nakhon Pathom, 154-159, (2011)
[34] Praditwong, K.; Harman, M.; Yao, X., Software module clustering as a multi-objective search problem, IEEE Trans. Softw. Eng., 37, 2, 264-282, (2011)
[35] Ropke, S.; Pisinger, D., An adaptive large neighborhood search heuristic for the pickup and delivery problem with time windows, Transp. Sci., 40, 4, 455-472, (2006)
[36] Semaan, G. S.; Ochi, L. S., Algoritmo evolutivo para o problema de clusterizaã§ão EM grafos orientados, Simpósio de pesquisa operacional da marinha, 2007 (SPOLM2007), Rio de Janeiro, (2007)
[37] Shaw, P., Using constraint programming and local search methods to solve vehicle routing problems, Proceedings of the 4th International Conference on Principles and Practice of Constraint Programming, 417-431, (1998), Springer-Verlag Pisa
[38] Sullivan, L. H., The tall office building artistically considered, Lippincott’s Mag., 403-409, (1896)
[39] Vargha, A.; Delaney, H. D., A critique and improvement of the cl common language effect size statistics of mcgraw and Wong, J. Educ. Behav. Stat., 25, 2, 101-132, (2000)
[40] Wen, Z.; Tzerpos, V., An effectiveness measure for software clustering algorithms, Proceedings. 12th IEEE International Workshop on Program Comprehension, 2004., 194-203, (2004)
[41] Yourdon, E.; Constantine, L., Structured design: fundamentals of a discipline of computer program and systems design, Yourdon Press computing series, (1979), Prentice Hall · Zbl 0466.68003
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.