zbMATH — the first resource for mathematics

\(k\)-means genetic algorithms with greedy genetic operators. (English) Zbl 1459.62114
Summary: The \(k\)-means problem is one of the most popular models of cluster analysis. The problem is NP-hard, and modern literature offers many competing heuristic approaches. Sometimes practical problems require obtaining such a result (albeit notExact), within the framework of the \(k\)-means model, which would be difficult to improve by known methods without a significant increase in the computation time or computational resources. In such cases, genetic algorithms with greedy agglomerative heuristic crossover operator might be a good choice. However, their computational complexity makes it difficult to use them for large-scale problems. The crossover operator which includes the \(k\)-means procedure, taking the absolute majority of the computation time, is essential for such algorithms, and other genetic operators such as mutation are usually eliminated or simplified. The importance of maintaining the population diversity, in particular, with the use of a mutation operator, is more significant with an increase in the data volume and available computing resources such as graphical processing units (GPUs). In this article, we propose a new greedy heuristic mutation operator for such algorithms and investigate the influence of new and well-known mutation operators on the objective function value achieved by the genetic algorithms for large-scale \(k\)-means problems. Our computational experiments demonstrate the ability of the new mutation operator, as well as the mechanism for organizing subpopulations, to improve the result of the algorithm.
62H30 Classification and discrimination; cluster analysis (statistical aspects)
90C59 Approximation methods and heuristics in mathematical programming
Full Text: DOI
[1] Garey, M.; Johnson, D.; Witsenhausen, H., The complexity of the generalized lloyd - max problem (Corresp.), IEEE Transactions on Information Theory, 28, 2, 255-256 (1982) · Zbl 0476.94009
[2] Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P., NP-hardness ofEuclidean sum-of-squares clustering, Machine Learning, 75, 2, 245-248 (2009) · Zbl 1378.68047
[3] Drezner, Z.; Hamacher, H., Facility Location: Applications and Theory (2004), Berlin, Germany: Springer-Verlag, Berlin, Germany
[4] Lloyd, S., Least squares quantization in PCM, IEEE Transactions on Information Theory, 28, 2, 129-137 (1982) · Zbl 0504.94015
[5] MacQueen, J. B., Some methods of classification and analysis of multivariate observations, Proceedings of the 5th Berkley Symposium on Mathematical Statistics and Probability, University of California Press
[6] Cooper, L., Heuristic methods for location-allocation problems, SIAM Review, 6, 1, 37-53 (1964) · Zbl 0956.90014
[7] Jiang, J.-L.; Yuan, X.-M., A heuristic algorithm for constrained multi-source weber problem - the variational inequality approach, uropean Journal of Operational Research, 187, 2, 357-370 (2008) · Zbl 1149.90091
[8] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via theEMAlgorithm, Journal of the Royal Statistical Society: Series B (Methodological), 39, 1, 1-22 (1977) · Zbl 0364.62022
[9] Kazakovtsev, L.; Stashkov, D.; Gudyma, M.; Kazakovtsev, V., Algorithms with greedy heuristic procedures for mixture probability distribution separation, Yugoslav Journal of Operations Research, 29, 1, 51-67 (2019)
[10] Zhang, T.; Ramakrishnan, R.; Livny, M., BIRCH: anEffcient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD’96), ACM
[11] O’Callaghan, L.; Meyerson, A.; Motwani, R.; Mishra, N.; Guha, S., Streaming-data algorithms for high-quality clustering, dataEngineering, Proceedings 18th International Conference on DataEngineering, IEEE
[12] Ackermann, M. R.; Märtens, M.; Raupach, C.; Swierkot, K.; Lammersen, C.; Sohler, C., StreamKM++, ACM Journal ofExperimental Algorithmics, 17, 2 (2012) · Zbl 1284.68234
[13] Masuyama, S.; Ibaraki, T.; Hasegawa, T., The computational complexity of the m-center problems on the plane, The Transactions of the Institute ofElectronics and CommunicationEngineers of Japan, 64E, 57-64 (1981)
[14] Kariv, O.; Hakimi, S. L., An algorithmic approach to network location problems. I: thep-centers, SIAM Journal on Applied Mathematics, 37, 3, 513-538 (1979) · Zbl 0432.90074
[15] Kuenne, R. E.; Soland, R. M., Exact and approximate solutions to the multisource weber problem, Mathematical Programming, 3-3, 1, 193-209 (1972) · Zbl 0245.90021
[16] Ostresh, L. M., The stepwise location-allocation problem:Exact solutions in continuous and discrete spaces, Geographical Analysis, 10, 2, 174-185 (1978)
[17] Rosing, K. E., An optimal method for solving the (generalized) multi-weber problem, uropean Journal of Operational Research, 58, 3, 414-426 (1992) · Zbl 0760.90064
[18] Farahani, R. Z.; Hekmatfar, M., Facility Location Concepts, Models, Algorithms and Case Studies (2009), Berlin Heidelberg, Germany: SpringerVerlag, Berlin Heidelberg, Germany
[19] Mladenovic, N.; Brimberg, J.; Hansen, P.; Moreno-Perez, J. A., The p-median problem: a survey of metaheuristic approaches, uropean Journal of Operational Research, 179, 927-939 (2007) · Zbl 1163.90610
[20] Reese, J., Solution methods for thep-median problem: an annotated bibliography, Networks, 48, 3, 125-142 (2006) · Zbl 1133.90357
[21] Brimberg, J.; Drezner, Z.; Mladenović, N.; Salhi, S., A new local search for continuous location problems, uropean Journal of Operational Research, 232, 2, 256-265 (2014) · Zbl 1305.90267
[22] Drezner, Z.; Brimberg, J.; Mladenović, N.; Salhi, S., New heuristic algorithms for solving the planar p-median problem, Computers & Operations Research, 62, 296-304 (2015) · Zbl 1348.90388
[23] Drezner, Z.; Brimberg, J.; Mladenović, N.; Salhi, S., Solving the planar p-median problem by variable neighborhood and concentric searches, Journal of Global Optimization, 63, 3, 501-514 (2015) · Zbl 1327.90090
[24] Mishra, N.; Oblinger, D.; Pitt, L., Sublinear Time Approximate Clustering (2001), Palo Alto, CA, USA: Hewlett-Packard Labs, Palo Alto, CA, USA · Zbl 0987.68068
[25] Kaufman, L.; Rousseeuw, P. J., Finding Groups in Data: An Introduction to Cluster Analysis (1990), New York, NY, USA: Wiley, New York, NY, USA · Zbl 1345.62009
[26] Eisenbrand, F.; Grandoni, F.; Rothvosz, T.; Schafer, G., Approximating connected facility location problems via random facility sampling and core detouring, Proceedings of SODA’2008, ACM
[27] Jaiswal, R.; Kumar, A.; Sen, S., A simple D 2-sampling based PTAS for k-means and other clustering problems, Algorithmica, 70, 1, 22-46 (2014) · Zbl 1364.68369
[28] Avella, P.; Boccia, M.; Salerno, S.; Vasilyev, I., An aggregation heuristic for large scale p-median problem, Computers & Operations Research, 39, 7, 1625-1632 (2012) · Zbl 1251.90234
[29] Francis, R. L.; Lowe, T. J.; Rayco, M. B.; Tamir, A., AggregationError for location models: survey and analysis, Annals of Operations Research, 167, 1, 171-208 (2009) · Zbl 1173.90005
[30] Arthur, D.; Vassilvitskii, S., k-Means++: the advantages of careful seeding, Proceedings of SODA’07, SIAM
[31] Hansen, P.; Mladenovic, N.; Burke, . K.; Kendall, G., Variable neighborhood search, Search Methodologies (2005), Boston, MA, USA: Springer, Boston, MA, USA
[32] Rozhnov, I. P.; Orlov, V. I.; Kazakovtsev, L. A., VNS-based algorithms for the centroid-based clustering problem, Facta Universitatis Series: Mathematics and Informatics, 34, 5, 957-972 (2019)
[33] Still, S.; Bialek, W.; Bottou, L., Geometric clustering using the information bottleneck method, Proceedings of the Advances In Neural Information Processing Systems, Cambridge, UK: MIT Press, Cambridge, UK
[34] Sun, Z.; Fox, G.; Gu, W.; Li, Z., A parallel clustering method combined information bottleneck theory and centroid-based clustering, The Journal of Supercomputing, 69, 1, 452-467 (2014)
[35] Houck, C. R.; Joines, J. A.; Kay, M. G., Comparison of genetic algorithms, random restart and two-opt switching for solving large location-allocation problems, Computers & Operations Research, 23, 6, 587-596 (1996) · Zbl 0847.90091
[36] Maulik, U.; Bandyopadhyay, S., Genetic algorithm-based clustering technique, Pattern Recognition, 33, 9, 1455-1465 (2000)
[37] Krishna, K.; Narasimha Murty, M., Genetic K-means algorithm, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 29, 3, 433-439 (1999)
[38] Neema, M. N.; Maniruzzaman, K. M.; Ohgai, A., New genetic algorithms based approaches to continuous p-median problem, Networks and SpatialEconomics, 11, 1, 83-99 (2011) · Zbl 1213.90163
[39] Kazakovtsev, L. A.; Rozhnov, I., Application of algorithms with variable greedy heuristics for k-medoids problems, Informatica, 44, 1, 55-61 (2020)
[40] Hosage, C. M.; Goodchild, M. F., Discrete space location-allocation solutions from genetic algorithms, Annals of Operations Research, 6, 2, 35-46 (1986)
[41] Alp, O.; Erkut, .; Drezner, Z., AnEfficient genetic algorithm for the p-median problem, Annals of Operations Research, 122, 1/4, 21-42 (2003) · Zbl 1038.90046
[42] Kim, K.; Ahn, H., A recommender system using GA K-means clustering in an online shopping market, xpert Systems with Applications, 34, 2, 1200-1209 (2008)
[43] Kazakovtsev, L. A.; Antamoshkin, A. N., Genetic algorithm with fast greedy heuristic for clustering and location problems, Informatica, 38, 3, 229-240 (2014)
[44] Kwedlo, W.; Iwanowicz, P., Using Genetic Algorithm for Selection of Initial Cluster Centers for the K-Means Method, ICAISC 2010: Artifical Intelligence and Soft Computing (2010), Berlin, Heidelberg, Germany: Springer-Verlag, Berlin, Heidelberg, Germany
[45] He, Z.; Yu, C., Clustering stability-basedEvolutionary K-means, Soft Computing, 23, 1, 305-321 (2019) · Zbl 07075521
[46] Pizzuti, C.; Procopio, N., A K-means based genetic algorithm for data clustering, Proceedings of the International Joint Conference SOCO’16-CISIS’16-ICEUTE’16
[47] Rousseeuw, P. J., Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 53-65 (1987) · Zbl 0636.62059
[48] Davies, D. L.; Bouldin, D. W., A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 2, 224-227 (1979)
[49] remeev, A. V., Genetic algorithm with tournament selection as a local search method, Discrete Analysis and Operations Research, 19, 2, 41-53 (2012) · Zbl 1324.68172
[50] Holland, J. H., Adaptation in Natural and Artificial Systems (1992), Cambridge, UK: MIT Press, Cambridge, UK
[51] Fogel, D. B.; Atmar, J. W., Comparing genetic operators with Gaussian mutations in simulatedEvolutionary processes using linear systems, Biological Cybernetics, 63, 2, 111-114 (1990)
[52] Liu, C.; Kroll, A., On designing genetic algorithms for solving small- and medium-scale traveling salesman problems, Swarm andEvolutionary Computation, 7269, 283-291 (2012)
[53] Osaba, E.; Carballedo, R.; Diaz, F.; Onieva, E.; de la Iglesia, I.; Perallos, A., Crossover versus mutation: a comparative analysis of theEvolutionary strategy of genetic algorithms applied to combinatorial optimization problems, The Scientific World Journal, 2014 (2014)
[54] Walkenhorst, J.; Bertram, T., Multikriterielleoptimierungsverfahren Fur pickup-and-delivery-probleme, Proceedings of 21. Workshop Computational Intelligence
[55] Kazakovtsev, L. A.; Antamoshkin, A. N., Greedy heuristic method for location problems, Vestnik SibGAU, 16, 2, 317-325 (2015)
[56] Zeebaree, D. Q.; Haron, H.; Abdulazeez, A. M.; Zeebaree, S. R. M., Combination of K-means clustering with genetic algorithm: a review, International Journal of AppliedEngineering Research, 12, 24, 14238-14245 (2017)
[57] Hruschka, . R.; Campello, R. J. G. B.; Freitas, A. A.; de Carvalho, A. C. P. L. F., A survey ofEvolutionary algorithms for clustering, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 39, 2, 133-155 (2009)
[58] Freitas, A. A., A Review ofEvolutionary Algorithms for Data Mining, Data Mining and Knowledge Discovery Handbook (2009), Oxford, UK: Oxford University, Oxford, UK
[59] Bandyopadhyay, S., Genetic algorithms for clustering and fuzzy clustering, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1, 6, 524-531 (2011)
[60] Larrañaga, P.; Kuijpers, C. M. H.; Murga, R. H.; Inza, I.; Dizdarevic, S., Genetic algorithms for the travelling salesman problem: a review of representations and operators, Artificial Intelligence Review, 13, 2, 129-170 (1999)
[61] Sarangi, A.; Lenka, R.; Sarangi, S. K., Design of linear phase firhigh pass filter using PSO with gaussian mutation, Proceedings of the Swarm,Evolutionalry, and Memetic Computing
[62] Deb, D.; Deb, K., Investigation of mutation schemes in real-parameter genetic algorithms, Swarm,Evolutionary, and Memetic Computing, 7677, 1-8 (2012)
[63] Deep, K.; Thakur, M., A new mutation operator for real coded genetic algorithms, Applied Mathematics and Computation, 193, 1, 211-230 (2007) · Zbl 1193.68209
[64] Deep, K.; Mebrahtu, H., Combined mutation operators of genetic algorithm for the travelling salesman problem, International Journal of Combinatorial Optimization Problems and Informatics, 2, 3, 1-23 (2011)
[65] Hong, T.-P.; Wang, H.-S.; Chen, W.-C., Simultaneously applying multiple mutation operators in genetic algorithms, Journal of Heuristics, 6, 4, 439-455 (2000) · Zbl 0972.68630
[66] McGinley, B.; Maher, J.; O’Riordan, C.; Morgan, F., Maintaining healthy population diversity using adaptive crossover, mutation, and selection, IEEE Transactions onEvolutionary Computation, 15, 5, 692-714 (2011)
[67] Serpell, M.; Smith, J. E., Self-adaptation of mutation operator and probability for permutation representations in genetic algorithms, volutionary Computation, 18, 3, 491-514 (2010)
[68] Brizuela, C. A.; Aceves, R., xperimental Genetic Operators Analysis for the Multi-Objective Permutation Flowshop (2003), Berlin, Heidelberg, Garmany: Springer-Verlag, Berlin, Heidelberg, Garmany · Zbl 1036.90518
[69] Wang, L.; Zhang, L., Determining optimal combination of genetic operators for flow shop scheduling, The International Journal of Advanced Manufacturing Technology, 30, 3-4, 302-308 (2006)
[70] Hasan, B. H. F.; Saleh, M. S. M., valuating theEffectiveness of mutation operators on the behavior of genetic algorithms applied to non-deterministic polynomial problems, Informatica, 35, 4, 513-518 (2011)
[71] Karthikeyan, P.; Baskar, S.; Alphones, A., Improved genetic algorithm using different genetic operator combinations (GOCs) for multicast routing in ad hoc networks, Soft Computing, 17, 9, 1563-1572 (2013)
[72] Correa, . S.; Steiner, M. T. A.; Freitas, A. A.; Carnieri, C., A genetic algorithm for the p-median problem, Proceedings of the GECCO-2001, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, San Francisco, CA, USA
[73] Hansen, P.; Mladenović, N., J-Means: a new local search heuristic for minimum sum of squares clustering, Pattern Recognition, 34, 2, 405-413 (2001) · Zbl 1012.68873
[74] Alkhalifah, Y.; Wainwright, R. L., A genetic algorithm applied to graph problems involving subsets of vertices, Proceedings of the 2004 Congress onEvolutionary Computation, Portland, IEEE
[75] Lu, Y.; Lu, S.; Fotouhi, F.; Deng, Y.; Brown, S. J., FGKA: a fast genetic k-means clustering algorithm, Proceedings of the 2004 ACM Symposium on Applied Computing-SAC’04
[76] Cheng, S. S.; Chao, Y. H.; Wang, H. M.; Fu, H. C., A prototypes-embedded genetic k-means algorithm, Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), IEEE
[77] Chang, D.-X.; Zhang, X.-D.; Zheng, C.-W., A genetic algorithm with gene rearrangement for K-means clustering, Pattern Recognition, 42, 7, 1210-1222 (2009)
[78] Goldberg, D. E.; Richarson, J., Genetic algorithms with sharing for multimodal function optimization, Proceedings 2nd International Conference on Genetic Algorithms
[79] Jong, K. A. D., An analysis of the behaviour of a class of genetic adaptive systems (1975), Ann Arbor, MI, USA: University of Michigan, Ann Arbor, MI, USA, Ph.D. thesis
[80] Mengshoel, O.; Goldberg, D., Probabilistic Crowding: Deterministic Crowding with Probabilistic Replacement (1999), San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, San Francisco, CA, USA
[81] Ono, I.; Kobayashi, S., A Real-coded Genetic Algorithm Using the Unimodal Normal Distribution Crossover: Natural Computing Series (2003), Berlin, Heidelberg, Germany: Springer-Verlag, Berlin, Heidelberg, Germany
[82] Dumitrescu, D.; Stoean, C., The genetic chromodynamics metaheuristic, Proceedings of TELE-INFO’06, WSEAS, Stevens Point
[83] Lung, R. I.; Dumitrescu, D., Roaming optimization: a newEvolutionary technique for multimodal optimization, Studia Universitatis Babes-Bolyai - Informatica, XLIX, 1, 99-109 (2004) · Zbl 1118.90330
[84] Zechner, M.; Granitzer, M., Accelerating K-means on the graphics processor via CUDA, Proceedings of the International Conference on Intensive Applications and Services
[85] Luebke, D.; Humphreys, G., How GPUs work, Computer, 40, 2, 96-100 (2007)
[86] Sivanandam, S. N.; Deepa, S. N., Introduction to Genetic Algorithms (2007), Berlin, Germany: Springer, Berlin, Germany · Zbl 1129.90001
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.