##
**A flexible cluster-oriented alternative clustering algorithm for choosing from the Pareto front of solutions.**
*(English)*
Zbl 1321.68412

Summary: Supervised alternative clustering is the problem of finding a set of clusterings which are of high quality and different from a given negative clustering. The task is therefore a clear multi-objective optimization problem. Optimizing two conflicting objectives at the same time requires dealing with trade-offs. Most approaches in the literature optimize these objectives sequentially (one objective after another one) or indirectly (by some heuristic combination of the objectives). Solving a multi-objective optimization problem in these ways can result in solutions which are dominated, and not Pareto-optimal. We develop a direct algorithm, called COGNAC, which fully acknowledges the multiple objectives, optimizes them directly and simultaneously, and produces solutions approximating the Pareto front. COGNAC performs the recombination operator at the cluster level instead of at the object level, as in the traditional genetic algorithms. It can accept arbitrary clustering quality and dissimilarity objectives and provides solutions dominating those obtained by other state-of-the-art algorithms. Based on COGNAC, we propose another algorithm called SGAC for the sequential generation of alternative clusterings where each newly found alternative clustering is guaranteed to be different from all previous ones. The experimental results on widely used benchmarks demonstrate the advantages of our approach.

### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

62-07 | Data analysis (statistics) (MSC2010) |

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

### Keywords:

alternative clustering; multi-objective optimization; cluster-oriented recombination; genetic algorithms
PDF
BibTeX
XML
Cite

\textit{D. T. Truong} and \textit{R. Battiti}, Mach. Learn. 98, No. 1--2, 57--91 (2015; Zbl 1321.68412)

Full Text:
DOI

### References:

[1] | Bae, E.; Bailey, J., Coala: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity, 53-62, (2006), New York |

[2] | Battiti, R.; Passerini, A., Brain-computer evolutionary multiobjective optimization: a genetic algorithm adapting to the decision maker, IEEE Transactions on Evolutionary Computation, 14, 671-687, (2010) |

[3] | Battiti, R.; Brunato, M.; Mascia, F., Reactive search and intelligent optimization, (2008), Berlin · Zbl 1158.68036 |

[4] | Branke, J., Deb, K., & Miettinen, K. (2008). Multiobjective optimization: interactive and evolutionary approaches (Vol. 5252). New York: Springer. · Zbl 1147.68304 |

[5] | Cui, Y.; Fern, X. Z.; Dy, J. G., Non-redundant multi-view clustering via orthogonalization, 133-142, (2007), New York |

[6] | Dang, X. H.; Bailey, J., Generation of alternative clusterings using the cami approach, 118-129, (2010) |

[7] | Dasgupta, S.; Ng, V., Mining clustering dimensions, 263-270, (2010) |

[8] | Davidson, I.; Qi, Z., Finding alternative clusterings using constraints, 773-778, (2008), New York |

[9] | Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T., A fast and elitist multiobjective genetic algorithm: NSGA-II-ii, IEEE Transactions on Evolutionary Computation, 6, 182-197, (2002) |

[10] | Bie, T., Subjectively interesting alternative clusters, 43-54, (2011) |

[11] | Falkenauer, E., A new representation and operators for genetic algorithms applied to grouping problems, Evolutionary Computation, 2, 123-144, (1994) |

[12] | Frank, A., & Asuncion, A. UCI machine learning repository. URL http://archive.ics.uci.edu/ml. |

[13] | Gondek, D.; Hofmann, T., Conditional information bottleneck clustering, 36-42, (2003), Princeton |

[14] | Gondek, D.; Hofmann, T., Non-redundant clustering with conditional ensembles, 70-77, (2005), New York |

[15] | Günnemann, S.; Färber, I.; Seidl, T., Multi-view clustering using mixture models in subspace projections, 132-140, (2012), New York |

[16] | Handl, J.; Knowles, J., An evolutionary approach to multiobjective clustering, IEEE Transactions on Evolutionary Computation, 11, 56-76, (2007) |

[17] | Hruschka, E. R.; Campello, R. J. G. B.; Freitas, A. A.; Carvalho, A. C. P. L. F., A survey of evolutionary algorithms for clustering, IEEE Transactions on Systems, Man and Cybernetics. Part C, Applications and Reviews, 39, 133-155, (2009) |

[18] | Hubert, L.; Arabie, P., Comparing partitions, Journal of Classification, 2, 193-218, (1985) |

[19] | Jain, P.; Meka, R.; Dhillon, I. S., Simultaneous unsupervised learning of disparate clusterings, Statistical Analysis and Data Mining, 1, 195-210, (2008) |

[20] | Kirkpatrick, S.; Gelatt, C. D.; Vecchi, M. P., Optimization by simulated annealing, Science, 220, 671-680, (1983) · Zbl 1225.90162 |

[21] | Lance, G. N.; Williams, W. T., A general theory of classificatory sorting strategies II. clustering systems, The Computer Journal, 10, 271-277, (1967) |

[22] | Lloyd, S., Least squares quantization in pcm, IEEE Transactions on Information Theory, 28, 129-137, (1982) · Zbl 0504.94015 |

[23] | Munkres, J., Algorithms for the assignment and transportation problems, Journal of the Society for Industrial and Applied Mathematics, 5, 32-38, (1957) · Zbl 0083.15302 |

[24] | Niu, D.; Dy, J. G.; Jordan, M. I., Multiple non-redundant spectral clustering views, 831-838, (2010) |

[25] | Qi, Z. J.; Davidson, I., A principled and flexible framework for finding alternative clusterings, 717-726, (2009), New York |

[26] | Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 846-850. |

[27] | Tishby, N., Pereira, F. C., & Bialek, W. (2000). The information bottleneck method. arXiv:physics/0004057. |

[28] | Truong, D. T.; Battiti, R., A cluster-oriented genetic algorithm for alternative clustering, 1122-1127, (2012), New York |

[29] | Vinh, N. X.; Epps, J., Mincentropy: a novel information theoretic approach for the generation of alternative clusterings, 521-530, (2010), New York |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.