×

Correlation clustering in general weighted graphs. (English) Zbl 1099.68074

Summary: We consider the following general correlation-clustering problem [(*) N. Bansal, A. Blum, and S. Chawla, “Correlation clustering”, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, 238–250 (2002)]: given a graph with real nonnegative edge weights and a \(\langle+\rangle/\langle-\rangle\) edge labelling, partition the vertices into clusters to minimize the total weight of cut \(\langle+\rangle\) edges and uncut \(\langle-\rangle\) edges. Thus, \(\langle+\rangle\) edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster while \(\langle-\rangle\) edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition. Correlation clustering was introduced in (*) motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an \(O(\log n)\)-approximation algorithm for the general case based on a linear-programming rounding and the “region-growing” technique. We also prove that this linear program has a gap of \(\Omega (\log n)\), and therefore our approximation is tight under this approach. We also give an \(O(r^{3})\)-approximation algorithm for \(K_{r,r}\)-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and therefore APX-hard and difficult to approximate better than \(\Theta (\log n)\).

MSC:

68R10 Graph theory (including graph drawing) in computer science
68W25 Approximation algorithms
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] N. Bansal, A. Blum, S. Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250.; N. Bansal, A. Blum, S. Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250. · Zbl 1089.68085
[2] Y. Bejerano, N. Immorlica, J. Naor, M. Smith, Efficient location area planning for personal communication systems, in: Proc. Ninth Annual Internat. Conf. on Mobile Computing and Networking, San Diego, CA, September 2003, pp. 109-121.; Y. Bejerano, N. Immorlica, J. Naor, M. Smith, Efficient location area planning for personal communication systems, in: Proc. Ninth Annual Internat. Conf. on Mobile Computing and Networking, San Diego, CA, September 2003, pp. 109-121.
[3] M. Charikar, V. Guruswami, A. Wirth, Clustering with qualitative information, in: Proc. 44th Annu. IEEE Symp. Foundations of Computer Science, 2003, pp. 524-533.; M. Charikar, V. Guruswami, A. Wirth, Clustering with qualitative information, in: Proc. 44th Annu. IEEE Symp. Foundations of Computer Science, 2003, pp. 524-533. · Zbl 1094.68075
[4] M. Charikar, A. Wirth, Maximizing quadratic programs: extending grothendieck’s inequality, in: Proc. 45th Annu. IEEE Symp. on Foundations of Computer Science, 2004, pp. 54-60.; M. Charikar, A. Wirth, Maximizing quadratic programs: extending grothendieck’s inequality, in: Proc. 45th Annu. IEEE Symp. on Foundations of Computer Science, 2004, pp. 54-60.
[5] Dahlhaus, E.; Johnson, D. S.; Papadimitriou, C. H.; Seymour, P. D.; Yannakakis, M., The complexity of multiterminal cuts, SIAM J. Comput., 4, 23, 864-894 (1994) · Zbl 0809.68075
[6] E.D. Demaine, N. Immorlica, Correlation clustering with partial information, in: Proc. Sixth Internat. Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Princeton, NJ, August 2003, pp. 1-13.; E.D. Demaine, N. Immorlica, Correlation clustering with partial information, in: Proc. Sixth Internat. Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Princeton, NJ, August 2003, pp. 1-13. · Zbl 1202.68479
[7] D. Emanuel, A. Fiat, Correlation clustering—minimizing disagreements on arbitrary weighted graphs, in: Proc. 11th Annu. European Symp. on Algorithms, 2003, pp. 208-220.; D. Emanuel, A. Fiat, Correlation clustering—minimizing disagreements on arbitrary weighted graphs, in: Proc. 11th Annu. European Symp. on Algorithms, 2003, pp. 208-220. · Zbl 1266.68228
[8] Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X., Clustering for mining in large spatial databases, KI-Journal, 1, 18-24 (1998), (Special Issue on Data Mining. ScienTec Publishing)
[9] Garg, N.; Vazirani, V. V.; Yannakakis, M., Approximate max-flow min-(multi)cut theorems and their applications, SIAM J. Comput., 25, 2, 235-251 (1996) · Zbl 0844.68061
[10] Garg, N.; Vazirani, V. V.; Yannakakis, M., Primal-dual approximation algorithms for integral flow and multicut in trees, Algorithmica, 18, 1, 3-20 (1997) · Zbl 0873.68075
[11] Hochbaum, D. S.; Shmoys, D. B., A unified approach to approximation algorithms for bottleneck problems, J. ACM, 33, 533-550 (1986)
[12] Hu, T. C., Multicommodity network flows, Oper. Res., 11, 344-360 (1963) · Zbl 0123.23704
[13] K. Jain, V.V. Vazirani, Primal-dual approximation algorithms for metric facility location and \(k\); K. Jain, V.V. Vazirani, Primal-dual approximation algorithms for metric facility location and \(k\)
[14] Kanungo, T.; Mount, D. M.; Netanyahu, N. S.; Piatko, C. D.; Silverman, R.; Wu, A. Y., An efficient \(k\)-means clustering algorithm: analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell., 24, 7, 881-892 (2002)
[15] D. Klein, S.D. Kamvar, C.D. Manning, From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering, in: Proc. 19th Internat. Conf. on Machine Learning, 2002, pp. 307-314.; D. Klein, S.D. Kamvar, C.D. Manning, From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering, in: Proc. 19th Internat. Conf. on Machine Learning, 2002, pp. 307-314.
[16] P.N. Klein, S.A. Plotkin, S. Rao, Excluded minors, network decomposition, and multicommodity flow, in: Proc. 25th Annu. ACM Symp. on Theory of Computing, 1993, pp. 682-690.; P.N. Klein, S.A. Plotkin, S. Rao, Excluded minors, network decomposition, and multicommodity flow, in: Proc. 25th Annu. ACM Symp. on Theory of Computing, 1993, pp. 682-690. · Zbl 1310.90017
[17] Leighton, T.; Rao, S., Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms, J. ACM, 46, 6, 787-832 (1999) · Zbl 1065.68666
[18] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. Fifth Symp. on Mathematical, Statistics, and Probability, 1967, pp. 281-297.; J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. Fifth Symp. on Mathematical, Statistics, and Probability, 1967, pp. 281-297. · Zbl 0214.46201
[19] M. Meila, D. Heckerman, An experimental comparison of several clustering and initialization methods, Conf. on Uncertainty in Artificial Intelligence, 1998, pp. 386-395.; M. Meila, D. Heckerman, An experimental comparison of several clustering and initialization methods, Conf. on Uncertainty in Artificial Intelligence, 1998, pp. 386-395.
[20] Murtagh, F., A survey of recent advances in hierarchical clustering algorithms, Comput. J., 26, 4, 354-359 (1983) · Zbl 0523.68030
[21] C.M. Procopiuc, Clustering problems and their applications, Department of Computer Science, Duke University. \( \langle;\) http://www.cs.duke.edu/\( \sim;\rangle;\); C.M. Procopiuc, Clustering problems and their applications, Department of Computer Science, Duke University. \( \langle;\) http://www.cs.duke.edu/\( \sim;\rangle;\) · Zbl 0495.76118
[22] O. Reingold, S.P. Vadhan, A. Wigderson, Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors, Electronic Colloq. on Computational Complexity (ECCC), Vol. 8(18), 2001.; O. Reingold, S.P. Vadhan, A. Wigderson, Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors, Electronic Colloq. on Computational Complexity (ECCC), Vol. 8(18), 2001. · Zbl 1008.05101
[23] M. Satchell, Hunting for good Will: Will the real Shakespeare please stand up? U.S. News and World Report, July 24 2000, pp. 71-72.; M. Satchell, Hunting for good Will: Will the real Shakespeare please stand up? U.S. News and World Report, July 24 2000, pp. 71-72.
[24] L.J. Schulman, Clustering for edge-cost minimization, Electronic Colloq. on Computational Complexity (ECCC), Vol. 6(035), 1999.; L.J. Schulman, Clustering for edge-cost minimization, Electronic Colloq. on Computational Complexity (ECCC), Vol. 6(035), 1999. · Zbl 1296.68197
[25] M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, KDD-2000 Workshop on Text Mining, 2000.; M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, KDD-2000 Workshop on Text Mining, 2000.
[26] C. Swamy, Correlation clustering: maximizing agreements via semidefinite programming, in: Proc. 15th Annu. ACM-SIAM Symp. on Discrete Algorithms, New Orleans, LA, 2004. Society for Industrial and Applied Mathematics, pp. 526-527.; C. Swamy, Correlation clustering: maximizing agreements via semidefinite programming, in: Proc. 15th Annu. ACM-SIAM Symp. on Discrete Algorithms, New Orleans, LA, 2004. Society for Industrial and Applied Mathematics, pp. 526-527. · Zbl 1318.68197
[27] Tardos, E.; Vazirani, V. V., Improved bounds for the max-flow min-multicut ratio for planar and \(K_{rr}\)-free graphs, Inform. Process. Lett., 47, 2, 77-80 (1993) · Zbl 0803.90056
[28] Vazirani, V. V., Approximation Algorithms (2001), Springer: Springer Berlin · Zbl 1138.90417
[29] K. Wagstaff, C. Cardie, Clustering with instance-level constraints, in: Proc. 17th Internat. Conf. on Machine Learning, 2000, pp. 1103-1110.; K. Wagstaff, C. Cardie, Clustering with instance-level constraints, in: Proc. 17th Internat. Conf. on Machine Learning, 2000, pp. 1103-1110.
[30] K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained \(K\); K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained \(K\)
[31] M. Yannakakis, P.C. Kanellakis, S.C. Cosmadakis, C.H. Papadimitriou, Cutting and partitioning a graph after a fixed pattern, in: Proc. 10th Internat. Colloq. on Automata, Languages and Programming, 1983, pp. 712 -722.; M. Yannakakis, P.C. Kanellakis, S.C. Cosmadakis, C.H. Papadimitriou, Cutting and partitioning a graph after a fixed pattern, in: Proc. 10th Internat. Colloq. on Automata, Languages and Programming, 1983, pp. 712 -722. · Zbl 0549.68062
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.