Convex optimization for the densest subgraph and densest submatrix problems. (English) Zbl 1459.90173

Summary: We consider the densest \(k\)-subgraph problem, which seeks to identify the \(k\)-node subgraph of a given input graph with maximum number of edges. This problem is well-known to be NP-hard, by reduction to the maximum clique problem. We propose a new convex relaxation for the densest \(k\)-subgraph problem, based on a nuclear norm relaxation of a low-rank plus sparse decomposition of the adjacency matrices of \(k\)-node subgraphs to partially address this intractability. We establish that the densest \(k\)-subgraph can be recovered with high probability from the optimal solution of this convex relaxation if the input graph is randomly sampled from a distribution of random graphs constructed to contain an especially dense \(k\)-node subgraph with high probability. Specifically, the relaxation is exact when the edges of the input graph are added independently at random, with edges within a particular \(k\)-node subgraph added with higher probability than other edges in the graph. We provide a sufficient condition on the size of this subgraph \(k\) and the expected density under which the optimal solution of the proposed relaxation recovers this \(k\)-node subgraph with high probability. Further, we propose a first-order method for solving this relaxation based on the alternating direction method of multipliers, and empirically confirm our predicted recovery thresholds using simulations involving randomly generated graphs, as well as graphs drawn from social and collaborative networks.


90C27 Combinatorial optimization
90C35 Programming involving graphs or networks
Full Text: DOI arXiv


[1] Karp, R., Reducibility among combinatorial problems, Complexity of Computer Computations, 40, 4, 85-103 (1972)
[2] Alon N, Arora S, Manokaran R, Moshkovitz D, Weinstein O (2011) Inapproximability of densest κ-subgraph from average case hardness. Unpublished manuscript 1
[3] Feige U (2002) Relations between average case complexity and approximation complexity. In: Proceedings of the thiry-fourth annual ACM symposium on theory of computing. ACM, pp 534-543 · Zbl 1192.68358
[4] Khot, S., Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique, SIAM J Comput, 36, 4, 1025-1071 (2006) · Zbl 1127.68035
[5] Henzinger MR, Motwani R, Silverstein C (2002) Challenges in web search engines. In: ACM SIGIR forum, vol 36. ACM, pp 11-22
[6] Gibson D, Kumar R, Tomkins A (2005) Discovering large dense subgraphs in massive graphs. In: Böhm K, Jensen C S, Haas L M, Kersten M L, Larson P, Ooi BC (eds) Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005. ACM, pp 721-732
[7] Angel, A.; Koudas, N.; Sarkas, N.; Srivastava, D., Dense subgraph maintenance under streaming edge weight updates for real-time story identification, Proc. VLDB Endow., 5, 6, 574-585 (2012)
[8] Gajewar A, Das Sarma A (2012) Multi-skill collaborative teams based on densest subgraphs. In: Proceedings of the 2012 SIAM international conference on data mining. SIAM, pp 165-176
[9] Tsourakakis C (2015) The k-clique densest subgraph problem. In: Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, pp 1122-1132
[10] Tsourakakis C, Bonchi F, Gionis A, Gullo F, Tsiarli M (2013) Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 104-112
[11] Abbe, E.; Bandeira, A.; Hall, G., Exact recovery in the stochastic block model, IEEE Trans Inf Theory, 62, 1, 471-487 (2016) · Zbl 1359.94047
[12] Ailon N, Chen Y, Xu H (2013) Breaking the small cluster barrier of graph clustering. In: International conference on machine learning, pp 995-1003
[13] Ames, B.; Vavasis, S., Convex optimization for the planted k-disjoint-clique problem, Math Program, 143, 1-2, 299-337 (2014) · Zbl 1291.90194
[14] Ames, B., Guaranteed clustering and biclustering via semidefinite programming, Math Program, 147, 1-2, 429-465 (2014) · Zbl 1297.90107
[15] Amini, A.; Levina, E., On semidefinite relaxations for the block model, Ann Stat, 46, 1, 149-179 (2018) · Zbl 1393.62021
[16] Cai, T.; Li, X., Robust and computationally feasible community detection in the presence of arbitrary outlier nodes, Ann Stat, 43, 3, 1027-1059 (2015) · Zbl 1328.62381
[17] Chen, Y.; Jalali, A.; Sanghavi, S.; Xu, H., Clustering partially observed graphs via convex optimization, The Journal of Machine Learning Research, 15, 1, 2213-2238 (2014) · Zbl 1319.62123
[18] Chen, Y.; Sanghavi, S.; Xu, H., Improved graph clustering, IEEE Trans Inf Theory, 60, 10, 6440-6455 (2014) · Zbl 1360.94499
[19] Chen Y, Xu J (2014) Statistical-computational phase transitions in planted models: the high-dimensional setting. In: International conference on machine learning, pp 244-252
[20] Guédon O, Vershynin R (2015) Community detection in sparse networks via Grothendieck’s inequality. Probab Theory Relat Fields, pp 1-25 · Zbl 1357.90111
[21] Hajek B, Wu Y, Xu J (2015) Achieving exact cluster recovery threshold via semidefinite programming. In: IEEE international symposium on information theory. IEEE, pp 1442-1446
[22] Lei, J.; Rinaldo, A., Consistency of spectral clustering in stochastic block models, Ann Stat, 43, 1, 215-237 (2015) · Zbl 1308.62041
[23] Mathieu C, Schudy W (2010) Correlation clustering with noisy input. In: ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 712-728 · Zbl 1288.68197
[24] Nellore, A.; Ward, R., Recovery guarantees for exemplar-based clustering, Inf Comput, 245, 165-180 (2015) · Zbl 1333.62165
[25] Oymak S, Hassibi B (2011) Finding dense clusters via “low rank + sparse” decomposition. arXiv:1104.5186
[26] Rohe, K.; Chatterjee, S.; Yu, B., Spectral clustering and the high-dimensional stochastic blockmodel, Ann Stat, 39, 4, 1878-1915 (2011) · Zbl 1227.62042
[27] Qin T, Rohe K (2013) Regularized spectral clustering under the degree-corrected stochastic blockmodel. In: Advances in neural information processing systems, pp 3120-3128
[28] Vinayak R, Oymak S, Hassibi B (2014) Sharp performance bounds for graph clustering via convex optimization. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 8297-8301
[29] Holland, PW; Laskey, KB; Leinhardt, S., Stochastic blockmodels: first steps, Social networks, 5, 2, 109-137 (1983)
[30] Li X, Chen Y, Xu J (2018) Convex relaxation methods for community detection. arXiv:1810.00315
[31] Ames, B.; Vavasis, SA, Nuclear norm minimization for the planted clique and biclique problems, Mathematical Programming, 129, 1, 69-89 (2011) · Zbl 1271.90056
[32] Ames, B., Guaranteed recovery of planted cliques and dense subgraphs by convex relaxation, J Optim Theory Appl, 167, 2, 653-675 (2015) · Zbl 1327.90189
[33] Bomze IM, Budinich M, Pardalos PM, Pelillo M (1999) The maximum clique problem. In: Handbook of combinatorial optimization. Springer, pp 1-74 · Zbl 1253.90188
[34] Pardalos, PM; Xue, J., The maximum clique problem, Journal of global Optimization, 4, 3, 301-328 (1994) · Zbl 0797.90108
[35] Deshpande, Y.; Montanari, A., Finding hidden cliques of size \(\sqrt{N/e} N\)/e in nearly linear time, Found Comput Math, 15, 4, 1069-1128 (2015) · Zbl 1347.05227
[36] Montanari, A., Finding one community in a sparse graph, J Stat Phys, 161, 2, 273-299 (2015) · Zbl 1327.82091
[37] Hajek, B.; Wu, Y.; Xu, J., Information limits for recovering a hidden community, IEEE Trans Inf Theory, 63, 8, 4729-4745 (2017) · Zbl 1372.94364
[38] Hajek B, Wu Y, Xu J (2016) Semidefinite programs for exact recovery of a hidden community. In: Conference on learning theory, pp 1051-1095
[39] Saad H, Nosratinia A (2018) Belief propagation with side information for recovering a single community. In: 2018 IEEE international symposium on information theory (ISIT). IEEE, pp 1271-1275
[40] Hajek, B.; Wu, Y.; Xu, J., Recovering a hidden community beyond the Kesten-Stigum threshold in O(|E|log∗|V |) time, J Appl Probab, 55, 2, 325-352 (2018) · Zbl 1401.62085
[41] Hajek, B.; Wu, Y.; Xu, J., Submatrix localization via message passing, J Machine Learn Res, 18, 1, 6817-6868 (2017)
[42] Banks J, Moore C, Vershynin R, Verzelen N, Xu J (2018) Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization. IEEE Trans Inform Theory · Zbl 1401.94065
[43] Brennan M, Bresler G, Huleihel W (2019) Universality of computational lower bounds for submatrix detection. arXiv:1902.06916
[44] Chen Y, Sanghavi S, Xu H (2012) Clustering sparse graphs. In: Advances in neural information processing systems, pp 2204-2212
[45] Recht, B.; Fazel, M.; Parrilo, PA, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev, 52, 3, 471-501 (2010) · Zbl 1198.90321
[46] Alon, N.; Krivelevich, M.; Sudakov, B., Finding a large hidden clique in a random graph, Random Structures and Algorithms, 13, 3-4, 457-466 (1998) · Zbl 0959.05082
[47] Feige U, Ron D (2010) Finding hidden cliques in linear time. In: 21st international meeting on probabilistic, combinatorial, and asymptotic methods in the analysis of algorithms (AofA’10). Discrete Mathematics and Theoretical Computer Science, pp 189-204 · Zbl 1355.05186
[48] Dekel, Y.; Gurel-Gurevich, O.; Peres, Y., Finding hidden cliques in linear time with high probability, Comb Probab Comput, 23, 1, 29-49 (2014) · Zbl 1290.05129
[49] Juels, A.; Peinado, M., Hiding cliques for cryptographic security, Des Codes Crypt, 20, 3, 269-280 (2000) · Zbl 0965.94015
[50] Golub G, Van Loan C (1996) Matrix computations. Johns Hopkins University Press · Zbl 0865.65009
[51] Boucheron S, Lugosi G, Massart P (2013) Concentration inequalities: a nonasymptotic theory of independence. Oxford University Press · Zbl 1279.60005
[52] Bandeira AS, van Handel R (2014) Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Annals of Probability to appear · Zbl 1372.60004
[53] Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J., Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends®;, in Machine learning, 3, 1, 1-122 (2011) · Zbl 1229.90122
[54] Bader DA, Meyerhenke H, Sanders P, Wagner D (2012) Graph partitioning and graph clustering. In: 10th DIMACS implementation challenge workshop · Zbl 1262.05001
[55] Bader DA, Meyerhenke H, Sanders P, Schulz C, Kappes A, Wagner D (2014) Benchmarking for graph clustering and partitioning. Encyclopedia of Social Network Analysis and Mining, pp 73-82
[56] Gleiser, P.; Danoni, L., Community structure in jazz, Advances in Complex Systems, 6, 4, 565-573 (2003)
[57] Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: Third international AAAI conference on weblogs and social media
[58] Jacomy, M.; Venturini, T.; Heymann, S.; Bastian, M., ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software, PloS one, 9, 6, e98,679 (2014)
[59] Tsourakakis C, Bonch F (2013) Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. KDD ’13 Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 104-112
[60] Sotirov R (2019) On solving the densest k-subgraph problem on large graphs. arXiv:1901.06344
[61] Tropp, JA, An introduction to matrix concentration inequalities, Foundations and Trends®;, in Machine Learning, 8, 1-2, 1-230 (2015) · Zbl 1391.15071
[62] Tropp, JA, User-friendly tail bounds for sums of random matrices, Foundations of Computational Mathematics, 12, 4, 389-434 (2012) · Zbl 1259.60008
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.