Hypothesis testing for automated community detection in networks. (English) Zbl 1411.62162

Summary: Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters \(k\). We propose to determine \(k\) automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold; first, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdős-Rényi (noise) type. Secondly, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real world networks with ground truth, we show that our algorithm outperforms state of the art methods.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H15 Hypothesis testing in multivariate analysis
62E20 Asymptotic distribution theory in statistics
60B20 Random matrices (probabilistic aspects)
05C82 Small world graphs, complex networks (graph-theoretic aspects)
Full Text: DOI arXiv