##
**Hypothesis testing for automated community detection in networks.**
*(English)*
Zbl 1411.62162

Summary: Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters \(k\). We propose to determine \(k\) automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold; first, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and use that distribution for our test of the hypothesis that a random graph is of Erdős-Rényi (noise) type. Secondly, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real world networks with ground truth, we show that our algorithm outperforms state of the art methods.

### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

62H15 | Hypothesis testing in multivariate analysis |

62E20 | Asymptotic distribution theory in statistics |

60B20 | Random matrices (probabilistic aspects) |

05C82 | Small world graphs, complex networks (graph-theoretic aspects) |