Center for Applied Mathematics Colloquium
The problem of graph clustering and community detection concerns identifying densely connected groups of nodes in a network. In this talk we show that there is a tradeoff between computational (running-time) and statistical (robustness) considerations: by using computationally more expensive algorithms, one can achieve better statistical performance, and vice versa. In particular, we consider the classical planted partition model (a.k.a. stochastic block model), and show that the problem space can be partitioned into four regimes — "simple", "easy", "hard", and "impossible" — which correspond to progressively noisier problems. (1) A near-linear time algorithm succeeds in the “simple” regime, but provably fails in the "easy" and "hard" regimes; (2) a polynomial time algorithm succeeds in the "easy" regime, but provably fails in the "hard" regime; (3) a super-polynomial time algorithm succeeds in the "hard" regime, for which no polynomial time algorithm is known; (4) all algorithms fail in the "impossible" regime. Our results hold even with an unbounded number of small clusters.