Statistics Seminar
Fitting a rank-r matrix to noisy data arises in many applications. The popular approach of nuclear/trace norm minimization, while in principle polynomial-time computable with strong (sometimes minimax optimal) statistical guarantees, is often computationally infeasible for large problems. In this talk, we consider a scalable approach via projected gradient descent over the (non-convex) space of n-by-r matrices. We develop a unified framework characterizing the convergence of such methods as well as the statistical properties of the resulting fixed point. Our results apply to a broad range of concrete models, including noisy matrix sensing, matrix completion with real and one-bit observations, matrix decomposition, structured PCA, and graph clustering problems. For these problems, non-convex projected gradient descent runs in near-linear time and provides statistical guarantees that match (and are sometimes better than) the best known results provided by convex relaxation methods.
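
As a rough illustration of the kind of method the talk concerns, the following minimal sketch (Python/NumPy) runs projected gradient descent over the factored space for noisy symmetric matrix completion, parametrizing M = F F^T with F an n-by-r matrix. The function name factored_pgd_completion, the random initialization, the step size, and the row-norm "incoherence" projection are illustrative assumptions, not the speaker's exact algorithm or constraint set.

    import numpy as np

    def factored_pgd_completion(Y, mask, r, step=0.002, iters=500, seed=0):
        # Sketch: projected gradient descent on the factored (non-convex)
        # space for noisy symmetric matrix completion, with M = F @ F.T.
        n = Y.shape[0]
        rng = np.random.default_rng(seed)
        F = rng.standard_normal((n, r)) / np.sqrt(n)  # random init; a spectral init is more typical
        for _ in range(iters):
            residual = np.where(mask, F @ F.T - Y, 0.0)  # loss touches only observed entries
            grad = (residual + residual.T) @ F           # gradient of 0.5*||P_Omega(F F^T - Y)||_F^2
            F = F - step * grad
            # "Projection" step: cap the row norms of F, an incoherence-style
            # constraint set assumed here purely for illustration.
            row_norms = np.linalg.norm(F, axis=1, keepdims=True)
            cap = 2.0 * np.sqrt(r / n) * np.linalg.norm(F)
            F = F * np.minimum(1.0, cap / np.maximum(row_norms, 1e-12))
        return F @ F.T

For clarity this sketch forms F F^T densely, costing O(n^2 r) per iteration; the near-linear running time mentioned above comes from evaluating the residual only on the observed (sparse) entries.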