ORIE Colloquium

Qiaomin Xie, Cornell University - ORIE
AlphaGo Zero, Monte Carlo Tree Search and Self-Play: Towards Theoretical Foundations

Tuesday, April 16, 2019 - 4:15pm
Rhodes 253

Description: AlphaGo Zero (AGZ) by Silver et al. (2017) introduced a new reinforcement learning algorithm that achieved superhuman performance in the games of Go, Chess, and Shogi with no prior knowledge other than the rules of the game. Methodologically, there are two key innovations: (a) the use of Monte Carlo Tree Search (MCTS) combined with supervised learning (SL) through a well-engineered neural network for learning the policy, and (b) the use of self-play for generating training examples. This remarkable empirical success naturally raises the question: can these two algorithmic ingredients be explained and justified theoretically? In this talk, we argue that MCTS with a sufficiently expressive SL oracle learns the optimal solution at a nearly minimax-optimal rate. To the best of our knowledge, this is the first complete and rigorous non-asymptotic analysis of MCTS. In the process, we correct a fundamental error in a well-cited MCTS algorithm and its proof. Interestingly, AGZ had already incorporated this correction in its implementation. Our results hold both for the traditional setting of infinite-horizon Markov Decision Processes (MDPs) and for the setting of self-play, modeled as a Robust MDP. Beyond two-player games, where AGZ has been effective, we show empirical success for a representative application in network scheduling. This is primarily based on joint work with Devavrat Shah (MIT) and Zhi Xu (MIT).

Bio: Qiaomin Xie is a visiting assistant professor in ORIE at Cornell University. Prior to that, she spent two years as a postdoctoral researcher with LIDS at MIT and was a research fellow at the Simons Institute during Fall 2016. Qiaomin received her Ph.D. in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2016, and her B.E. in electronic engineering from Tsinghua University. Her research interests lie in the fields of applied probability, stochastic networks, and reinforcement learning. She is the recipient of the UIUC CSL PhD Thesis Award (2017) and the Best Paper Award from the IFIP Performance conference (2011).
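For readers unfamiliar with MCTS, the selection step referenced in the abstract can be sketched as follows. This is a minimal illustration of the classical UCB1 selection rule, not the corrected variant analyzed in the talk; the function names and the `visits`/`value` bookkeeping are illustrative assumptions, not taken from the paper.

```python
import math

def ucb_score(parent_visits, child_visits, child_value, c=1.41):
    """UCB1 score: mean value (exploitation) plus an exploration bonus.

    Illustrative sketch only; the talk concerns a corrected analysis of
    MCTS, and this is the textbook rule, not that corrected version.
    """
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    mean_value = child_value / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + exploration

def select_child(children):
    """Pick the child node maximizing the UCB score.

    `children` is a list of dicts with hypothetical keys
    'visits' (int) and 'value' (cumulative reward, float).
    """
    total = sum(ch["visits"] for ch in children)
    return max(children, key=lambda ch: ucb_score(total, ch["visits"], ch["value"]))
```

In AGZ-style algorithms, the exploration bonus is additionally weighted by a neural-network policy prior, which is where the supervised-learning component enters the search.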