ORIE Colloquium

Xu Huan
National University of Singapore
Practicable robust Markov decision processes

Tuesday, November 10, 2015 - 4:15pm
Rhodes 253

The Markov decision process (MDP) is a standard modeling tool for sequential decision making in a dynamic and stochastic environment. When the model parameters are subject to uncertainty, the "optimal" strategy obtained from the MDP can significantly underperform the model's prediction. To address this, the robust MDP framework, based on worst-case analysis, has been developed. However, several restrictions of the robust MDP model prevent it from achieving practical success; this talk addresses two of them.

The first restriction of the standard robust MDP is that its modeling of uncertainty is inflexible and can lead to conservative solutions. In particular, it requires the uncertainty set to be "rectangular", i.e., a Cartesian product of uncertainty sets for each state. To lift this assumption, we propose an uncertainty model, which we call "k-rectangular", that generalizes the concept of rectangularity. This new class of uncertainty sets includes interesting examples such as the union of rectangular uncertainty sets and the uncertainty set in which the total number of deviations is restricted, and we show that the resulting robust MDP can be solved efficiently via state augmentation.

The second restriction of the standard robust MDP is that it does not take the learning issue into account, i.e., how to adapt the model efficiently to reduce the uncertainty. To address this, we propose an algorithm, inspired by work in online learning, that efficiently learns the level of uncertainty of each state and achieves an accumulated reward that is asymptotically optimal.
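
For readers unfamiliar with the rectangularity assumption discussed above, the following is a minimal sketch (not taken from the talk) of standard robust value iteration under an (s,a)-rectangular uncertainty set, here represented simply as a finite list of candidate transition kernels for each state-action pair. All names (robust_value_iteration, P_sets, etc.) are illustrative, not from the speaker's work.

    import numpy as np

    def robust_value_iteration(P_sets, R, gamma=0.9, tol=1e-8, max_iter=10000):
        # P_sets[s][a]: list of candidate transition distributions over states,
        # forming a finite (s,a)-rectangular uncertainty set -- the adversary
        # may pick a kernel for each (s,a) independently.
        # R[s][a]: immediate reward for taking action a in state s.
        n_states = len(P_sets)
        V = np.zeros(n_states)
        for _ in range(max_iter):
            V_new = np.empty(n_states)
            for s in range(n_states):
                q_values = []
                for a, kernels in enumerate(P_sets[s]):
                    # Inner minimization: the adversary picks the worst kernel.
                    worst_case = min(np.dot(p, V) for p in kernels)
                    q_values.append(R[s][a] + gamma * worst_case)
                V_new[s] = max(q_values)  # Outer maximization over actions.
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new
        return V

    # Toy example: 2 states, 2 actions, 2 candidate kernels per (s, a).
    P_sets = [
        [  # state 0
            [np.array([0.9, 0.1]), np.array([0.7, 0.3])],  # action 0
            [np.array([0.5, 0.5]), np.array([0.4, 0.6])],  # action 1
        ],
        [  # state 1
            [np.array([0.2, 0.8]), np.array([0.3, 0.7])],  # action 0
            [np.array([0.6, 0.4]), np.array([0.8, 0.2])],  # action 1
        ],
    ]
    R = [[1.0, 0.5], [0.0, 2.0]]
    print(robust_value_iteration(P_sets, R))

Because the uncertainty set is rectangular, the inner minimization decomposes across state-action pairs, which is exactly what makes this dynamic-programming recursion valid. For the non-rectangular (e.g., k-rectangular) sets discussed in the talk, the adversary's choices are coupled across states, so this simple recursion no longer applies directly; that is what motivates the state-augmentation approach mentioned in the abstract.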