ORIE Colloquium
The partially observable Markov decision process (POMDP) is a fundamental model for sequential decision making. By assuming that the state of the underlying system is only partially observable, the model captures a key feature of many real-world problems but is notoriously difficult to solve. Almost all existing methods that aim at an exact solution are rooted in the (primal) Bayesian framework: decisions are made based on the belief (i.e., the probability distribution) over the underlying state, and the belief is updated dynamically according to Bayes' rule. This talk will present a novel dual framework for POMDPs with discounted rewards and finite state, action, and observation sets. The technique characterizes the frontier of value-to-go vectors in each period through backward induction, which eliminates the need for forward belief updating. The new framework facilitates POMDP research on two fronts. First, it transforms and decomposes the POMDP into two well-studied computational geometry problems: finding the convex hull of a set of points and computing the Minkowski sum of a set of convex polytopes. Consequently, state-of-the-art techniques in computational geometry can be redirected to tackle the POMDP problem more efficiently. Second, the dual framework enables analytical solutions for some stylized POMDPs and aids the exploration of structural properties of optimal policies. The idea behind the dual framework may also apply to other problems in machine learning and artificial intelligence.
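To make the geometric connection concrete, the following is a minimal Python sketch (not taken from the talk) of one backward-induction step on a finite set of value-to-go vectors. The helper names minkowski_sum, hull_frontier, and backup are hypothetical, and the pruning step here is deliberately over-inclusive: it keeps all convex-hull vertices via scipy, whereas a full implementation would retain only the upper frontier of vectors that are maximal for some belief.

    import itertools
    import numpy as np
    from scipy.spatial import ConvexHull, QhullError

    def minkowski_sum(A, B):
        # Minkowski sum of two finite sets of value vectors (rows of A, B).
        return np.array([a + b for a, b in itertools.product(A, B)])

    def hull_frontier(V):
        # Keep only the vertices of the convex hull of the vector set V.
        # Over-inclusive relative to the upper frontier, but it preserves
        # the piecewise-linear value function V(b) = max_v b . v exactly.
        try:
            return V[ConvexHull(V).vertices]
        except QhullError:
            return V  # degenerate set (too few or coplanar points): keep all

    def backup(Gamma, P, O, R, gamma):
        # One backward-induction step on the set of value-to-go vectors.
        # Gamma: current vector set, shape (n, |S|)
        # P[a]:  |S| x |S| transition matrix for action a
        # O[a]:  |S| x |Z| observation matrix for action a
        # R:     |S| x |A| immediate rewards; gamma: discount factor
        new_vectors = []
        for a in range(R.shape[1]):
            # Back-project each vector through the transition and observation
            # dynamics, producing one finite point set per observation z.
            obs_sets = [np.array([gamma * P[a] @ (O[a][:, z] * v) for v in Gamma])
                        for z in range(O[a].shape[1])]
            # Combine observations via Minkowski sums, pruning as we go.
            acc = obs_sets[0]
            for S_z in obs_sets[1:]:
                acc = hull_frontier(minkowski_sum(acc, S_z))
            new_vectors.append(acc + R[:, a])  # add immediate reward for a
        return hull_frontier(np.vstack(new_vectors))

Because the vector sets grow multiplicatively under repeated Minkowski sums, the cost of each backward-induction pass is dominated by the hull/frontier pruning step, which is precisely where state-of-the-art computational geometry algorithms can be brought to bear.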