ORIE Colloquium
We consider the classical joint pricing and inventory control problem with lost sales and censored demand, in which neither the customers' response to the selling price nor the demand distribution is known a priori, and the only information available for decision-making is past sales data. Conventional approaches, such as stochastic approximation, online convex optimization, and continuum-armed bandit algorithms, cannot be applied because neither the realized values of the profit function nor its derivatives are observed. A major difficulty is that the profit function estimated from observed sales data is multimodal even when the expected profit function is concave. We develop a nonparametric data-driven algorithm that actively integrates exploration and exploitation through carefully designed cycles. The algorithm searches the decision space through a sparse discretization scheme to jointly learn and optimize a multimodal (sampled) profit function, and it corrects the estimation biases caused by demand censoring. We show that the algorithm converges to the optimal policy as the planning horizon increases, and we obtain the convergence rate of its regret.
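To make the censoring issue concrete, the following minimal sketch shows why sales data alone gives a biased view of demand under lost sales: in each period only the minimum of demand and available stock is recorded. The demand values, the stock level, and the function names are hypothetical and chosen purely for illustration; this is not the algorithm described in the talk.

```python
def observed_sales(demands, stock_level):
    """Lost-sales censoring: each period records only min(demand, stock)."""
    return [min(d, stock_level) for d in demands]


def mean(values):
    """Plain sample mean."""
    return sum(values) / len(values)


# Hypothetical demand realizations; the decision maker never sees these directly.
demands = [3, 8, 5, 12, 7, 10]
# Hypothetical stock available at the start of every period.
stock_level = 6

sales = observed_sales(demands, stock_level)
true_mean = mean(demands)   # average of the unobserved demands
naive_mean = mean(sales)    # average of what is actually recorded

# Periods in which sales hit the stock level; demand in these periods is
# censored (it may have equaled or exceeded the stock level).
stockout_periods = sum(1 for s in sales if s == stock_level)

print(sales, true_mean, naive_mean, stockout_periods)
```

In this toy data the naive average of sales falls below the average of the underlying demands, which is the kind of estimation bias that a learning algorithm working only from sales data has to correct.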