Invited Talk by Dr. Prabuchandran K J on Sequential Decision Making under Uncertainty
In many real-world problems, such as inventory management, traffic signal control, and distributed job scheduling, decisions must often be taken sequentially under uncertainty. Such decision processes are typically modeled as Markov Decision Processes (MDPs). In the MDP setting, the goal is to find a state-dependent optimal sequence of actions that minimizes a long-term performance criterion. The standard dynamic programming approach to solving an MDP for the optimal decisions requires a complete model of the MDP and is computationally infeasible for the large state-action spaces associated with many real-world applications. Reinforcement learning (RL) methods, on the other hand, are model-free, simulation-based approaches for solving MDPs and can scale to large state-action spaces when applied in conjunction with function approximation techniques. However, a solution based on RL methods with function approximation comes with the associated problem of choosing the right features for approximation. In this talk, we investigate the problem of choosing the right features for RL methods based on function approximation. In addition, we will briefly discuss how a global optimisation method such as Bayesian Optimization, which leverages gradient information, can be utilised to improve the efficiency of RL algorithms.
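To make the dynamic programming baseline from the abstract concrete, below is a minimal value iteration sketch on a hypothetical two-state, two-action MDP (the transition probabilities `P` and costs `c` are purely illustrative, not from the talk). The point is that this approach needs the complete model `P` and `c` up front and sweeps over every state-action pair, which is exactly what becomes infeasible for the large state-action spaces mentioned above.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers illustrative).
gamma = 0.9                          # discount factor
# P[s, a, s'] = transition probability, c[s, a] = immediate cost
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
c = np.array([[1.0, 2.0],
              [0.5, 1.5]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality update for a cost-minimisation criterion:
    # V(s) <- min_a [ c(s, a) + gamma * sum_s' P(s, a, s') V(s') ]
    Q = c + gamma * P @ V            # shape (n_states, n_actions)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=1)            # greedy action in each state
```

Model-free RL methods such as Q-learning instead estimate these quantities from simulated transitions, and with function approximation the value function is represented through features of the state; the quality of that representation depends on the feature choice investigated in the talk.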
Dr. Prabuchandran K. J. is an Amazon-IISc Postdoctoral Scholar at IISc, Bangalore. He holds a Ph.D. from the Department of Computer Science and Automation, IISc, in the area of reinforcement learning. After his Ph.D., Prabuchandran worked as a Research Scientist at IBM Research Labs, India, for a year and a half on change-detection algorithms for multivariate compositional data. His research lies at the intersection of reinforcement learning, stochastic control and optimisation, and stochastic approximation algorithms.