Variational inference for policy search in changing situations

Neumann, Gerhard (2011) Variational inference for policy search in changing situations. In: 28th International Conference on Machine Learning (ICML-11), 28 June - 2 July 2011, Bellevue, Washington, USA.

Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive


Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain
target distribution in order to fit their policy. The commonly used KL-divergence forces the resulting
policy to be 'reward-attracted': the policy tries to reproduce all positively rewarded experience
while negative experience is neglected. However, the KL-divergence is not symmetric,
and we can also minimize the reversed KL-divergence, which is typically used in variational
inference. The policy now becomes 'cost-averse': it tries to avoid reproducing any negatively
rewarded experience while maximizing exploration. Due to this 'cost-averseness' of the policy,
Variational Inference for Policy Search (VIP) has several interesting properties. It requires
neither a kernel bandwidth nor an exploration rate; such settings are
determined automatically by the inference. The algorithm meets the performance of state-of-the-art
methods while being applicable to simultaneously learning in multiple situations. We concentrate
on using VIP for policy search in robotics. We apply our algorithm to learn dynamic counterbalancing
of different kinds of pushes with human-like 2-link and 4-link robots.
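
The asymmetry of the KL-divergence that the abstract relies on can be illustrated with a toy example. The sketch below is not the VIP algorithm from the paper; it is a minimal illustration under assumed settings: a hypothetical one-dimensional target distribution `p` with two equally good solutions (at -2 and +2), a single Gaussian `q` standing in for the policy, and a brute-force grid search in place of any real optimizer. All names (`log_normalize`, `kl`, `log_gauss`, `fit`) and all numeric values are assumptions made for illustration.

```python
import numpy as np

def log_normalize(log_w):
    """Shift log-weights so that exp(log_w) sums to one over the grid."""
    m = log_w.max()
    return log_w - (m + np.log(np.exp(log_w - m).sum()))

def kl(log_a, log_b):
    """Discrete KL(a || b) for normalized log-probabilities on a shared grid."""
    return float(np.sum(np.exp(log_a) * (log_a - log_b)))

def log_gauss(x, mu, sigma):
    """Normalized log-probabilities of a Gaussian discretized on the grid x."""
    return log_normalize(-0.5 * ((x - mu) / sigma) ** 2)

# Shared evaluation grid (assumed range, wide enough for both modes).
x = np.linspace(-8.0, 8.0, 801)

# Hypothetical target: two equally rewarded solutions at x = -2 and x = +2.
log_p = log_normalize(np.logaddexp(-0.5 * ((x + 2.0) / 0.5) ** 2,
                                   -0.5 * ((x - 2.0) / 0.5) ** 2))

# Candidate means and standard deviations for the single-Gaussian "policy".
mus = np.linspace(-3.0, 3.0, 61)
sigmas = np.linspace(0.3, 3.0, 28)

def fit(objective):
    """Brute-force search for the (mu, sigma) that minimizes the objective."""
    best = min((objective(log_gauss(x, mu, s)), mu, s)
               for mu in mus for s in sigmas)
    return best[1], best[2]

# KL(p || q): q must cover all mass of p, so it spreads across both modes.
mu_fwd, sigma_fwd = fit(lambda log_q: kl(log_p, log_q))

# KL(q || p): q is penalized wherever p is small, so it settles on one mode.
mu_rev, sigma_rev = fit(lambda log_q: kl(log_q, log_p))

print("KL(p||q) fit:", mu_fwd, sigma_fwd)
print("KL(q||p) fit:", mu_rev, sigma_rev)
```

In this toy setting, minimizing KL(p||q) should give a broad Gaussian centered between the two solutions, which also places mass on the poorly rewarded region in the middle. Minimizing the reversed KL(q||p) should give a narrow Gaussian on one of the two solutions, which matches the 'cost-averse' behavior described above; the entropy term contained in KL(q||p) is what keeps that fit as wide as the target allows.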

Keywords: Policy Search, Variational Inference
Subjects: G Mathematical and Computer Sciences > G760 Machine Learning
Divisions: College of Science > School of Computer Science
ID Code: 25793
Deposited On: 06 Apr 2017 13:39
