Neumann, Gerhard
(2011)
Variational inference for policy search in changing situations.
In: 28th International Conference on Machine Learning (ICML-11), 28 June - 2 July 2011, Bellevue, Washington, USA.
Full content URL: https://www.scopus.com/inward/record.uri?eid=2-s2....
PDF: 441_icmlpaper.pdf (282kB)
Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive
Abstract
Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain target distribution in order to fit their policy. The commonly used KL divergence forces the resulting policy to be 'reward-attracted': the policy tries to reproduce all positively rewarded experience, while negative experience is neglected. However, the KL divergence is not symmetric, and we can also minimize the reversed KL divergence, which is typically used in variational inference. The policy then becomes 'cost-averse': it tries to avoid reproducing any negatively rewarded experience while maximizing exploration. Due to this cost-averseness of the policy, Variational Inference for Policy Search (VIP) has several interesting properties. It requires neither a kernel bandwidth nor an exploration rate; such settings are determined automatically by the inference. The algorithm matches the performance of state-of-the-art methods while being applicable to learning in multiple situations simultaneously. We concentrate on using VIP for policy search in robotics and apply our algorithm to learn dynamic counterbalancing of different kinds of pushes with human-like 2-link and 4-link robots.
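As a hedged sketch of the distinction described in the abstract (the symbols below are assumptions for illustration, not taken from the paper: $p_R(\tau)$ denotes a reward-weighted target distribution over experience $\tau$, and $\pi_\theta$ the parametric policy), the two fitting objectives differ only in the direction of the KL divergence:

$$
\underbrace{\min_{\theta}\;\mathrm{KL}\!\left(p_R(\tau)\,\|\,\pi_\theta(\tau)\right)}_{\text{'reward-attracted' (moment projection)}}
\qquad\text{vs.}\qquad
\underbrace{\min_{\theta}\;\mathrm{KL}\!\left(\pi_\theta(\tau)\,\|\,p_R(\tau)\right)}_{\text{'cost-averse' (variational, as in VIP)}}
$$

The first objective averages over the target, so the fitted policy tries to cover all positively rewarded samples; the reversed objective is zero-forcing, so the policy avoids placing mass where the target (and hence the reward) is low, which matches the cost-averse behaviour the abstract describes.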