Lioutikov, R., Paraschos, A., Peters, J. and Neumann, G. (2014) Sample-based information-theoretic stochastic optimal control. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), 31 May - 7 June 2014, Hong Kong.
[Lioutikov_ICRA_2014.pdf](http://eprints.lincoln.ac.uk/25771/1.hassmallThumbnailVersion/Lioutikov_ICRA_2014.pdf) (PDF, Whole Document, 2MB)
| Item Type: | Conference or Workshop contribution (Paper) |
|---|---|
| Item Status: | Live Archive |
Abstract
Many Stochastic Optimal Control (SOC) approaches rely on samples to either obtain an estimate of the value function or a linearisation of the underlying system model. However, these approaches typically neglect the fact that the accuracy of the policy update depends on the closeness of the resulting trajectory distribution to these samples. The greedy operator does not consider such a closeness constraint to the samples and can therefore lead to oscillations or even instabilities in the policy updates. Such undesired behaviour is likely to result in inferior performance of the estimated policy. We take inspiration from the reinforcement learning community and relax the greedy operator used in SOC with an information-theoretic bound that limits the 'distance' between two subsequent trajectory distributions in a policy update. The introduced bound ensures a smooth and stable policy update. Our method is also well suited for model-based reinforcement learning, where we estimate the system dynamics model from data. As this model is likely to be inaccurate, it might be dangerous to exploit it greedily. Instead, our bound ensures that we generate new data in the vicinity of the current data, so that we can improve our estimate of the system dynamics model. We show that our approach outperforms several state-of-the-art approaches on challenging simulated robot control tasks.
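
In this line of work, the bound mentioned in the abstract is usually formulated as a Kullback-Leibler constraint between successive trajectory distributions. A minimal sketch of such a KL-bounded policy update in standard notation (here \(p\) is the new trajectory distribution, \(q\) the previous one, \(R(\tau)\) the return of trajectory \(\tau\), and \(\epsilon\) the bound; this notation is assumed, not taken from the page):

```latex
\max_{p}\ \int p(\tau)\, R(\tau)\, \mathrm{d}\tau
\quad \text{s.t.} \quad
\mathrm{KL}\!\left(p(\tau)\,\middle\|\,q(\tau)\right) \le \epsilon,
\qquad \int p(\tau)\, \mathrm{d}\tau = 1.
```

The Lagrangian solution has the closed form \(p(\tau) \propto q(\tau)\,\exp\!\left(R(\tau)/\eta\right)\), where the temperature \(\eta\) is the multiplier of the KL constraint; letting \(\epsilon \to \infty\) drives \(\eta \to 0\) and recovers the greedy operator the abstract warns about.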
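A sample-based version of this update reduces to reweighting the observed trajectories, with the temperature found by minimising the corresponding dual function. The sketch below follows the generic episodic REPS scheme (Peters et al., 2010) that this paper builds on; it is illustrative rather than the paper's exact trajectory-level algorithm, and `epsilon` and the solver choice are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def kl_bounded_weights(returns, epsilon=0.5):
    """Weights for a KL-bounded policy update from sampled returns.

    Minimises the REPS dual g(eta) = eta*epsilon + eta*log mean(exp(R/eta))
    for the temperature eta, then reweights samples by exp(R/eta).
    """
    R = np.asarray(returns, dtype=float)
    R = R - R.max()  # shift for numerical stability; weights are unaffected

    def dual(log_eta):
        eta = np.exp(log_eta[0])  # optimise log(eta) so eta stays positive
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))

    res = minimize(dual, x0=[0.0], method="Nelder-Mead")
    eta = np.exp(res.x[0])
    w = np.exp(R / eta)
    return w / w.sum()  # normalised weights for a weighted ML policy fit

# Example: higher-return samples get larger, but not degenerate, weights.
print(kl_bounded_weights([1.0, 2.0, 3.0, 10.0]))
```

The resulting weights are then typically used to fit the next policy by weighted maximum likelihood, which keeps the new trajectory distribution close to the samples it was estimated from.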