Abdulsamad, Hany, Arenz, Oleg, Peters, Jan and Neumann, Gerhard
(2017)
State-regularized policy search for linearized dynamical systems.
In: Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 18-23 June 2017, Pittsburgh, USA.
Full text: Abdulsamad_ICAPS_2017.pdf (PDF, 220kB, whole document)
| Item Type: | Conference or Workshop contribution (Paper) |
| --- | --- |
| Item Status: | Live Archive |
Abstract
Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply to highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step size of the policy update and can even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policies can still differ significantly, rendering the local approximate models invalid. To alleviate this issue, we propose enforcing a relative entropy constraint not only on the policy update, but also on the update of the state distribution around which the dynamics and cost are approximated. We present a derivation of the closed-form policy update and show that our approach outperforms related methods on two nonlinear and highly dynamic simulated systems.
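For intuition, the constraint structure described in the abstract can be sketched as the following constrained optimization. This is a hypothetical formalization based only on the abstract; the symbols $\pi$, $\mu_\pi$, $\epsilon$, and $\kappa$ are assumed notation rather than the paper's own, with $\mu_\pi$ denoting the state distribution induced by policy $\pi$:

$$
\max_{\pi} \; \mathbb{E}_{\mu_\pi,\,\pi}\!\left[\sum_{t} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\mu_{\pi_{\text{old}}}}\!\left[\mathrm{KL}\!\left(\pi(\cdot \mid s)\,\|\,\pi_{\text{old}}(\cdot \mid s)\right)\right] \le \epsilon,
\qquad
\mathrm{KL}\!\left(\mu_{\pi}\,\|\,\mu_{\pi_{\text{old}}}\right) \le \kappa .
$$

The first bound is the standard relative entropy constraint on the policy update; the second is the additional state-distribution constraint the paper proposes, which keeps successive policies operating in regions where the local linearizations of dynamics and cost remain valid.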