State-regularized policy search for linearized dynamical systems

Abdulsamad, Hany and Arenz, Oleg and Peters, Jan and Neumann, Gerhard (2017) State-regularized policy search for linearized dynamical systems. In: Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 18-23 June 2017, Pittsburgh, USA.


Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive


Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply to highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step size of the policy update and can even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policies can still differ significantly, rendering the local approximate models invalid. To alleviate this issue, we propose enforcing a relative entropy constraint not only on the policy update, but also on the update of the state distribution around which the dynamics and cost are approximated. We present a derivation of the closed-form policy update and show that our approach outperforms related methods on two nonlinear and highly dynamic simulated systems.
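The core idea, bounding the relative entropy of both the policy and the state distribution, can be illustrated with the closed-form KL divergence between Gaussians. The sketch below is not the paper's actual update rule; all numbers, the bounds `eps_policy`/`eps_state`, and the toy distributions are hypothetical, assuming NumPy.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

# Hypothetical quantities from one iteration of a trajectory-centric
# policy search on a 2-D state: old/new Gaussian state marginals, and
# old/new Gaussian action distributions of the linear policy at the
# mean state.
mu_x_old, cov_x_old = np.zeros(2), np.eye(2)
mu_x_new, cov_x_new = np.array([0.3, -0.1]), 1.2 * np.eye(2)

mu_u_old, cov_u_old = np.array([0.5]), np.array([[0.2]])
mu_u_new, cov_u_new = np.array([0.6]), np.array([[0.25]])

eps_policy, eps_state = 0.1, 0.05   # illustrative step-size bounds

kl_policy = gaussian_kl(mu_u_old, cov_u_old, mu_u_new, cov_u_new)
kl_state = gaussian_kl(mu_x_old, cov_x_old, mu_x_new, cov_x_new)

# An update is admissible only if BOTH trust regions hold, so that
# the local models fitted around the old trajectory remain valid.
admissible = kl_policy <= eps_policy and kl_state <= eps_state
```

In this toy setting the policy-space bound is satisfied while the state-distribution bound is violated, which is exactly the failure mode the abstract describes: a small step in policy space can still move the state distribution far enough to invalidate the local models.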

Keywords: Stochastic Optimal Control, Trust Region
Subjects: G Mathematical and Computer Sciences > G760 Machine Learning
H Engineering > H671 Robotics
Divisions: College of Science > School of Computer Science
ID Code: 27055
Deposited On: 28 Apr 2017 09:07
