Akrour, R., Abdolmaleki, A., Abdulsamad, H. and Neumann, G.
(2016)
Model-free trajectory optimization for reinforcement learning.
In: 33rd International Conference on Machine Learning, 19–24 June 2016, New York.
Full content URL: http://proceedings.mlr.press/v48/akrour16.html
PDF: full_moto_16.pdf - Whole Document (1MB)
Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive
Abstract
Many recent Trajectory Optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local, quadratic, time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.
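To make the closed-form update described above concrete, here is a minimal numpy sketch of a KL-constrained update for one time step. It is an illustration under stated assumptions, not the authors' implementation: it assumes the local Q-function is quadratic in the action, Q(x, u) = 1/2 u^T Quu u + u^T Qux x + u^T qu (action-independent terms dropped), and a linear-Gaussian policy N(Kx + k, Sigma); all names (Quu, Qux, qu, K, k, Sigma, epsilon) are illustrative. The dual variable eta is found by bisection, and the KL constraint is checked at a single representative state, a simplification of an expected-KL constraint over the state distribution.

```python
import numpy as np

def kl_gauss(m_new, S_new, m_old, S_old):
    """KL( N(m_new, S_new) || N(m_old, S_old) ) between Gaussians."""
    d = m_new.size
    S_old_inv = np.linalg.inv(S_old)
    diff = m_old - m_new
    return 0.5 * (np.trace(S_old_inv @ S_new) + diff @ S_old_inv @ diff
                  - d + np.log(np.linalg.det(S_old) / np.linalg.det(S_new)))

def policy_update(Quu, Qux, qu, K, k, Sigma, eta):
    """Closed-form maximizer of E[Q(x, u)] - eta * KL(pi || pi_old) for
    pi_old(u|x) = N(Kx + k, Sigma) and Q quadratic in u (illustrative names).
    Collecting terms quadratic and linear in u gives a new Gaussian whose
    precision is Sigma^-1 - Quu/eta."""
    Sigma_inv = np.linalg.inv(Sigma)
    F = Sigma_inv - Quu / eta           # new precision; PD once eta is large enough
    Sigma_new = np.linalg.inv(F)
    K_new = Sigma_new @ (Sigma_inv @ K + Qux / eta)
    k_new = Sigma_new @ (Sigma_inv @ k + qu / eta)
    return K_new, k_new, Sigma_new

def kl_constrained_update(Quu, Qux, qu, K, k, Sigma, x, epsilon,
                          eta_lo=1e-3, eta_hi=1e6, iters=60):
    """Geometric bisection on the dual variable eta until the KL bound
    epsilon holds (checked at one representative state x)."""
    for _ in range(iters):
        eta = np.sqrt(eta_lo * eta_hi)
        K_new, k_new, Sigma_new = policy_update(Quu, Qux, qu, K, k, Sigma, eta)
        if np.any(np.linalg.eigvalsh(Sigma_new) <= 0):
            kl = np.inf                 # eta too small: precision not positive definite
        else:
            kl = kl_gauss(K_new @ x + k_new, Sigma_new, K @ x + k, Sigma)
        if kl > epsilon:
            eta_lo = eta                # update too aggressive: regularize more
        else:
            eta_hi = eta                # constraint satisfied: try a smaller eta
    return policy_update(Quu, Qux, qu, K, k, Sigma, eta_hi)

# Tiny usage example: 2-D state, 1-D action, arbitrary numbers.
if __name__ == "__main__":
    Quu = np.array([[-0.5]])            # concave in u, as required for a maximum
    Qux = np.array([[0.3, -0.2]])
    qu  = np.array([0.1])
    K   = np.array([[0.0, 0.0]])
    k   = np.array([0.0])
    Sigma = np.array([[1.0]])
    x = np.array([1.0, -1.0])
    K_new, k_new, Sigma_new = kl_constrained_update(
        Quu, Qux, qu, K, k, Sigma, x, epsilon=0.1)
    print(K_new, k_new, Sigma_new)
```

Because the bisection only ever accepts values of eta for which the measured KL is below epsilon, the returned policy satisfies the bound at the probed state; no linearization of the dynamics enters the update, only the quadratic Q-model.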