Neumann, Gerhard and Peters, Jan
(2009)
Fitted Q-iteration by advantage weighted regression.
In: Advances in Neural Information Processing Systems 21 (NIPS 2008), 8-11 December 2008, Vancouver, BC, Canada.
Full content URL: https://www.scopus.com/inward/record.uri?eid=2-s2....
PDF: neumann_NIPS2008.pdf (Whole Document, 147kB)
| Item Type: | Conference or Workshop contribution (Paper) |
|---|---|
| Item Status: | Live Archive |
Abstract
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, more stable learning process, and the higher quality of the resulting policies. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic: for continuous actions it is expensive, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.
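The core step described in the abstract, replacing greedy maximization over a continuous action space with a soft-greedy, advantage-weighted regression, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's exact formulation: the exponential weighting `exp(advantage / temperature)`, the linear policy class, and all function and variable names are hypothetical choices made for the sketch.

```python
# Illustrative sketch of one advantage-weighted policy-improvement step.
# The exponential soft-greedy weighting and the linear policy model are
# assumptions for this example, not the paper's exact scheme.
import numpy as np

def advantage_weighted_regression(states, actions, q_values, v_values,
                                  temperature=1.0):
    """Fit a linear policy mean by weighted least squares.

    states:   (N, d_s) array of visited states
    actions:  (N, d_a) array of executed actions
    q_values: (N,) state-action values from the current Q estimate
    v_values: (N,) state-value baseline for the visited states
    """
    # Advantages measure how much better each action is than the baseline.
    advantages = q_values - v_values

    # Soft-greedy weighting: better-than-average actions get larger
    # weights; the temperature controls how sharply weights concentrate.
    weights = np.exp(advantages / temperature)
    weights /= weights.sum()

    # Weighted linear regression from state features (plus bias) to
    # actions: minimize sum_i w_i * ||a_i - coef^T [s_i, 1]||^2.
    phi = np.hstack([states, np.ones((len(states), 1))])  # bias feature
    w_sqrt = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(w_sqrt * phi, w_sqrt * actions, rcond=None)
    return coef  # policy mean parameters: action ~ coef^T [s, 1]
```

In a full FQI loop, a regression step like this would alternate with a fitted Q-update; because the improvement reduces to a weighted regression, no explicit maximization over the continuous action space is needed.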