Fitted Q-iteration by advantage weighted regression

Neumann, Gerhard and Peters, Jan (2009) Fitted Q-iteration by advantage weighted regression. In: Advances in Neural Information Processing Systems 21 (NIPS 2008), 8-11 December 2008, Vancouver, BC, Canada.

Full content URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Documents

neumann_NIPS2008.pdf - Whole Document (PDF, 147kB)
Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive

Abstract

Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, more stable learning process, and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used in the policy improvement step is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.
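The core idea of the abstract — replacing greedy action selection with a soft-greedy policy improvement step that reduces to a weighted regression — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the linear policy class, the random advantage estimates, and the variable names (`tau`, `Theta`, etc.) are illustrative assumptions. Actions observed in a batch are regressed onto state features, with each sample weighted by its exponentiated advantage, so that actions better than average dominate the fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch of transitions: states, the actions taken, and advantage
# estimates A(s, a) = Q(s, a) - V(s) (drawn at random purely for illustration).
n, state_dim, action_dim = 200, 3, 2
S = rng.normal(size=(n, state_dim))
A = rng.normal(size=(n, action_dim))   # actions taken in the batch
adv = rng.normal(size=n)               # advantage estimates for those actions

# Soft-greedy weighting: exponentiate the advantages with a temperature tau.
tau = 1.0
w = np.exp(adv / tau)
w /= w.sum()                           # normalize the weights

# Weighted least squares: fit a linear policy a ≈ Theta^T phi(s),
# with phi(s) = [s, 1] to include a bias term.
Phi = np.hstack([S, np.ones((n, 1))])
W = np.diag(w)
Theta = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ A)

def policy(s):
    """Mean action of the fitted linear policy for a single state s."""
    return np.hstack([s, 1.0]) @ Theta
```

Because the weights are a fixed function of the advantage estimates, the policy improvement step is a single closed-form weighted regression rather than a per-state optimization over actions, which is what makes the approach cheap for continuous and high-dimensional action spaces.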

Keywords: Reinforcement Learning
Subjects: G Mathematical and Computer Sciences > G760 Machine Learning
Divisions: College of Science > School of Computer Science
ID Code: 25796
Deposited On: 28 Jul 2017 07:59
