Non-parametric policy search with limited information loss

van Hoof, Herke and Neumann, Gerhard and Peters, Jan (2017) Non-parametric policy search with limited information loss. Journal of Machine Learning Research. ISSN 1532-4435

Documents
vanHoof_JMLR_2017.pdf - Whole Document (PDF, 3MB)

Item Type: Article
Item Status: Live Archive

Abstract

Learning complex control policies from non-linear and redundant sensory input is an important
challenge for reinforcement learning algorithms. Non-parametric methods that
approximate value functions or transition models can address this problem by adapting
to the complexity of the dataset. Yet, many current non-parametric approaches rely on
unstable greedy maximization of approximate value functions, which might lead to poor
convergence or oscillations in the policy update. A more robust policy update can be obtained
by limiting the information loss between successive state-action distributions. In this
paper, we develop a policy search algorithm with policy updates that are both robust and
non-parametric. Our method can learn non-parametric control policies for infinite horizon
continuous Markov decision processes with non-linear and redundant sensory representations.
We investigate how we can use approximations of the kernel function to reduce the
time requirements of the demanding non-parametric computations. In our experiments, we
show the strong performance of the proposed method, and how it can be approximated
efficiently. Finally, we show that our algorithm can learn a real-robot underpowered swing-up
task directly from image data.

Keywords: reinforcement learning, policy search
Subjects: G Mathematical and Computer Sciences > G760 Machine Learning
Divisions: College of Science > School of Computer Science
ID Code: 28020
Deposited On: 26 Jul 2017 13:18
