Sample and feedback efficient hierarchical reinforcement learning from human preferences

Pinsler, R. and Akrour, R. and Osa, T. and Peters, J. and Neumann, G. (2018) Sample and feedback efficient hierarchical reinforcement learning from human preferences. In: IEEE International Conference on Robotics and Automation (ICRA), 21 - 25 May 2018, Brisbane.


Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive


While reinforcement learning has led to promising results in robotics, defining an informative reward function can be challenging. Prior work has considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. In contrast to prior work, in this paper we propose to learn reward functions from both the robot and the human perspectives in order to improve on both efficiency metrics. On the one hand, learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to an outcome space of reduced dimensionality. On the other hand, learning a reward function from the robot perspective circumvents the need for learning a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.
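The human-perspective idea above — that people compare trajectories through a low-dimensional outcome space rather than raw states — can be sketched with a standard pairwise-preference model. The snippet below is a minimal illustration, not the paper's actual method: it assumes a linear reward over hand-picked outcome features and fits it with a Bradley-Terry likelihood; the function name and hyperparameters are hypothetical.

```python
import numpy as np

def learn_reward_from_preferences(outcome_pairs, prefs, lr=0.1, iters=500):
    """Fit linear reward weights w so that r(o) = w @ o explains pairwise
    human preferences under a Bradley-Terry model:
        P(o_a preferred over o_b) = sigmoid(w @ (o_a - o_b)).
    outcome_pairs: array of shape (n, 2, d), each row a compared pair of
    low-dimensional trajectory outcomes; prefs: 1 if the first outcome of
    the pair was preferred, else 0."""
    outcome_pairs = np.asarray(outcome_pairs, dtype=float)
    prefs = np.asarray(prefs, dtype=float)
    w = np.zeros(outcome_pairs.shape[2])
    diff = outcome_pairs[:, 0, :] - outcome_pairs[:, 1, :]  # (n, d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-diff @ w))      # predicted pref. probs
        grad = diff.T @ (prefs - p)              # log-likelihood gradient
        w += lr * grad / len(prefs)              # gradient ascent step
    return w

# Toy usage: outcomes are 2-D features; the human always prefers the
# trajectory with the larger first feature, so w[0] should come out positive.
pairs = [[[1.0, 0.0], [0.0, 0.0]],
         [[2.0, 0.0], [1.0, 0.0]]]
w = learn_reward_from_preferences(pairs, [1, 1])
```

Because the outcome space has far fewer dimensions than the raw trajectory, far fewer ranked pairs are needed to pin down the weights — which is the feedback-efficiency argument in the abstract.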

Keywords: grasping and manipulation, reinforcement learning
Subjects: H Engineering > H671 Robotics
Divisions: College of Science > School of Computer Science
ID Code: 31675
Deposited On: 17 Apr 2018 11:13
