Policy evaluation with temporal differences: a survey and comparison

Dann, C., Neumann, G. and Peters, J. (2014) Policy evaluation with temporal differences: a survey and comparison. Journal of Machine Learning Research, 15. pp. 809-883. ISSN 1532-4435

Full content URL: http://jmlr.org/papers/volume15/dann14a/dann14a.pd...

Item Type: Article
Item Status: Live Archive


Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches.

This paper aims at making these new developments accessible in a concise overview, with foci on underlying cost functions, the off-policy scenario as well as on regularization in high-dimensional feature spaces. By presenting the first extensive, systematic comparative evaluation of TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of the methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance.
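
For readers unfamiliar with the family of methods the abstract refers to, the following is a minimal sketch of tabular TD(0) policy evaluation. It is not taken from the paper: the function name `td0_evaluate`, the transition format, the step size, and the toy three-state chain are all assumptions chosen for illustration.

```python
def td0_evaluate(episodes, n_states, alpha=0.1, gamma=0.9):
    """Tabular TD(0): estimate V(s) from transitions generated by a fixed policy.

    episodes: iterable of episodes, each a list of (state, reward, next_state)
              tuples; next_state is None when the episode terminates.
    """
    values = [0.0] * n_states
    for episode in episodes:
        for state, reward, next_state in episode:
            # Bootstrapped target: reward plus discounted estimate of the successor.
            next_value = 0.0 if next_state is None else values[next_state]
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
    return values


# Hypothetical toy problem (an assumption, not from the paper):
# a deterministic chain 0 -> 1 -> 2 -> end, reward 1 on the final transition.
chain_episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
estimates = td0_evaluate([chain_episode] * 500, n_states=3, alpha=0.5)
```

Because the toy chain is deterministic, the estimates should approach the discounted returns 0.81, 0.9 and 1.0 for states 0, 1 and 2. The other methods listed in the abstract (LSTD, GTD, TDC and so on) use different update rules and are not shown here.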

Keywords: Policy Evaluation, Temporal Difference Learning, Reinforcement Learning, JCOpen
Subjects: G Mathematical and Computer Sciences > G760 Machine Learning
Divisions: College of Science > School of Computer Science
ID Code: 25768
Deposited On: 17 Jan 2017 15:50
