Dann, C., Neumann, G. and Peters, J. (2014) Policy evaluation with temporal differences: a survey and comparison. Journal of Machine Learning Research, 15, pp. 809-883. ISSN 1532-4435
Full content URL: http://jmlr.org/papers/volume15/dann14a/dann14a.pdf
Documents: dann14a.pdf (PDF, Whole Document, 1MB)
Item Type: Article
Item Status: Live Archive
Abstract
Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches.
This paper aims at making these new developments accessible in a concise overview, with foci on underlying cost functions, the off-policy scenario, and regularization in high-dimensional feature spaces. By presenting the first extensive, systematic comparative evaluation of TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of these methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance.
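As a point of reference for the family of methods the survey compares, the following is a minimal sketch of TD(0) policy evaluation with linear function approximation. The 5-state random-walk MRP, one-hot features, step size and episode count are illustrative assumptions for this sketch, not the paper's experimental setup.

```python
# Minimal sketch: semi-gradient TD(0) policy evaluation with linear features.
# The random-walk MRP, one-hot features, step size and episode count are
# illustrative assumptions, not taken from Dann et al. (2014).
import numpy as np

n_states = 5                   # non-terminal states 0..4; episodes start in the middle
gamma = 1.0                    # undiscounted episodic task
alpha = 0.1                    # step size
features = np.eye(n_states)    # one-hot features -> tabular value estimates

theta = np.zeros(n_states)     # weight vector; V(s) ~ features[s] @ theta
rng = np.random.default_rng(0)

for episode in range(2000):
    s = n_states // 2
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        # reward +1 only when stepping off the right end; 0 otherwise
        r = 1.0 if s_next == n_states else 0.0
        v_s = features[s] @ theta
        v_next = 0.0 if s_next in (-1, n_states) else features[s_next] @ theta
        delta = r + gamma * v_next - v_s          # TD error
        theta += alpha * delta * features[s]      # semi-gradient TD(0) update
        if s_next in (-1, n_states):              # terminal on either end
            break
        s = s_next

print("estimated values:", features @ theta)
# true values for this random walk are [1/6, 2/6, 3/6, 4/6, 5/6]
```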
Keywords: Policy Evaluation, Temporal Difference Learning, Reinforcement Learning, JCOpen
Subjects: G Mathematical and Computer Sciences > G760 Machine Learning
Divisions: College of Science > School of Computer Science
ID Code: 25768
Deposited On: 17 Jan 2017 15:50