Pajarinen, J., Thai, H. L., Akrour, R., Peters, J. and Neumann, G. (2019) Compatible natural gradient policy search. Machine Learning. ISSN 1573-0565
Full content URL: https://doi.org/10.1007/s10994-019-05807-0
Documents

PDF: Pajarinen2019_Article_CompatibleNaturalGradientPolic.pdf (Whole Document, 7MB). Available under License Creative Commons Attribution 4.0 International.
Item Type: Article
Item Status: Live Archive
Abstract
Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use a KL-divergence bound to define the trust region, resulting in a natural gradient policy update. We show that the natural gradient and trust-region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy on a poorly chosen schedule, leading to premature convergence. To control entropy reduction we introduce a new policy search method called compatible policy search (COPOS) which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks.
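To make the KL-bounded natural gradient update described in the abstract concrete, the following is a minimal illustrative sketch in Python/NumPy, not the authors' COPOS implementation or the entropy-bounded update. It performs one natural gradient step for a linear-softmax (exponential family) policy at a single state, with the step size chosen so that the quadratic approximation of the KL-divergence equals a trust-region bound epsilon. The toy features, the advantage values, and all function names are assumptions made purely for illustration.

```python
# Illustrative sketch only (assumptions): one KL-bounded natural gradient step
# for a linear-softmax policy over discrete actions at a single state.
import numpy as np

def softmax_policy(theta, features):
    """pi(a|s) for a linear-softmax policy; features has shape (n_actions, dim)."""
    logits = features @ theta
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def natural_gradient_step(theta, features, advantages, epsilon=0.01):
    """One natural gradient update with step size set by the KL trust region.

    g    : vanilla policy gradient estimate
    F    : Fisher information matrix of the softmax policy
    step : sqrt(2 * epsilon / (g^T F^{-1} g)), so the second-order KL
           approximation of the update equals epsilon.
    """
    pi = softmax_policy(theta, features)
    mean_feat = pi @ features                          # E_pi[phi]
    score = features - mean_feat                       # grad log pi(a|s) per action
    g = (pi * advantages) @ score                      # policy gradient estimate
    F = (score.T * pi) @ score + 1e-8 * np.eye(len(theta))  # Fisher matrix
    nat_g = np.linalg.solve(F, g)                      # natural gradient F^{-1} g
    step = np.sqrt(2.0 * epsilon / max(g @ nat_g, 1e-12))
    return theta + step * nat_g

# Tiny usage example with random toy data (assumed values, not from the paper).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))    # 4 actions, 3-dimensional features
theta = np.zeros(3)
advantages = rng.normal(size=4)       # stand-in for compatible value estimates
theta = natural_gradient_step(theta, features, advantages)
print(softmax_policy(theta, features))
```

In this sketch the advantages are random placeholders; in the paper's setting they would come from a compatible value function approximator, which is what makes the natural gradient and the trust-region update coincide.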
Keywords: Deep reinforcement learning
Subjects: G Mathematical and Computer Sciences > G760 Machine Learning
Divisions: College of Science > School of Computer Science
ID Code: 36283
Deposited On: 24 Jun 2019 08:48