Audio Embeddings Help to Learn Better Dialogue Policies

López Zorrilla, Asier; Torres, M. Inés and Cuayáhuitl, Heriberto (2021) Audio Embeddings Help to Learn Better Dialogue Policies. In: IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

Authors' Accepted Manuscript: ASRU2021-final.pdf (whole document)

Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive


Neural transformer architectures have attracted substantial interest for text-based dialogue management in recent years. They have shown strong learning capabilities for open-domain dialogue trained on large amounts of data, and also for domain adaptation in task-oriented setups. However, the potential benefits of exploiting the users' audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained by the recently released Wav2Vec2 transformer model. We jointly fine-tune these models to learn dialogue policies via supervised learning and two policy gradient-based reinforcement learning algorithms. Our experimental results, using the DSTC2 dataset and a simulated user model capable of sampling audio turns, reveal that audio embeddings lead to overall higher task success (than without using audio embeddings), with statistically significant results across evaluation metrics and training algorithms.
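The fusion-and-training setup described in the abstract can be illustrated with a minimal sketch. It assumes the pooled GPT-2 text representation and the pooled Wav2Vec2 audio representation of a turn are already computed (random tensors stand in for them here), and it shows concatenation-based fusion into a policy head plus one REINFORCE-style update. All dimensions, the action count, and the class name are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultimodalPolicy(nn.Module):
    """Toy dialogue policy fusing text and audio turn embeddings.

    The 768-d inputs stand in for pooled GPT-2 and Wav2Vec2
    representations; sizes and action count are illustrative only.
    """
    def __init__(self, text_dim=768, audio_dim=768, n_actions=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + audio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, text_emb, audio_emb):
        # Simple concatenation fusion of the two modalities.
        fused = torch.cat([text_emb, audio_emb], dim=-1)
        return torch.log_softmax(self.head(fused), dim=-1)

policy = MultimodalPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Placeholders for one turn's pooled GPT-2 / Wav2Vec2 embeddings.
text_emb = torch.randn(1, 768)
audio_emb = torch.randn(1, 768)

# One illustrative policy-gradient (REINFORCE) step: sample an
# action, score it with a scalar reward (e.g. task success), and
# ascend the reward-weighted log-probability.
log_probs = policy(text_emb, audio_emb)          # shape (1, 10)
action = torch.multinomial(log_probs.exp(), 1)
reward = 1.0
loss = -log_probs[0, action.item()] * reward
loss.backward()
opt.step()
```

In the paper the two encoders are jointly fine-tuned with the policy; here they are frozen out of the sketch entirely, which keeps the fusion and update logic visible in a few lines.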

Keywords: Spoken dialogue systems, Audio embeddings, Reinforcement learning
Subjects: G Mathematical and Computer Sciences > G700 Artificial Intelligence
G Mathematical and Computer Sciences > G760 Machine Learning
G Mathematical and Computer Sciences > G710 Speech and Natural Language Processing
Divisions: College of Science > School of Computer Science
ID Code: 46800
Deposited On: 18 Oct 2021 08:14
