Audio Embedding-Aware Dialogue Policy Learning

Lopez Zorrilla, Asier, Torres, M. Ines and Cuayahuitl, Heriberto (2022) Audio Embedding-Aware Dialogue Policy Learning. IEEE Transactions on Audio, Speech, and Language Processing, 31, pp. 525-538. ISSN 1558-7916, 2329-9290

Authors' manuscript

Item Type: Article
Item Status: Live Archive


Following the success of Natural Language Processing (NLP) transformers pretrained via self-supervised learning, similar models have been proposed recently for speech processing such as Wav2Vec2, HuBERT and UniSpeech-SAT. An interesting yet unexplored area of application of these models is Spoken Dialogue Systems, where the users’ audio signals are typically just mapped to word-level features derived from an Automatic Speech Recogniser (ASR), and then processed using NLP techniques to generate system responses. This paper reports a comprehensive comparison of dialogue policies trained using ASR-based transcriptions and extended with the aforementioned audio processing transformers in the DSTC2 task. Whilst our dialogue policies are trained with supervised and policy-based deep reinforcement learning, they are assessed using both automatic task completion metrics and a human evaluation. Our results reveal that using audio embeddings is more beneficial than detrimental in most of our trained dialogue policies, and that the benefits are stronger for supervised learning than reinforcement learning.
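The abstract describes extending ASR-based word-level features with utterance-level audio embeddings as input to a dialogue policy. As a minimal sketch of that idea, the snippet below mean-pools hypothetical frame-level embeddings (randomly generated stand-ins for the per-frame 768-dimensional outputs of a model such as Wav2Vec2 or HuBERT) into one fixed-size audio vector, then concatenates it with a hypothetical ASR-derived text feature vector; all shapes and names here are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

# Stand-in for frame-level audio embeddings: models such as Wav2Vec2 base or
# HuBERT base emit one 768-dim vector per ~20 ms frame. Values are random here.
rng = np.random.default_rng(0)
frame_embeddings = rng.standard_normal((250, 768))  # ~5 s utterance (assumed)

# Mean-pool across frames into a single utterance-level audio embedding.
audio_embedding = frame_embeddings.mean(axis=0)     # shape: (768,)

# Stand-in for ASR-derived word-level features of the transcription
# (e.g. a bag-of-words or sentence embedding); dimension is an assumption.
text_features = rng.standard_normal(300)

# "Audio embedding-aware" policy input: concatenate both views so the
# dialogue policy network can condition on text and audio jointly.
policy_input = np.concatenate([text_features, audio_embedding])
print(policy_input.shape)  # (1068,)
```

The policy network itself (supervised or trained with policy-based deep reinforcement learning, as in the paper) would then map `policy_input` to a system action; that part is omitted here.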

Keywords: spoken dialogue systems, audio embeddings, transformer neural networks, deep reinforcement learning
Subjects: G Mathematical and Computer Sciences > G700 Artificial Intelligence
G Mathematical and Computer Sciences > G710 Speech and Natural Language Processing
G Mathematical and Computer Sciences > G760 Machine Learning
Divisions: College of Science > School of Computer Science
ID Code: 52689
Deposited On: 16 Jan 2023 15:42
