Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks

Eyben, F. and Petridis, S. and Schuller, B. and Tzimiropoulos, Georgios and Zafeiriou, S. and Pantic, M. (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, 22-27 May 2011, Prague.

Full content URL: http://www.scopus.com/inward/record.url?eid=2-s2.0...

Full text not available from this repository.

Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive

Abstract

We investigate classification of non-linguistic vocalisations with a novel audiovisual approach and Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is this year's Paralinguistic Challenge's Audiovisual Interest Corpus of natural human-to-human conversation. For video-based analysis we compare shape-based and appearance-based features. These are fused in an early manner with typical audio descriptors. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features. © 2011 IEEE.
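The early fusion mentioned in the abstract is commonly understood as feature-level fusion: the audio and visual feature vectors of each frame are concatenated into one vector before classification. The sketch below illustrates that general idea only and is not the authors' implementation. The function name `early_fusion`, the toy feature values, and the assumption that both streams are already aligned to the same frame rate are all hypothetical.

```python
from typing import List, Sequence


def early_fusion(
    audio_frames: Sequence[Sequence[float]],
    visual_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate audio and visual features frame by frame.

    Assumes (hypothetically) that both streams were already resampled
    to the same frame rate, so frame i of one matches frame i of the other.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("streams must have the same number of frames")
    # One fused vector per frame: audio features followed by visual features.
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


# Toy values for illustration only; not taken from the paper.
audio = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 frames, 3 audio descriptors
visual = [[1.0, 2.0], [3.0, 4.0]]           # 2 frames, 2 shape features
fused = early_fusion(audio, visual)
```

Each fused frame here has five values (three audio plus two visual). A sequence classifier such as an LSTM would consume the fused frames in order, whereas a static classifier would typically work on one summary vector per segment.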

Additional Information: Conference Code: 85875
Keywords: Appearance based, Audio-visual Processing, Descriptors, Dynamic sequences, Laughter, Memory network, Non-linguistic Vocalisations, Shape features, Short term memory, Static approach, Brain, Classification (of information), Linguistics, Signal processing, Speech communication, Recurrent neural networks
Subjects: G Mathematical and Computer Sciences > G400 Computer Science
G Mathematical and Computer Sciences > G440 Human-computer Interaction
Divisions: College of Science > School of Computer Science
ID Code: 8730
Deposited On: 08 Apr 2013 13:48
