Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks

Eyben, F., Petridis, S., Schuller, B., Tzimiropoulos, Georgios, Zafeiriou, S. and Pantic, M. (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, 22-27 May 2011, Prague.

Full content URL: http://www.scopus.com/inward/record.url?eid=2-s2.0...

Full text not available from this repository.

Item Type: Conference or Workshop contribution (Paper)
Item Status: Live Archive

Abstract

We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation, as used in this year's Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features. These are fused in an early manner with typical audio descriptors. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features. © 2011 IEEE.
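
The early fusion mentioned in the abstract is commonly realised as feature-level concatenation: the audio and visual feature vectors of each frame are joined into a single vector before being passed to the sequence classifier. The following is only a minimal sketch of that general idea, not the authors' procedure, which this record does not describe. The function name `early_fusion`, the assumption that both streams are already aligned to the same frame rate, and the toy feature values and dimensions are all hypothetical.

```python
from typing import List, Sequence


def early_fusion(
    audio_frames: Sequence[Sequence[float]],
    visual_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-frame audio and visual features (feature-level fusion).

    Assumption: both streams are frame-synchronous, i.e. frame i of the
    audio stream corresponds to frame i of the visual stream.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    # Each fused frame is the audio vector followed by the visual vector.
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


# Hypothetical toy data: 2 frames, 3 audio features and 2 visual features each.
audio = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
visual = [[1.0, 2.0], [3.0, 4.0]]
fused = early_fusion(audio, visual)
```

Each fused frame here has five dimensions (three audio plus two visual), and the resulting sequence of frames is what a dynamic classifier such as an LSTM network would receive as input.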

Additional Information: Conference Code: 85875
Keywords: Appearance based, Audio-visual Processing, Descriptors, Dynamic sequences, Laughter, Memory network, Non-linguistic Vocalisations, Shape features, Short term memory, Static approach, Brain, Classification (of information), Linguistics, Signal processing, Speech communication, Recurrent neural networks
Subjects: G Mathematical and Computer Sciences > G400 Computer Science
G Mathematical and Computer Sciences > G440 Human-computer Interaction
Divisions: College of Science > School of Computer Science
ID Code: 8730
Deposited On: 08 Apr 2013 13:48
