Eyben, F., Petridis, S., Schuller, B., Tzimiropoulos, Georgios, Zafeiriou, S. and Pantic, M. (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, 22-27 May 2011, Prague.
Full content URL: http://www.scopus.com/inward/record.url?eid=2-s2.0...
Full text not available from this repository.
Item Type: | Conference or Workshop contribution (Paper) |
---|---|
Item Status: | Live Archive |
Abstract
We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation from this year's Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features. These are fused in an early manner with typical audio descriptors. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features. © 2011 IEEE.
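The abstract mentions that visual features are "fused in an early manner" with audio descriptors. This record does not give the feature dimensions, frame rates, or alignment procedure, so the following is only a minimal sketch of what early (feature-level) fusion generally means: time-aligned feature vectors from both streams are concatenated frame by frame before being passed to a single classifier. The function name `early_fusion` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def early_fusion(
    audio_frames: Sequence[Sequence[float]],
    visual_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate time-aligned audio and visual feature vectors per frame.

    Assumption (not from the paper): both streams have already been
    resampled to the same number of frames.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    # Each fused frame is the audio vector followed by the visual vector.
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


# Hypothetical toy input: 3 frames, 2 audio features and 1 visual feature per frame.
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
visual = [[1.0], [2.0], [3.0]]

fused = early_fusion(audio, visual)
```

Each fused frame here has three values (two audio, one visual). A sequence classifier such as an LSTM would then consume the fused sequence as a single input stream, in contrast to late fusion, where separate per-modality classifiers are combined at the decision level.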
Additional Information: | Conference Code: 85875 |
---|---|
Keywords: | Appearance based, Audio-visual Processing, Descriptors, Dynamic sequences, Laughter, Memory network, Non-linguistic Vocalisations, Shape features, Short term memory, Static approach, Brain, Classification (of information), Linguistics, Signal processing, Speech communication, Recurrent neural networks |
Subjects: | G Mathematical and Computer Sciences > G400 Computer Science; G Mathematical and Computer Sciences > G440 Human-computer Interaction |
Divisions: | College of Science > School of Computer Science |
ID Code: | 8730 |
Deposited On: | 08 Apr 2013 13:48 |