Combining audio and video for detection of spontaneous emotions

Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, Nikola Pavešić: Combining audio and video for detection of spontaneous emotions. In: Biometric ID Management and Multimodal Communication, Lecture Notes in Computer Science, vol. 5707, pp. 114–121, Springer-Verlag, Berlin, Heidelberg, 2009.

Abstract

The paper presents our initial attempts at building an audio-video emotion recognition system. Both the audio and video sub-systems are discussed, and a description of the database of spontaneous emotions is given. The task of labelling the recordings from the database according to different emotions is discussed, and the measured agreement between multiple annotators is presented. Instead of focusing on prosody in audio emotion recognition, we evaluate the possibility of using linear transformations (CMLLR) as features. The classification results from the audio and video sub-systems are combined using sum rule fusion, and the increase in recognition performance when using both modalities is presented.
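The sum rule fusion mentioned in the abstract combines the per-class scores produced independently by each modality and picks the class with the highest summed score. The sketch below is a minimal, hypothetical illustration of that rule (the function name, score values, and class layout are assumptions for illustration, not the paper's actual implementation):

```python
import numpy as np

def sum_rule_fusion(audio_scores, video_scores):
    """Fuse per-class scores from two modalities with the sum rule.

    Each argument holds one score per emotion class; the fused
    prediction is the class whose summed (equivalently, averaged)
    score is highest.
    """
    fused = np.asarray(audio_scores) + np.asarray(video_scores)
    return int(np.argmax(fused)), fused

# Illustrative example with three emotion classes: audio favours
# class 1, video favours class 2, and the summed evidence selects
# class 1 (fused scores [0.3, 0.9, 0.8]).
audio = [0.2, 0.5, 0.3]
video = [0.1, 0.4, 0.5]
label, fused = sum_rule_fusion(audio, video)
```

Because the sum rule only needs each sub-system's scores, the two classifiers can be trained and tuned entirely independently before fusion.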

BibTeX

@conference{BioID_Multi2009b,
title = {Combining audio and video for detection of spontaneous emotions},
author = {Rok Gaj\v{s}ek and Vitomir \v{S}truc and Simon Dobri\v{s}ek and Janez \v{Z}ibert and France Miheli\v{c} and Nikola Pave\v{s}i\'{c}},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/BioID_R.pdf},
year  = {2009},
date = {2009-01-01},
booktitle = {Biometric ID management and multimodal communication},
volume = {5707},
pages = {114--121},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
series = {Lecture Notes in Computer Science},
abstract = {The paper presents our initial attempts at building an audio-video emotion recognition system. Both the audio and video sub-systems are discussed, and a description of the database of spontaneous emotions is given. The task of labelling the recordings from the database according to different emotions is discussed, and the measured agreement between multiple annotators is presented. Instead of focusing on prosody in audio emotion recognition, we evaluate the possibility of using linear transformations (CMLLR) as features. The classification results from the audio and video sub-systems are combined using sum rule fusion, and the increase in recognition performance when using both modalities is presented.},
keywords = {emotion recognition, facial expression recognition, performance evaluation, speech processing, speech technologies},
pubstate = {published},
tppubtype = {conference}
}