Development and Evaluation of the Emotional Slovenian Speech Database-EmoLUKS

Tadej Justin, Vitomir Štruc, Janez Žibert, France Mihelič: Development and Evaluation of the Emotional Slovenian Speech Database-EmoLUKS. In: Proceedings of the International Conference on Text, Speech, and Dialogue (TSD), pp. 351-359, Springer, 2015.

Abstract

This paper describes a speech database built from 17 Slovenian radio dramas. The dramas were obtained from the national radio and television broadcaster (RTV Slovenia) and were placed at the university's disposal under an academic license for processing and annotating the audio material. The utterances of one male and one female speaker were transcribed, segmented and then annotated with the emotional states of the speakers. The annotation of the emotional states was conducted in two stages with our own web-based application for crowdsourcing. The final (emotional) speech database consists of 1385 recordings of one male (975 recordings) and one female (410 recordings) speaker and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the employed annotation methodology. Baseline emotion recognition experiments are also presented; the results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments.
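
For readers unfamiliar with the reported metrics, the sketch below (not taken from the paper; the labels, predictions, and choice of scikit-learn are illustrative assumptions) shows how unweighted and weighted average recall and precision are commonly computed for multi-class emotion recognition.

# Minimal sketch of the evaluation metrics, assuming scikit-learn is available.
# The label lists are hypothetical placeholders, not data from EmoLUKS.
from sklearn.metrics import precision_score, recall_score

y_true = ["neutral", "anger", "joy", "neutral", "sadness"]  # hypothetical gold labels
y_pred = ["neutral", "joy",   "joy", "anger",   "sadness"]  # hypothetical classifier output

# Unweighted average recall (UAR): per-class recalls averaged with equal class weight.
uar = recall_score(y_true, y_pred, average="macro")
# Weighted average recall (WAR): per-class recalls weighted by class frequency.
war = recall_score(y_true, y_pred, average="weighted")

# Analogous unweighted / weighted average precisions.
uap = precision_score(y_true, y_pred, average="macro")
wap = precision_score(y_true, y_pred, average="weighted")

print(f"UAR={uar:.2f}  WAR={war:.2f}  UAP={uap:.2f}  WAP={wap:.2f}")

The unweighted averages treat every emotion class equally regardless of how often it occurs, which matters for emotional speech data where neutral speech typically dominates; the weighted averages instead reflect the class distribution of the test set.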

BibTeX (Download)

@conference{justin2015development,
title = {Development and Evaluation of the Emotional Slovenian Speech Database-EmoLUKS},
author = {Tadej Justin and Vitomir \v{S}truc and Janez \v{Z}ibert and France Miheli\v{c}},
url = {http://luks.fe.uni-lj.si/nluks/wp-content/uploads/2016/09/tsd2015.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Conference on Text, Speech, and Dialogue (TSD)},
pages = {351--359},
organization = {Springer},
abstract = {This paper describes a speech database built from 17 Slovenian radio dramas. The dramas were obtained from the national radio and television broadcaster (RTV Slovenia) and were placed at the university's disposal under an academic license for processing and annotating the audio material. The utterances of one male and one female speaker were transcribed, segmented and then annotated with the emotional states of the speakers. The annotation of the emotional states was conducted in two stages with our own web-based application for crowdsourcing. The final (emotional) speech database consists of 1385 recordings of one male (975 recordings) and one female (410 recordings) speaker and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the employed annotation methodology. Baseline emotion recognition experiments are also presented; the results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments.},
keywords = {annotated data, dataset, dataset of emotional speech, EmoLUKS, emotional speech synthesis, speech synthesis, speech technologies, transcriptions},
pubstate = {published},
tppubtype = {conference}
}