Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion - Institut d'Electronique et de Télécommunications de Rennes
Conference paper, 2024


Abstract

One of the main challenges in speech emotion recognition is the lack of large labelled datasets. Progress in speech synthesis now makes it possible to generate reliable and realistic expressive speech. In this work, we propose using a state-of-the-art end-to-end speech emotion conversion model to generate new synthetic data for training speech emotion recognition models. We first evaluate the quality of the converted speech on new unseen datasets, which proves to be on par with the training data. We then study the effect of using the synthesized speech as data augmentation. We show that this approach improves the overall performance of emotion recognition models on two different datasets, IEMOCAP and RAVDESS, in both speaker-dependent and speaker-independent emotion recognition, using a fine-tuned wav2vec 2.0 model.
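The augmentation strategy described in the abstract — pooling real recordings with emotion-converted copies of each utterance before training a classifier — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`augment_with_conversions`, `fake_convert`) are hypothetical, and a toy perturbation stands in for the end-to-end emotion conversion model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# A labelled utterance: waveform (placeholder list of floats) + emotion label.
@dataclass
class Utterance:
    waveform: List[float]
    emotion: str
    synthetic: bool = False

def augment_with_conversions(
    real: List[Utterance],
    convert: Callable[["Utterance", str], List[float]],
    emotions: List[str],
) -> List[Utterance]:
    """Pool real data with emotion-converted copies of each utterance.

    `convert` stands in for the emotion conversion model: it maps
    (source utterance, target emotion) to a new waveform, which is
    added to the training pool with the target emotion as its label.
    """
    augmented = list(real)
    for utt in real:
        for target in emotions:
            if target == utt.emotion:
                continue  # skip identity conversion
            augmented.append(
                Utterance(convert(utt, target), target, synthetic=True)
            )
    return augmented

# Toy stand-in for the conversion model: slightly perturb the waveform.
def fake_convert(utt: Utterance, target: str) -> List[float]:
    return [x + random.uniform(-0.01, 0.01) for x in utt.waveform]

real_data = [
    Utterance([0.0, 0.1, 0.2], "neutral"),
    Utterance([0.3, 0.2, 0.1], "angry"),
]
train_set = augment_with_conversions(
    real_data, fake_convert, ["neutral", "angry", "happy", "sad"]
)
print(len(train_set))  # 2 real + 2 * 3 converted = 8
```

The resulting pooled set (real plus converted utterances) would then be fed to a downstream classifier such as a fine-tuned wav2vec 2.0 model, as the paper does.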
Main file: ICASSP2024-1.pdf (232.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04364976, version 1 (27-12-2023)


Cite

Karim M. Ibrahim, Antony Perzo, Simon Leglaive. Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, Seoul, South Korea. ⟨10.1109/icassp48485.2024.10445740⟩. ⟨hal-04364976⟩