Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion - Institut d'Electronique et de Télécommunications de Rennes
Conference paper, 2024


Abstract

One of the main challenges in speech emotion recognition is the lack of large labelled datasets. Progress in speech synthesis now makes it possible to generate reliable and realistic expressive speech. In this work, we propose using a state-of-the-art end-to-end speech emotion conversion model to generate new synthetic data for training speech emotion recognition models. We first evaluate the quality of the converted speech on new unseen datasets, which proves to be on par with the training data. We then study the effect of using the synthesized speech as data augmentation. We show that this approach improves the overall performance of emotion recognition models on two different datasets, IEMOCAP and RAVDESS, in both speaker-dependent and speaker-independent emotion recognition, using a fine-tuned wav2vec 2.0 model.
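The augmentation strategy described in the abstract — pooling real recordings with emotion-converted copies of each utterance before training a classifier — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`augment_with_conversions`, `fake_convert`) are hypothetical, and a toy perturbation stands in for the end-to-end emotion conversion model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# A labelled utterance: waveform (placeholder list of floats) + emotion label.
@dataclass
class Utterance:
    waveform: List[float]
    emotion: str
    synthetic: bool = False

def augment_with_conversions(
    real: List[Utterance],
    convert: Callable[["Utterance", str], List[float]],
    emotions: List[str],
) -> List[Utterance]:
    """Pool real data with emotion-converted copies of each utterance.

    `convert` stands in for the emotion conversion model: it maps
    (source utterance, target emotion) to a new waveform, which is
    added to the training pool with the target emotion as its label.
    """
    augmented = list(real)
    for utt in real:
        for target in emotions:
            if target == utt.emotion:
                continue  # skip identity conversion
            augmented.append(
                Utterance(convert(utt, target), target, synthetic=True)
            )
    return augmented

# Toy stand-in for the conversion model: slightly perturb the waveform.
def fake_convert(utt: Utterance, target: str) -> List[float]:
    return [x + random.uniform(-0.01, 0.01) for x in utt.waveform]

real_data = [
    Utterance([0.0, 0.1, 0.2], "neutral"),
    Utterance([0.3, 0.2, 0.1], "angry"),
]
train_set = augment_with_conversions(
    real_data, fake_convert, ["neutral", "angry", "happy", "sad"]
)
print(len(train_set))  # 2 real + 2 * 3 converted = 8
```

The resulting pooled set (real plus converted utterances) would then be fed to a downstream classifier such as a fine-tuned wav2vec 2.0 model, as the paper does.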
Main file: ICASSP2024-1.pdf (232.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04364976, version 1 (27-12-2023)


Cite

Karim M. Ibrahim, Antony Perzo, Simon Leglaive. Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, Seoul, South Korea. ⟨10.1109/icassp48485.2024.10445740⟩. ⟨hal-04364976⟩