Bicocca Open Archive

Semi-supervised cross-lingual speech emotion recognition

Agarla, Mirko; Bianco, Simone; Celona, Luigi; Napoletano, Paolo; Petrovsky, Alexey; Piccoli, Flavio; Schettini, Raimondo; Shanin, Ivan

2024

Abstract

Performance in Speech Emotion Recognition (SER) on a single language has increased greatly in the last few years thanks to deep learning techniques. However, cross-lingual SER remains a challenge in real-world applications for two main reasons: first, there is a large gap between the source and target domain distributions; second, unlabeled utterances in the new language are far more plentiful than labeled ones. Taking these aspects into account, we propose a Semi-Supervised Learning (SSL) method for cross-lingual emotion recognition when only a few labeled examples in the target domain (i.e., the new language) are available. Our method is based on a Transformer and adapts to the new domain by exploiting a pseudo-labeling strategy on the unlabeled utterances. In particular, we investigate the use of both hard and soft pseudo-labels. We thoroughly evaluate the proposed method in a speaker-independent setup on both the source and the new language, and show its robustness across five languages belonging to different linguistic families. The experimental findings indicate that the unweighted accuracy is increased by an average of 40% compared to state-of-the-art methods.
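The abstract refers to hard and soft pseudo-labels and to unweighted accuracy without defining them. The sketch below illustrates the generic notions only; it is not the authors' Transformer-based method. It assumes a classifier that outputs one logit per emotion class. The function names and the optional confidence `threshold` are hypothetical and do not come from the record. Unweighted accuracy is computed here as the mean of per-class recalls, which is its usual definition in the SER literature.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_pseudo_label(probs, threshold=0.0):
    """Hard pseudo-label: the index of the most probable class.

    The threshold is an illustrative assumption: predictions whose top
    probability falls below it are discarded (None is returned).
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def soft_pseudo_label(probs):
    """Soft pseudo-label: the full predicted distribution is the target."""
    return list(probs)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical logits for one unlabeled utterance over three emotion classes.
probs = softmax([2.0, 0.5, 0.1])
hard = hard_pseudo_label(probs, threshold=0.5)
soft = soft_pseudo_label(probs)
```

A hard label commits to a single class, while a soft label keeps the model's uncertainty over all classes. Which of the two works better for cross-lingual adaptation is the question the paper itself investigates.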

Subtype	Journal article - Scientific article
Keywords	Cross-lingual; Semi-supervised domain adaptation; Semi-supervised learning; Speech emotion recognition
Content language	English
Ahead-of-print or first online publication date	3 September 2023
Publication date	2024
Journal	EXPERT SYSTEMS WITH APPLICATIONS
Volume number	237
Issue	Part A (1 March 2024)
Article number	121368
Article DOI	https://dx.doi.org/10.1016/j.eswa.2023.121368
Full text	open
Citation	Agarla, M., Bianco, S., Celona, L., Napoletano, P., Petrovsky, A., Piccoli, F., et al. (2024). Semi-supervised cross-lingual speech emotion recognition. EXPERT SYSTEMS WITH APPLICATIONS, 237(Part A (1 March 2024)) [10.1016/j.eswa.2023.121368].
Appears in types	01 - Journal article

Files in this item:

File	Access	Attachment type	License	Size	Format
Agarla-2024-Expert Systems with Applications-VoR.pdf	open access	Publisher’s Version (Version of Record, VoR)	Creative Commons	2.8 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/436678
