Bicocca Open Archive


Piccoli, F., Olearo, L., Bianco, S. (2022). A comparison of temporal aggregators for speaker verification. In IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin (pp.1-6). IEEE Computer Society [10.1109/ICCE-Berlin56473.2022.9937132].

A comparison of temporal aggregators for speaker verification

Piccoli F.; Olearo L.; Bianco S.

2022

Abstract

Speaker verification is the task of examining a speech signal to decide whether the identity claimed by a speaker is true or false. To handle utterances of different lengths, and to accumulate information along the time dimension, several temporal aggregators have been proposed for speaker verification pipelines. In this paper we investigate the behavior of five state-of-the-art temporal aggregators, namely Temporal Average Pooling (TAP), Global Statistical Pooling (GSP), Self-Attentive Pooling (SAP), Attentive Statistical Pooling (ASP), and Vector of Locally Aggregated Descriptors (VLAD), as the lengths of the two utterances vary. Building on a state-of-the-art speaker verification method, our experiments on the VoxCeleb2 dataset show that there is a sweet spot in utterance length at which speaker verification performance is higher, regardless of the temporal aggregator used.
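To make the role of a temporal aggregator concrete, the following is a minimal sketch of the two simplest aggregators named in the abstract, TAP and GSP, under their common textbook definitions. It is not the paper's implementation, which this record does not include. The function names, the list-of-lists input format, and the use of the population standard deviation are assumptions made for illustration.

```python
import math


def temporal_average_pooling(frames):
    """TAP (sketch): average each feature dimension over all time frames."""
    n_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]


def global_statistical_pooling(frames):
    """GSP (sketch): per-dimension mean concatenated with per-dimension
    standard deviation (population form, an assumption of this sketch)."""
    n_frames = len(frames)
    mean = temporal_average_pooling(frames)
    std = [
        math.sqrt(sum((frame[d] - mean[d]) ** 2 for frame in frames) / n_frames)
        for d in range(len(mean))
    ]
    return mean + std


# Two hypothetical utterances with different numbers of frames (2 and 5),
# each frame being a 2-dimensional feature vector.
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0]]

# Both utterances map to fixed-size vectors regardless of their length.
tap_short = temporal_average_pooling(short_utt)
tap_long = temporal_average_pooling(long_utt)
gsp_short = global_statistical_pooling(short_utt)
gsp_long = global_statistical_pooling(long_utt)
```

In this sketch the output size depends only on the feature dimension (2 values for TAP, 4 for GSP), never on the number of frames, which is what lets a verification pipeline compare utterances of different lengths.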


Contribution type: slide + paper
Keywords: Speaker verification; temporal aggregation
Content language: English
Conference name: 12th IEEE International Conference on Consumer Electronics, ICCE-Berlin 2022 - 2 September 2022 through 6 September 2022
Conference year: 2022
Proceedings title: IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin
Proceedings ISBN: 978-1-6654-5676-0
Publication date: 2022
Volume number: 2022-
First page: 1
Last page: 6
DOI: https://dx.doi.org/10.1109/ICCE-Berlin56473.2022.9937132
Full text: none
Citation: Piccoli, F., Olearo, L., Bianco, S. (2022). A comparison of temporal aggregators for speaker verification. In IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin (pp.1-6). IEEE Computer Society [10.1109/ICCE-Berlin56473.2022.9937132].
Appears in types: 02 - Conference contribution

Files in this record:

There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/398215
