Bianco, S., Cereda, E., Napoletano, P. (2018). Discriminative deep audio feature embedding for speaker recognition in the wild. In 2018 IEEE 8th International Conference on Consumer Electronics - Berlin (ICCE-Berlin) (pp.1-5). IEEE Computer Society [10.1109/ICCE-Berlin.2018.8576237].

Discriminative deep audio feature embedding for speaker recognition in the wild

Bianco, S; Napoletano, P
2018

Abstract

This paper addresses the problem of speaker recognition in the wild. We tackle both speaker identification and speaker verification using deep convolutional neural networks (CNNs). We modify two residual CNN architectures (ResNets) so that they accept spectrograms of the audio data as input images. The proposed architectures, trained with a contrastive-center loss, are evaluated on the VoxCeleb and SIWIS datasets on both the identification and verification tasks. The experimental results show that the proposed solution is effective compared with the state of the art. The proposed network proves robust in unconstrained conditions and, more importantly, quite robust in a multilingual scenario.
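As an illustration of the training objective named in the abstract, the following is a minimal NumPy sketch of a contrastive-center loss in the form commonly given in the literature: for each embedding, the squared distance to its own class center is divided by the sum of squared distances to all other class centers plus a small constant. The function name, the `delta` default, and the toy data are assumptions made for this example; this record does not state the exact variant or hyperparameters the authors used.

```python
import numpy as np


def contrastive_center_loss(embeddings, labels, centers, delta=1.0):
    """Sketch of a contrastive-center loss (hypothetical helper, not the authors' code).

    embeddings: (N, D) array of feature vectors
    labels:     (N,) integer class index of each embedding
    centers:    (K, D) array with one center per class
    delta:      constant that keeps the denominator away from zero (assumed value)
    """
    # Squared Euclidean distance from every embedding to every center: shape (N, K)
    diffs = embeddings[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)

    rows = np.arange(len(labels))
    intra = sq_dists[rows, labels]          # distance to the embedding's own center
    inter = sq_dists.sum(axis=1) - intra    # summed distance to all other centers

    # Small intra-class and large inter-class distances both lower the loss
    return 0.5 * np.sum(intra / (inter + delta))


# Toy data: two classes with centers four units apart
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
emb = np.array([[0.0, 0.0], [4.0, 0.0]])

loss_correct = contrastive_center_loss(emb, np.array([0, 1]), centers)
loss_swapped = contrastive_center_loss(emb, np.array([1, 0]), centers)
print(loss_correct, loss_swapped)
```

In this toy case the loss is zero when each embedding sits exactly on its own center and grows when the labels are swapped, which is the behaviour that encourages compact, well-separated speaker embeddings. In an actual training setup the centers would be learned jointly with the network rather than fixed.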
Type: slide + paper
Keywords: Convolutional Neural Networks; Speaker Identification; Speaker Verification
Language: English
Conference: 8th IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin 2018
Published in: 2018 IEEE 8th International Conference on Consumer Electronics - Berlin (ICCE-Berlin)
ISBN: 9781538660959
Year: 2018
Pages: 1-5
Article number: 8576237

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/215439
Citations
  • Scopus 5
  • Web of Science (ISI) 3