In this paper we address the problem of speaker recognition in the wild. We tackle both speaker identification and speaker verification using deep convolutional neural networks (CNNs). We propose modifications of two residual CNN (ResNet) architectures so that they can take spectrograms of the audio data as input images. The proposed architectures, trained with a contrastive-center loss, have been tested on the VoxCeleb and SIWIS datasets on both the speaker identification and verification tasks. The experimental results show that the proposed solution is effective compared with the state of the art. The proposed network proves robust in unconstrained conditions and, more importantly, remains quite robust in a multilingual scenario.
Bianco, S., Cereda, E., Napoletano, P. (2018). Discriminative deep audio feature embedding for speaker recognition in the wild. In 2018 IEEE 8th International Conference on Consumer Electronics - Berlin (ICCE-Berlin) (pp. 1-5). IEEE Computer Society [10.1109/ICCE-Berlin.2018.8576237].
Discriminative deep audio feature embedding for speaker recognition in the wild
Bianco, S; Napoletano, P
2018