In this study, we investigate the effectiveness of deep neural networks in predicting valence and arousal solely from the visual information in video sequences. Several recent Convolutional Neural Network (CNN) and Transformer architectures are used as the backbone of the proposed model. We also assess the impact of pre-training on model performance by comparing models trained from scratch with pre-trained models. Experimental results on the One-Minute Gradual-Emotion Recognition Challenge dataset suggest that pre-training on emotion recognition datasets is beneficial for most models. A comparison with the state of the art reveals similar performance on the valence Concordance Correlation Coefficient (CCC) and lower performance on the arousal CCC. However, the predictions in our experiments are not statistically different in most cases. The study concludes by emphasizing the complexity of video emotion recognition and the need for further research to enhance the robustness and accuracy of emotion recognition models. The source code used for the experiments is made publicly available.
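For readers unfamiliar with the metric named in the abstract, the Concordance Correlation Coefficient between annotations and predictions is commonly defined (following Lin) as 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below is an illustrative pure-Python version of that standard definition, not the authors' released code; the function name `concordance_cc` and the use of population (divide-by-n) statistics are assumptions made here for illustration.

```python
def concordance_cc(y_true, y_pred):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Illustrative helper (hypothetical name); uses population statistics,
    i.e. variances and covariance are divided by n rather than n - 1.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n

    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        # Both sequences are constant and equal: the coefficient is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    # Toy valence annotations and predictions (made-up numbers).
    annotations = [0.1, 0.4, 0.35, 0.8]
    predictions = [0.2, 0.5, 0.30, 0.7]
    print(round(concordance_cc(annotations, predictions), 3))
```

Unlike the Pearson correlation, this coefficient penalises a constant offset between predictions and annotations, so a perfectly correlated but shifted prediction scores below 1.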

Alchieri, L., Celona, L., Bianco, S. (2024). Video-Based Emotion Estimation Using Deep Neural Networks: A Comparative Study. In Image Analysis and Processing - ICIAP 2023 Workshops Udine, Italy, September 11–15, 2023, Proceedings, Part I (pp. 255-269). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-51023-6_22].

Video-Based Emotion Estimation Using Deep Neural Networks: A Comparative Study

Celona, Luigi; Bianco, Simone
2024

slide + paper
Arousal; Convolutional neural networks; Transformers; Valence; Video emotion recognition
English
Image Analysis and Processing - ICIAP 2023 Workshops - September 11–15, 2023
2023
Foresti, GL; Fusiello, A; Hancock, E
Image Analysis and Processing - ICIAP 2023 Workshops Udine, Italy, September 11–15, 2023, Proceedings, Part I
9783031510229
24-Jan-2024
2024
14365 LNCS
255
269
none

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/457459