Food recognition is a major challenge in the field of computer vision, requiring models that can effectively handle the wide variability and complexity of food images. In this paper, we explore the use of vision transformers, a category of models based on self-attention mechanisms, to address the task of food recognition. We focus on training and fine-tuning different vision transformer architectures on Food2K, a large-scale dataset of food images with 2,000 categories. We compare the performance of vision transformers with convolutional neural networks (CNNs) on Food2K and Food101. In addition, we use state-of-the-art explainability techniques to highlight the regions of interest that vision transformers take into account when making a prediction. Our results show that vision transformers can achieve competitive results on food recognition tasks, with the added benefit that pre-training on Food2K improves their generalization capabilities and interpretability. This study highlights the potential of vision transformers in food computing, paving the way for future research in this field.

Bianco, S., Buzzelli, M., Chiriaco, G., Napoletano, P., Piccoli, F. (2023). Food Recognition with Visual Transformers. In 2023 IEEE 13th International Conference on Consumer Electronics - Berlin (ICCE-Berlin) (pp. 82-87). IEEE [10.1109/ICCE-Berlin58801.2023.10375660].

Food Recognition with Visual Transformers

Bianco, Simone; Buzzelli, Marco; Napoletano, Paolo; Piccoli, Flavio
2023

Presentation type: slide + paper
Keywords: CNNs; food recognition; visual transformers; ViT
Language: English
Conference: 2023 IEEE 13th International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 03-05 September 2023
ISBN: 9798350324150
Year: 2023
Pages: 82-87
URL: https://ieeexplore.ieee.org/document/10375660
Access rights: reserved
Files in this item:
File: Bianco-2023-ICCE Berlin-VoR.pdf
Access: archive managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 387.08 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/456872