Lomonaco, F., Siino, M., Tesconi, M. (2023). Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023) (pp.2708-2716). CEUR-WS.

Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers

Lomonaco, F.; Siino, M.; Tesconi, M.
2023

Abstract

From a few-shot learning perspective, we propose a strategy to enrich the latent semantics of the text in the dataset provided for Profiling Cryptocurrency Influencers with Few-shot Learning, the task hosted at PAN@CLEF 2023. Our approach is based on data augmentation via back-translation to and from Japanese: we translate each sample in the original training dataset into the target language (i.e., Japanese), translate it back into English, and merge the original sample with its back-translated counterpart. We then fine-tune two state-of-the-art Transformer models on this augmented version of the training dataset and evaluate them using Macro and Micro F1, in accordance with the official metrics of the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872, respectively, on the original training set. Our best submission obtained a Macro F1 of 0.3851 on the official test set.
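The augmentation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `translate` function is a hypothetical stand-in for whatever machine-translation backend is used (here implemented as an identity placeholder so the sketch is self-contained), while `backtranslate_augment` follows the procedure the abstract describes, i.e. English → Japanese → English, then merging the original sample with its back-translation.

```python
def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder MT call; a real implementation would invoke a
    translation model or service here."""
    return text  # identity stand-in, for illustration only


def backtranslate_augment(sample: str, pivot: str = "ja") -> str:
    """Translate English -> pivot language -> English, then merge the
    back-translated text with the original sample."""
    pivoted = translate(sample, src="en", tgt=pivot)
    back = translate(pivoted, src=pivot, tgt="en")
    # The abstract states that the original sample and the
    # back-translated one are merged into a single training sample.
    return sample + " " + back
```

In practice, the back-translation through Japanese introduces paraphrases and lexical variation, so the merged sample carries more semantic variety than the original alone; with the identity placeholder above, the function simply concatenates the sample with itself.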
paper
author profiling; cryptocurrency influencers; data augmentation; japanese; text classification; text enrichment; Twitter;
English
24th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF-WN 2023
2023
Aliannejadi, M; Faggioli, G; Ferro, N; Vlachos, M
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)
2023
3497
2708
2716
https://ceur-ws.org/Vol-3497/
open
Files in this item:
Lomonaco-2023-CLEF-WN-VoR.pdf

open access

Description: This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 1.04 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/524726
Citations
  • Scopus: 4