Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has, with some exceptions, focused either on analysing lexical semantic relations in word embeddings or on probing pretrained language models (PLMs). Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages, including low-resource languages such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs' ability to (1) capture analogies across languages and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages.

Gromann, D., Goncalo Oliveira, H., Pitarch, L., Apostol, E., Bernad, J., Bytyçi, E., et al. (2024). MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp.11783-11793). European Language Resources Association (ELRA).

MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations

Blerina Spahiu
2024

Type: paper
Keywords: BATS; Lexical Semantic Relations; Multilingual Benchmark
Language: English
Conference: LREC-COLING 2024 (The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation), 20 May 2024 through 25 May 2024
Year: 2024
Editors: Calzolari, N; Kan, MY; Hoste, V; Lenci, A; Sakti, S; Xue, N
Published in: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
ISBN: 9782493814104
Pages: 11783-11793
URL: https://lrec-coling-2024.org/
Access: open
Files in this item:
File: Gromann-2024-LREC-COLING 2024-VoR.pdf
Access: open access
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 228.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/466219
Citations
  • Scopus 1