Irony is a complex linguistic phenomenon that has been extensively studied in computational linguistics across many languages. Existing research has relied heavily on annotated corpora, which are inherently biased due to their creation process. This study focuses on the problem of bias in cross-domain and cross-language irony detection. It aims to identify the extent of topic bias in benchmark corpora and how that bias affects the generalization of models across domains and languages (English, Spanish, and Italian). Our findings offer a first insight into this issue and show that mitigating the topic bias in these corpora improves the generalization of models beyond their training data. These results have important implications for the development of robust models for the analysis of ironic language.
Ortega-Bueno, R., Rosso, P., Fersini, E. (2023). Cross-Domain and Cross-Language Irony Detection: The Impact of Bias on Models’ Generalization. In Natural Language Processing and Information Systems: 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Derby, UK, June 21–23, 2023, Proceedings (pp. 140-155). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-35320-8_10].
Cross-Domain and Cross-Language Irony Detection: The Impact of Bias on Models’ Generalization
Fersini E.
2023