In recent years, sentiment analysis has advanced considerably, particularly through the development of learning-based tools and knowledge-based (also called lexicon-based) tools. Unlike machine learning methods, knowledge-based ones do not require labelled data to train a classifier and are less resource-intensive. However, their reliance on pre-established rules can make them too rigid to adapt to different domains, or too broad to capture subtle variations in sentiment within a specific domain. Additionally, because they are constructed manually, their coverage often remains restricted. This study introduces SEEDOT, a novel methodology to enhance the performance of specialised lexicon-based tools. SEEDOT starts from a general lexicon and a domain-specific corpus, and uses machine learning to extend the existing lexicon with domain-specific terms. This improves both the specificity and the coverage of the general lexicon. Compared with a state-of-the-art lexicon-based tool, SEEDOT outperforms it in all four domains considered.
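The abstract does not specify which learning technique SEEDOT uses, so the following is only a minimal sketch of the general idea (a seed lexicon extended with terms learned from a domain corpus), not the authors' method. It swaps in a simple stand-in technique: documents are weakly labelled with the seed lexicon, and each unseen term is scored by a smoothed log-odds ratio of its document frequency in positive versus negative documents. The names `expand_lexicon`, `min_count`, and `threshold`, as well as the toy finance corpus, are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real tools do more)."""
    return text.lower().split()


def weak_label(tokens, seed):
    """Label a document +1, -1, or 0 from the summed seed-lexicon scores."""
    score = sum(seed.get(t, 0.0) for t in tokens)
    return 1 if score > 0 else -1 if score < 0 else 0


def expand_lexicon(seed, corpus, min_count=2, threshold=0.5):
    """Return the seed lexicon plus domain terms scored by smoothed log-odds.

    Illustrative stand-in only: not the SEEDOT algorithm.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for doc in corpus:
        tokens = tokenize(doc)
        label = weak_label(tokens, seed)
        if label == 0:
            continue  # the seed lexicon gives no signal for this document
        if label > 0:
            n_pos += 1
            pos_df.update(set(tokens))  # document frequency, not raw counts
        else:
            n_neg += 1
            neg_df.update(set(tokens))

    expanded = dict(seed)  # seed entries keep their original scores
    for term in set(pos_df) | set(neg_df):
        if term in seed or pos_df[term] + neg_df[term] < min_count:
            continue
        # Add-one smoothing keeps the ratio finite for one-sided terms.
        p_pos = (pos_df[term] + 1) / (n_pos + 2)
        p_neg = (neg_df[term] + 1) / (n_neg + 2)
        score = math.log(p_pos / p_neg)
        if abs(score) >= threshold:
            expanded[term] = score
    return expanded


# Toy usage with a hypothetical general lexicon and finance-flavoured corpus.
general_lexicon = {"good": 1.0, "bad": -1.0}
domain_corpus = [
    "good quarter with bullish outlook",
    "bullish rally good news",
    "bad quarter with bearish outlook",
    "bearish slump bad news",
]
domain_lexicon = expand_lexicon(general_lexicon, domain_corpus)
print(sorted(domain_lexicon))
```

In this toy run, terms that appear equally often in both classes (such as "quarter") score zero and are filtered out by `threshold`, while terms seen only once are dropped by `min_count`; both cut-offs are arbitrary choices made for illustration.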
Haardt, V., Malandri, L., Mercorio, F., Porcelli, L. (2025). SEEDOT: Tool for Enhancing Sentiment Lexicon with Machine Learning. In Machine Learning and Principles and Practice of Knowledge Discovery in Databases: International Workshops of ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Revised Selected Papers, Part III (pp. 390–402). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-74633-8_28].
SEEDOT: Tool for Enhancing Sentiment Lexicon with Machine Learning
Haardt V.; Malandri L.; Mercorio F.; Porcelli L.
2025