Taxonomies provide a structured representation of semantic relations between lexical terms. For standard official taxonomies, the refinement task consists of keeping them up to date over time while preserving their original structure. To date, most approaches to automated taxonomy refinement rely on word vector models. However, none of them considers to what extent those models encode the taxonomic similarity between words. Motivated by this, we propose and implement TaxoRef, a methodology that (i) synthesises the semantic similarity between taxonomic elements through a new metric, namely HSS, (ii) evaluates to what extent the embeddings generated from a text corpus preserve those similarity relations, and (iii) uses the best embedding resulting from this evaluation to perform taxonomy refinement. TaxoRef is part of the research activity of a 4-year EU project that collects and classifies millions of Online Job Ads for the 27+1 EU countries. It has been tested on 2M ICT job ads classified against ESCO, the European standard occupation and skill taxonomy. Experimental results confirm that (i) HSS outperforms previous metrics for semantic similarity in taxonomies, and (ii) TaxoRef accurately encodes similarities among occupations, suggesting a refinement strategy.
Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N. (2021). TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement. In Machine Learning and Knowledge Discovery in Databases. Research Track (pp. 612-627). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-030-86523-8_37].
TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement
Malandri, Lorenzo; Mercorio, Fabio; Mezzanzanica, Mario; Nobani, Navid
2021
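As an illustration of step (ii) of the abstract, the following minimal sketch scores candidate embeddings by how well their cosine similarities agree with a taxonomy-based similarity, and keeps the best-scoring one. Everything in it is an assumption made for illustration: the toy taxonomy, the vectors and the model names are hypothetical, the path-based similarity is a simple stand-in and not the HSS metric (which the abstract does not define), and Spearman rank correlation is only one plausible way to measure agreement.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy taxonomy (child -> parent); not taken from ESCO.
PARENT = {
    "ict": None,
    "developers": "ict",
    "analysts": "ict",
    "web_developer": "developers",
    "software_developer": "developers",
    "data_analyst": "analysts",
    "systems_analyst": "analysts",
}


def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def taxonomy_similarity(a, b):
    """Stand-in path-based similarity (NOT HSS): 1 / (1 + edges between a and b)."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = next(n for n in path_a if n in path_b)  # lowest common ancestor
    return 1.0 / (1 + path_a.index(common) + path_b.index(common))


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def score_embedding(vectors):
    """Agreement between taxonomy similarity and cosine similarity over all term pairs."""
    pairs = list(combinations(sorted(vectors), 2))
    tax = [taxonomy_similarity(a, b) for a, b in pairs]
    emb = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return spearman(tax, emb)


# Hypothetical 2-d vectors standing in for two trained embedding models.
CANDIDATES = {
    "model_a": {  # siblings in the taxonomy are close in vector space
        "web_developer": [1.0, 0.1],
        "software_developer": [0.9, 0.2],
        "data_analyst": [0.1, 1.0],
        "systems_analyst": [0.2, 0.9],
    },
    "model_b": {  # siblings are far apart in vector space
        "web_developer": [1.0, 0.1],
        "software_developer": [0.1, 1.0],
        "data_analyst": [0.9, 0.2],
        "systems_analyst": [0.2, 0.9],
    },
}

scores = {name: score_embedding(vecs) for name, vecs in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy setup the model that places taxonomic siblings close together gets the higher score and would be the one carried forward to the refinement step (iii). A real evaluation would replace the stand-in similarity with the metric defined in the paper and the toy vectors with embeddings trained on the job-ad corpus.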