Taxonomies provide a structured representation of semantic relations between lexical terms. In the case of standard official taxonomies, the refinement task consists of keeping them up to date over time while preserving their original structure. To date, most approaches to automated taxonomy refinement rely on word vector models. However, none of them considers to what extent those models encode the taxonomic similarity between words. Motivated by this, we propose and implement TaxoRef, a methodology that (i) synthesises the semantic similarity between taxonomic elements through a new metric, namely HSS, (ii) evaluates to what extent the embeddings generated from a text corpus preserve those similarity relations, and (iii) uses the best embedding resulting from this evaluation to perform taxonomy refinement. TaxoRef is part of the research activity of a 4-year EU project that collects and classifies millions of Online Job Ads for the 27+1 EU countries. It has been tested on 2M ICT job ads classified over ESCO, the European standard occupation and skill taxonomy. Experimental results confirm that (i) HSS outperforms previous metrics for semantic similarity in taxonomies, and (ii) TaxoRef accurately encodes similarities among occupations, suggesting a refinement strategy.

Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N. (2021). TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement. In Machine Learning and Knowledge Discovery in Databases. Research Track (pp.612-627). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-030-86523-8_37].

TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement

Malandri, Lorenzo; Mercorio, Fabio; Mezzanzanica, Mario; Nobani, Navid
2021

Type: paper
Keywords: Semantic similarity; Taxonomy refinement; Word embeddings evaluation; Word embeddings; Machine learning
Language: English
Conference: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2021
Book title: Machine Learning and Knowledge Discovery in Databases. Research Track
ISBN: 978-3-030-86522-1
Year: 2021
Volume: 12977
Pages: 612-627
none

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/327156