Bicocca Open Archive

Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N. (2021). TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement. In Machine Learning and Knowledge Discovery in Databases. Research Track (pp.612-627). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-030-86523-8_37].

TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement

Malandri, Lorenzo; Mercorio, Fabio; Mezzanzanica, Mario; Nobani, Navid

2021

Abstract

Taxonomies provide a structured representation of semantic relations between lexical terms. For standard official taxonomies, the refinement task consists of keeping them up to date over time while preserving their original structure. To date, most approaches to automated taxonomy refinement rely on word vector models. However, none of them considers the extent to which those models encode the taxonomic similarity between words. Motivated by this, we propose and implement TaxoRef, a methodology that (i) synthesises the semantic similarity between taxonomic elements through a new metric, namely HSS, (ii) evaluates the extent to which the embeddings generated from a text corpus preserve those similarity relations, and (iii) uses the best embedding resulting from this evaluation to perform taxonomy refinement. TaxoRef is part of the research activity of a 4-year EU project that collects and classifies millions of Online Job Ads for the 27+1 EU countries. It has been tested on 2M ICT job ads classified over ESCO, the European standard occupation and skill taxonomy. Experimental results confirm that (i) HSS outperforms previous metrics for semantic similarity in taxonomies, and (ii) TaxoRef accurately encodes similarities among occupations, suggesting a refinement strategy.
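
Step (ii) of the abstract, checking how well an embedding preserves taxonomic similarity and then keeping the best one, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not define HSS, so `prefix_similarity` is a simple stand-in based on shared leading digits of hierarchical codes, not the paper's metric. The codes, term names, and two-dimensional vectors are made-up toy data, and Spearman rank correlation is one plausible agreement score, not necessarily the one used by TaxoRef.

```python
from itertools import combinations
from math import sqrt


def prefix_similarity(code_a, code_b):
    """Stand-in taxonomic similarity (NOT the paper's HSS):
    fraction of leading characters two hierarchical codes share."""
    shared = 0
    for x, y in zip(code_a, code_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(code_a), len(code_b))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def agreement(codes, vectors):
    """Spearman correlation between taxonomy-based and embedding-based
    similarities, computed over all unordered pairs of terms."""
    pairs = list(combinations(sorted(codes), 2))
    taxo = [prefix_similarity(codes[a], codes[b]) for a, b in pairs]
    emb = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return pearson(ranks(taxo), ranks(emb))


def select_best(codes, candidates):
    """Score every candidate embedding and return the best name with all scores."""
    scores = {name: agreement(codes, vecs) for name, vecs in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): four terms with 4-digit hierarchical codes.
codes = {"A": "2511", "B": "2512", "C": "2521", "D": "3511"}
candidates = {
    # Vectors roughly follow the hierarchy: A and B close, D far away.
    "emb_good": {"A": (1.0, 0.0), "B": (0.9, 0.1), "C": (0.6, 0.5), "D": (0.0, 1.0)},
    # Vectors place the unrelated term D next to A instead.
    "emb_bad": {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.6, 0.5), "D": (0.9, 0.1)},
}

best, scores = select_best(codes, candidates)
print(best, scores)
```

In this sketch the embedding whose pairwise cosine similarities rank term pairs most like the taxonomy does is the one retained. Step (iii), the refinement itself, is not covered here.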

Contribution type: paper
Keywords: Semantic similarity; Taxonomy refinement; Word embeddings evaluation
Keywords: Taxonomy refinement, Semantic similarity, Word embeddings, machine learning
Content language: English
Conference name: Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Conference year: 2021
Proceedings title: Machine Learning and Knowledge Discovery in Databases. Research Track
Proceedings volume ISBN: 978-3-030-86522-1
Series: Lecture Notes in Artificial Intelligence
Publication date: 2021
Volume number: 12977
First page: 612
Last page: 627
DOI of the contribution: https://dx.doi.org/10.1007/978-3-030-86523-8_37
Full text: none
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/327156
