Bicocca Open Archive


MEET-LM: A method for embeddings evaluation for taxonomic data in the labour market

Malandri, Lorenzo; Mercorio, Fabio; Mezzanzanica, Mario; Nobani, Navid

2021

Abstract

Taxonomies are a mainstay of the semantic web, as they organise knowledge into concepts linked by IS-A relationships. However, keeping such hierarchies up to date and representative of the domain from which they are drawn remains a time-consuming, costly and error-prone activity. Word embeddings have proven effective at capturing lexical and semantic similarities, and can therefore be used to enrich taxonomies from text data. This, in turn, requires evaluating the generated embeddings to estimate the extent to which they encode the semantic similarity derived from the hierarchy itself. In this paper, we propose and implement MEET-LM, a methodology for generating and evaluating embeddings from a text corpus that preserve the co-hyponymy relations synthesised from a domain-specific taxonomy. We apply MEET-LM to a real-life dataset of 2M+ vacancies related to ICT jobs, framed within the research activities of an EU project that collects millions of Online Job Vacancies and classifies them within the European standard hierarchy ESCO. To show that MEET-LM is useful in practice, we also trained a neural network to classify co-hyponym relations using the selected embeddings as features. Our experiments reach 99.4% accuracy and an F1-score of 86.5%.
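The abstract does not state which similarity measure or scoring procedure MEET-LM uses, so the following is only a minimal sketch of the general idea of checking whether embeddings preserve co-hyponymy. It assumes that co-hyponyms are concepts sharing the same direct parent in the taxonomy, and that cosine similarity is the comparison measure. The toy taxonomy, the three-dimensional vectors and all function names are hypothetical illustrations, not taken from the paper or from ESCO.

```python
import math
from itertools import combinations

# Hypothetical toy taxonomy (assumption): child concept -> parent concept (IS-A).
PARENT = {
    "java developer": "software developer",
    "python developer": "software developer",
    "database administrator": "database professional",
    "data architect": "database professional",
}

# Hypothetical 3-d embeddings (assumption); real ones would be trained on a corpus.
EMB = {
    "java developer": [0.9, 0.1, 0.0],
    "python developer": [0.8, 0.2, 0.1],
    "database administrator": [0.1, 0.9, 0.2],
    "data architect": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_cohyponym(a, b, parent=PARENT):
    """Two concepts are treated as co-hyponyms if they share a direct parent."""
    return parent[a] == parent[b]


def mean_similarity_by_relation(emb, parent):
    """Return (mean cosine of co-hyponym pairs, mean cosine of all other pairs)."""
    co, other = [], []
    for a, b in combinations(sorted(emb), 2):
        bucket = co if is_cohyponym(a, b, parent) else other
        bucket.append(cosine(emb[a], emb[b]))
    return sum(co) / len(co), sum(other) / len(other)


co_mean, other_mean = mean_similarity_by_relation(EMB, PARENT)
print(f"co-hyponym pairs: {co_mean:.3f}, other pairs: {other_mean:.3f}")
```

Under these assumptions, an embedding model that respects the taxonomy should give a clearly higher mean similarity for co-hyponym pairs than for the remaining pairs. The gap between the two means could serve as one simple criterion for comparing candidate embedding models.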


Subtype: Journal article - Scientific article
Keywords: Embeddings evaluation; ICT; Labour market; Semantic hierarchies; Taxonomies
Content language: English
Ahead-of-print or first online publication date: 23 Nov 2020
Publication date: 2021
Journal: COMPUTERS IN INDUSTRY
Volume number: 124
Issue: January 2021
Article number: 103341
Article DOI: https://dx.doi.org/10.1016/j.compind.2020.103341
Fulltext: none
Citation: Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N. (2021). MEET-LM: A method for embeddings evaluation for taxonomic data in the labour market. COMPUTERS IN INDUSTRY, 124(January 2021) [10.1016/j.compind.2020.103341].
Appears in collections: 01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/292908
