Cutrona, V., Bianchi, F., Jiménez-Ruiz, E., & Palmonari, M. (2020). Tough Tables: Carefully Evaluating Entity Linking for Tabular Data. In Proceedings of the 19th International Semantic Web Conference (ISWC 2020), Athens, Greece, November 2–6, 2020, Part II (pp. 328–343). doi:10.1007/978-3-030-62466-8_21
Tough Tables: Carefully Evaluating Entity Linking for Tabular Data
Cutrona, V.; Bianchi, F.; Jiménez-Ruiz, E.; Palmonari, M.
2020
Abstract
Table annotation is a key task for improving Web querying and supporting Knowledge Graph population from legacy sources (tables). In 2019, the SemTab challenge was introduced to unify different efforts to evaluate table annotation algorithms by providing a common interface and several general-purpose datasets as ground truth. The SemTab dataset is useful for gaining a general understanding of how these algorithms work, and the challenge organizers added artificial noise to the data to make the annotation harder. However, it is difficult to analyze specific aspects automatically; for example, the ambiguity of names at the entity level can largely affect the quality of the annotation. In this paper, we propose a novel dataset to complement the datasets proposed by SemTab. The dataset consists of high-quality, manually curated tables with non-obviously linkable cells, i.e., cells whose values are ambiguous names, typos, and misspelled entity names that do not appear in the current version of the SemTab dataset. These challenges are particularly relevant for the ingestion of structured legacy sources into existing knowledge graphs. Evaluations run on this dataset show that ambiguity is a key problem for entity linking algorithms, and they point to a promising direction for future work in the field.

File | Size | Format
---|---|---
Tough_Tables__Carefully_Evaluating_Entity_Linking_for_Tabular_Data%281%29.pdf (Submitted Version / Pre-print; access restricted to archive administrators) | 361.1 kB | Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.