Tough Tables: Carefully Evaluating Entity Linking for Tabular Data

Cutrona, V.; Bianchi, F.; Jiménez-Ruiz, E.; Palmonari, M.
2020

Abstract

Table annotation is a key task for improving Web querying and supporting Knowledge Graph population from legacy sources (tables). Last year, the SemTab challenge was introduced to unify different efforts to evaluate table annotation algorithms by providing a common interface and several general-purpose datasets as ground truth. The SemTab dataset is useful for gaining a general understanding of how these algorithms work, and the organizers of the challenge added some artificial noise to the data to make the annotation trickier. However, it is hard to analyze specific aspects in an automatic way. For example, the ambiguity of names at the entity level can strongly affect the quality of the annotation. In this paper, we propose a novel dataset to complement the datasets proposed by SemTab. The dataset consists of a set of high-quality, manually curated tables with non-obviously linkable cells, i.e., where values are ambiguous names, typos, and misspelled entity names not appearing in the current version of the SemTab dataset. These challenges are particularly relevant for the ingestion of structured legacy sources into existing knowledge graphs. Evaluations run on this dataset show that ambiguity is a key problem for entity linking algorithms and point to a promising direction for future work in the field.
Type: paper
Keywords: Entity linking; Instance-level matching; Cell entity annotation; Semantic labeling; Table annotation
Language: English
Conference: 19th International Semantic Web Conference (ISWC 2020), November 2–6, 2020
Editors: Pan J.Z., Tamma V., d'Amato C., Janowicz K., Fu B., Polleres A., Seneviratne O., Kagal L.
Book title: The Semantic Web – ISWC 2020 19th International Semantic Web Conference, Athens, Greece, November 2–6, 2020, Proceedings, Part II
ISBN: 9783030624651
Year: 2020
Volume: 12507 LNCS
Pages: 328-343
Access: partially open
Cutrona, V., Bianchi, F., Jiménez-Ruiz, E., Palmonari, M. (2020). Tough Tables: Carefully Evaluating Entity Linking for Tabular Data. In The Semantic Web – ISWC 2020 19th International Semantic Web Conference, Athens, Greece, November 2–6, 2020, Proceedings, Part II (pp. 328-343) [10.1007/978-3-030-62466-8_21].
Files in this item:

Cutrona-2020-ISWC2020-preprint.pdf
Access: open access
Attachment type: Submitted Version (Pre-print)
License: Other
Size: 361.1 kB
Format: Adobe PDF

Cutrona-2020-ISWC2020-VoR.pdf
Access: archive managers only
Description: Elsevier's
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 547.24 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/299737
Citations
  • Scopus 33