Bicocca Open Archive

MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation

Marzocchi, M; Cremaschi, M; Pozzi, R; Avogadro, R; Palmonari, M

2022

Abstract

In this paper, we present MammoTab, a dataset composed of 1M Wikipedia tables extracted from over 20M Wikipedia pages and annotated through Wikidata. The lack of datasets of this kind in the state of the art makes MammoTab a good resource for testing and training Semantic Table Interpretation approaches. The dataset has been designed to cover several key challenges, such as disambiguation, homonymy, and NIL-mentions. The dataset has been evaluated using MTab, one of the best approaches of the SemTab challenge.

Contribution type: paper
Keywords: Knowledge Graph; Semantic Table Interpretation; SemTab Challenge; Tabular Data
Content language: English
Conference name: 2022 Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab 2022
Conference year: 2022
Volume editors: Efthymiou, V; Jiménez-Ruiz, E; Chen, J; Cutrona, V; Hassanzadeh, O; Sequeda, J; Srinivas, K; Abdelmageed, N; Hulsebos, M
Proceedings title: SemTab 2022 Semantic Web Challenge on Tabular Data to Knowledge Graph Matching. Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference (ISWC 2022)
Series: CEUR Workshop Proceedings
Publication date: 2022
Volume number: 3320
First page: 28
Last page: 33
Alternative URL: https://ceur-ws.org/Vol-3320/paper3.pdf
Full text: open
Citation: Marzocchi, M., Cremaschi, M., Pozzi, R., Avogadro, R., Palmonari, M. (2022). MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation. In SemTab 2022 Semantic Web Challenge on Tabular Data to Knowledge Graph Matching. Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference (ISWC 2022) (pp. 28-33). CEUR-WS.
Appears in categories: 02 - Conference contribution

Files in this item:

File	Size	Format
Marzocchi-2023-ISWC-VoR.pdf (open access; Description: Conference contribution; Attachment type: Publisher's Version (Version of Record, VoR); License: Creative Commons)	1.05 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/423134
