In this paper, we present MammoTab, a dataset composed of 1M Wikipedia tables extracted from over 20M Wikipedia pages and annotated through Wikidata. The lack of datasets of this kind in the state of the art makes MammoTab a good resource for testing and training Semantic Table Interpretation approaches. The dataset has been designed to cover several key challenges, such as disambiguation, homonymy, and NIL-mentions. The dataset has been evaluated using MTab, one of the best approaches in the SemTab challenge.
Marzocchi, M., Cremaschi, M., Pozzi, R., Avogadro, R., Palmonari, M. (2022). MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation. In SemTab 2022: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching. Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, co-located with the 21st International Semantic Web Conference (ISWC 2022) (pp. 28-33). CEUR-WS.
MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation
Cremaschi, M; Pozzi, R; Avogadro, R; Palmonari, M
2022
File | Size | Format |
---|---|---|
Marzocchi-2023-ISWC-VoR.pdf (open access). Description: Conference contribution. Attachment type: Publisher’s Version (Version of Record, VoR). License: Creative Commons | 1.05 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.