The datasets that are part of the Linking Open Data cloud diagramm (LOD cloud) are classified into the following topical categories: media, government, publications, life sciences, geographic, social networking, user-generated content, and cross-domain. The topical categories were manually assigned to the datasets. In this paper, we investigate to which extent the topical classification of new LOD datasets can be automated using machine learning techniques and the existing annotations as supervision. We conducted experiments with different classification techniques and different feature sets. The best classification technique/feature set combination reaches an accuracy of 81:62% on the task of assigning one out of the eight classes to a given LOD dataset. A deeper inspection of the classification errors reveals problems with the manual classification of datasets in the current LOD cloud.
Meusel, R., Spahiu, B., Bizer, C., Paulheim, H. (2015). Towards automatic topical classification of LOD datasets. In Conference Proceeding Workshop on Linked Data on the Web, LDOW 2015. CEUR-WS.
Towards automatic topical classification of LOD datasets
SPAHIU, BLERINASecondo
;
2015
Abstract
The datasets that are part of the Linking Open Data cloud diagramm (LOD cloud) are classified into the following topical categories: media, government, publications, life sciences, geographic, social networking, user-generated content, and cross-domain. The topical categories were manually assigned to the datasets. In this paper, we investigate to which extent the topical classification of new LOD datasets can be automated using machine learning techniques and the existing annotations as supervision. We conducted experiments with different classification techniques and different feature sets. The best classification technique/feature set combination reaches an accuracy of 81:62% on the task of assigning one out of the eight classes to a given LOD dataset. A deeper inspection of the classification errors reveals problems with the manual classification of datasets in the current LOD cloud.File | Dimensione | Formato | |
---|---|---|---|
paper-03.pdf
accesso aperto
Dimensione
264.75 kB
Formato
Adobe PDF
|
264.75 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.