Quality of Wikipedia articles: Analyzing features and building a ground truth for supervised classification

Viviani, Marco; Bassani, Elias
2019

Abstract

Wikipedia is now one of the largest online resources that users rely on as a source of information. The volume of collaboratively generated content submitted to the online encyclopedia every day can lead to the creation of low-quality articles (and, consequently, misinformation) if it is not properly monitored and revised. For this reason, this paper considers the problem of automatically assessing the quality of Wikipedia articles. In particular, the focus is (i) on the analysis of groups of hand-crafted features that supervised machine learning techniques can employ to classify Wikipedia articles by quality, and (ii) on the analysis of some issues behind the construction of a suitable ground truth. The analyzed features are evaluated on a purpose-built labeled dataset by implementing several supervised classifiers based on distinct machine learning algorithms, which produced promising results.
Type: slide + paper
Keywords: Data Quality, Wikipedia, Supervised Classification, Feature Analysis, Ground Truth Building
Language: English
Conference: The 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management (KDIR), 2019
ISBN: 9789897583827
Year: 2019
Volume: 1
Pages: 338-346
Viviani, M., Bassani, E. (2019). Quality of Wikipedia articles: Analyzing features and building a ground truth for supervised classification. Paper presented at: The 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management (KDIR), Vienna, Austria [10.5220/0008149303380346].

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/249678