Bicocca Open Archive

Viviani, M., Bassani, E. (2019). Quality of Wikipedia articles: Analyzing features and building a ground truth for supervised classification. Paper presented at: The 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management (KDIR), Vienna, Austria [10.5220/0008149303380346].

Quality of Wikipedia articles: Analyzing features and building a ground truth for supervised classification

Viviani, Marco; Bassani, Elias

2019

Abstract

Wikipedia is nowadays one of the biggest online resources on which users rely as a source of information. The amount of collaboratively generated content sent to the online encyclopedia every day can lead to the creation of low-quality articles (and, consequently, misinformation) if it is not properly monitored and revised. For this reason, this paper considers the problem of automatically assessing the quality of Wikipedia articles. In particular, the focus is (i) on the analysis of groups of hand-crafted features that supervised machine learning techniques can employ to classify Wikipedia articles by quality, and (ii) on the analysis of some issues behind the construction of a suitable ground truth. The analyzed features are evaluated on a specifically built labeled dataset by implementing different supervised classifiers based on distinct machine learning algorithms, which produced promising results.

Contribution type: slide + paper
Keywords: Data Quality, Wikipedia, Supervised Classification, Feature Analysis, Ground Truth Building
Content language: English
Conference name: The 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management (KDIR)
Conference year: 2019
ISBN of the proceedings volume: 9789897583827
Publication date: 2019
Volume number: 1
First page: 338
Last page: 346
DOI: https://dx.doi.org/10.5220/0008149303380346
Full text: none
Citation: Viviani, M., Bassani, E. (2019). Quality of Wikipedia articles: Analyzing features and building a ground truth for supervised classification. Paper presented at: The 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management (KDIR), Vienna, Austria [10.5220/0008149303380346].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/249678
