Bicocca Open Archive

Ground truthing from multi-rater labeling with three-way decision and possibility theory

Campagner A.; Ciucci D.; Svensson C.-M.; Figge M. T.; Cabitza F.

2021

Abstract

In recent years, Machine Learning (ML) has attracted wide interest as an aid for decision makers in complex domains, such as medicine. Although domain experts are typically aware of the intrinsic uncertainty surrounding Ground Truth (GT), the issue of GT quality has scarcely been addressed in the ML literature. GT quality is regularly assumed to be adequate, regardless of the number and skills of the raters involved in data annotation. These factors can, however, have a severe negative impact on the reliability of ML models. In this article we study the influence of GT quality, in terms of the number of raters, their expertise, and their agreement level, on the performance of ML models. We introduce the concept of reduction: a computational procedure that produces single-target GT from multi-rater settings. We propose three reductions, based on three-way decision, possibility theory, and probability theory. We provide characterizations of these reductions from the perspective of learning theory and propose two ML algorithms. We report the results of experiments, on both real-world medical and synthetic datasets, showing that GT quality strongly affects the performance of ML models, and that the proposed algorithms handle this form of uncertainty better than state-of-the-art approaches.


Subtype: Journal article - Scientific article
Keywords: Machine learning; Multi-rater; Possibility theory; Three-way decision; Uncertainty
Content language: English
Ahead-of-print or first online publication date: 28 Sep 2020
Publication date: 2021
Journal: INFORMATION SCIENCES
Volume: 545
First page: 771
Last page: 790
Article DOI: https://dx.doi.org/10.1016/j.ins.2020.09.049
Full text: none
Citation: Campagner, A., Ciucci, D., Svensson, C., Figge, M., Cabitza, F. (2021). Ground truthing from multi-rater labeling with three-way decision and possibility theory. INFORMATION SCIENCES, 545, 771-790 [10.1016/j.ins.2020.09.049].
Appears in types: 01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/289890
