Bicocca Open Archive


Cabitza, F., Campagner, A. (2024). Towards Better Ways to Assess Predictive Computing in Medicine: On Reliability, Robustness, and Utility. In B. Carpentieri, P. Lecca (Eds.), Big Data Analysis and Artificial Intelligence for Medical Sciences (pp. 309-337). Wiley [10.1002/9781119846567.ch14].

Towards Better Ways to Assess Predictive Computing in Medicine: On Reliability, Robustness, and Utility

Cabitza F.; Campagner A.

2024

Abstract

Computational classification systems built using machine learning (ML) techniques are increasingly being evaluated and employed in medical settings for a number of purposes and applications, including diagnosis, prognosis, and risk stratification. However, evaluation and validation practices that are commonly adopted when ML is applied in other disciplines are unlikely to be meaningfully applicable to medicine. In fact, otherwise technically sound systems have been found to perform poorly in real settings, a phenomenon that has been termed the “last mile of implementation.” In this chapter, we focus on three main factors underlying the so-called last mile: the impact of observer variability on ground truth reliability; the meaningfulness and appropriateness of commonly adopted performance measures; and the issue of replicability in ML studies. We discuss these issues and delineate possible solutions and concepts to address them.


Subtype: Chapter or essay
Keywords: medical machine learning; reliability; replicability; robustness; utility
Content language: English
Volume title: Big Data Analysis and Artificial Intelligence for Medical Sciences
Volume editors: Carpentieri, B; Lecca, P
Publication date: 2024
Volume ISBN: 9781119846536
Publisher: Wiley
First page: 309
Last page: 337
Contribution DOI: https://dx.doi.org/10.1002/9781119846567.ch14
Citation: Cabitza, F., Campagner, A. (2024). Towards Better Ways to Assess Predictive Computing in Medicine: On Reliability, Robustness, and Utility. In B. Carpentieri, P. Lecca (Eds.), Big Data Analysis and Artificial Intelligence for Medical Sciences (pp. 309-337). Wiley [10.1002/9781119846567.ch14].
Full text: none
Appears in categories: 03 - Book contribution


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/528385
