Towards Better Ways to Assess Predictive Computing in Medicine: On Reliability, Robustness, and Utility

Cabitza F.; Campagner A.
2024

Abstract

Computational classification systems built using machine learning (ML) techniques are increasingly being evaluated and employed in medical settings for a number of purposes, including diagnosis, prognosis, and risk stratification. However, evaluation and validation practices that are commonly adopted when ML is applied to other disciplines are unlikely to be meaningfully applicable to medicine. In fact, otherwise technically sound systems have been found to perform poorly in real settings, a problem that has been termed the “last mile of implementation.” In this chapter, we focus on three main factors underlying the so-called last mile: the impact of observer variability on ground truth reliability; the meaningfulness and appropriateness of commonly adopted performance measures; and the issue of replicability in ML studies. We discuss these issues and delineate possible solutions and concepts to address them.
Type: Book chapter or essay
Keywords: medical machine learning; reliability; replicability; robustness; utility
Language: English
Book title: Big Data Analysis and Artificial Intelligence for Medical Sciences
Editors: Carpentieri, B; Lecca, P
Year: 2024
ISBN: 9781119846536
Publisher: Wiley
First page: 309
Last page: 337
Cabitza, F., Campagner, A. (2024). Towards Better Ways to Assess Predictive Computing in Medicine: On Reliability, Robustness, and Utility. In B. Carpentieri, P. Lecca (Eds.), Big Data Analysis and Artificial Intelligence for Medical Sciences (pp. 309-337). Wiley [10.1002/9781119846567.ch14].

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/528385