Scapin, D., Cisotto, G., Gindullina, E., Badia, L. (2022). Shapley Value as an Aid to Biomedical Machine Learning: a Heart Disease Dataset Analysis. In Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022 (pp. 933-939). Institute of Electrical and Electronics Engineers Inc. [10.1109/CCGrid54584.2022.00113].
Shapley Value as an Aid to Biomedical Machine Learning: a Heart Disease Dataset Analysis
Cisotto, G.
2022
Abstract
This paper investigates the decision-making process aided by machine learning for biomedical problems, and how to improve it through meta-assessments of the most relevant features. Classification algorithms are usually trained and used on high-dimensional datasets (i.e., with an extremely large number of features), which is inefficient and costly. It would be beneficial to identify the features that contribute the most to assigning a category to a subject and, in particular, to diagnosing a pathological condition. Cooperative game theory can support this through the computation of the Shapley value, an index with desirable properties according to which the players (in our case, the input features) can be ranked. We apply this framework to a supervised machine learning scenario in which a random forest classifier is used for heart disease detection. From a publicly available dataset, we identify the most relevant features that can affect the decision, thus obtaining practical guidelines for a compact yet efficient description based on an analytical rationale.

File | Size | Format | |
---|---|---|---|
Scapin-2022-CCGrid-VoR.pdf; access: archive managers only; description: Proceedings - IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID); attachment type: Publisher’s Version (Version of Record, VoR); license: all rights reserved | 338.55 kB | Adobe PDF | |
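
The feature ranking described in the abstract treats input features as players in a cooperative game and scores each one by its Shapley value. The following is a minimal sketch of that idea, not the authors' implementation: the feature names (`feature_a`, `feature_b`, `feature_c`) and the payoff table `TOY_ACCURACY` are hypothetical, standing in for the accuracy a classifier might reach when trained on each subset of features. In practice the payoff would come from an actual model such as the random forest mentioned above, and exact enumeration is only feasible for a small number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable labels (here, feature names).
    value:   function mapping a frozenset of players to a real payoff.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Average the marginal contribution of p over all coalitions
        # that do not contain p, weighted by coalition size.
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical payoffs: accuracy obtained with each subset of features.
# These numbers are made up for illustration only.
FEATURES = ["feature_a", "feature_b", "feature_c"]
TOY_ACCURACY = {
    frozenset(): 0.50,
    frozenset({"feature_a"}): 0.60,
    frozenset({"feature_b"}): 0.55,
    frozenset({"feature_c"}): 0.70,
    frozenset({"feature_a", "feature_b"}): 0.62,
    frozenset({"feature_a", "feature_c"}): 0.78,
    frozenset({"feature_b", "feature_c"}): 0.72,
    frozenset({"feature_a", "feature_b", "feature_c"}): 0.80,
}


def toy_value(coalition):
    """Look up the hypothetical accuracy for a coalition of features."""
    return TOY_ACCURACY[frozenset(coalition)]


phi = shapley_values(FEATURES, toy_value)
# Rank features from most to least relevant by Shapley value.
ranking = sorted(FEATURES, key=lambda f: phi[f], reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:.3f}")
```

By the efficiency property, the Shapley values sum to the gap between the payoff of the full feature set and that of the empty set, so each value can be read as that feature's share of the overall gain in accuracy.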