Chicco, D., Coelho, V. (2025). A teaching proposal for a short course on biomedical data science. PLOS Computational Biology, 21(4). doi:10.1371/journal.pcbi.1012946.
A teaching proposal for a short course on biomedical data science
Chicco D. (first author); Coelho V.
2025
Abstract
As the availability of big biomedical data grows, there is an increasing need for university students professionally trained to analyze these data and interpret the results correctly. We propose here a study plan for a master's degree course on biomedical data science, based on our experience during the last academic year. In our university course, we explained how to find an open biomedical dataset, how to clean it correctly, and how to prepare it for a computational statistics or machine learning phase. In doing so, we introduced common health data science terms and explained how to avoid common mistakes in the process. Moreover, we clarified how to perform an exploratory data analysis (EDA) and how to interpret its results reasonably. We also described how to properly execute a supervised or unsupervised machine learning analysis, and how to understand and interpret its outcomes. Finally, we explained how to validate the findings obtained. We illustrated all these steps in the context of open science principles, suggesting that students use only open source programming languages (R or Python in particular), open biomedical data (if available), and open access scientific articles (if possible). We believe our teaching proposal can be useful and of interest to anyone preparing a course on biomedical data science.

File | Size | Format |
---|---|---|
Chicco-2025-PLoS Computational Biology-VoR.pdf (open access). Description: CC BY 4.0, an open access article distributed under the terms of the Creative Commons Attribution License. Attachment type: Publisher's Version (Version of Record, VoR). License: Creative Commons | 2.25 MB | Adobe PDF |