
Chicco, D., Coelho, V. (2025). A teaching proposal for a short course on biomedical data science. PLOS Computational Biology, 21(4), e1012946. https://doi.org/10.1371/journal.pcbi.1012946

A teaching proposal for a short course on biomedical data science

Chicco D. (first author); Coelho V.
2025

Abstract

As the availability of big biomedical data grows, there is an increasing need for university students professionally trained to analyze these data and correctly interpret the results. We propose here a study plan for a master's degree course on biomedical data science, describing our experience during the last academic year. In our university course, we explained how to find an open biomedical dataset, how to clean it correctly, and how to prepare it for a computational statistics or machine learning phase. In doing so, we introduced common health data science terms and explained how to avoid common mistakes in the process. Moreover, we clarified how to perform an exploratory data analysis (EDA) and how to reasonably interpret its results. We also described how to properly execute a supervised or unsupervised machine learning analysis, and how to understand and interpret its outcomes. Finally, we explained how to validate the findings obtained. We illustrated all these steps in the context of open science principles, suggesting that students use only open source programming languages (R or Python in particular), open biomedical data (if available), and open access scientific articles (if possible). We believe our teaching proposal can be useful and of interest to anyone who wants to start preparing a course on biomedical data science.
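As a minimal illustration of the workflow summarized above (cleaning, exploratory analysis, preparation, supervised learning, and validation), the following Python sketch runs on a handful of made-up toy records. It is not the authors' course material: the toy dataset, the hypothetical function names (`clean`, `describe`, `split`, `standardize`, `knn_predict`, `mcc`), the choice of k-nearest neighbors as the classifier, and the Matthews correlation coefficient (MCC) as the validation metric are all illustrative assumptions. It uses only the Python standard library so it runs without extra packages; a real course project would more likely rely on libraries such as pandas and scikit-learn, or their R equivalents.

```python
import math
import random
import statistics

# Hypothetical toy records: (feature_1, feature_2, label). The values are made
# up for illustration only and are NOT real patient data; None marks a
# missing measurement.
RECORDS = [
    (1.0, 2.1, 0), (1.2, 1.9, 0), (0.8, 2.2, 0), (1.1, None, 0),
    (0.9, 2.0, 0), (1.3, 1.8, 0), (3.0, 4.1, 1), (3.2, 3.9, 1),
    (2.9, 4.3, 1), (None, 4.0, 1), (3.1, 4.2, 1), (2.8, 3.8, 1),
]

def clean(records):
    """Cleaning: drop any record with a missing value (complete-case analysis)."""
    return [r for r in records if None not in r]

def describe(rows):
    """EDA: mean and sample standard deviation of each feature column."""
    n_features = len(rows[0]) - 1
    summary = []
    for j in range(n_features):
        col = [r[j] for r in rows]
        summary.append((statistics.mean(col), statistics.stdev(col)))
    return summary

def split(rows, test_fraction=0.3, seed=0):
    """Preparation: shuffle reproducibly, then split into training and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def standardize(train, test):
    """Z-score features using training-set statistics only (no test leakage)."""
    stats = describe(train)
    def scale(r):
        return tuple((r[j] - m) / s for j, (m, s) in enumerate(stats)) + (r[-1],)
    return [scale(r) for r in train], [scale(r) for r in test]

def knn_predict(train, x, k=3):
    """Supervised learning: majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda r: math.dist(r[:-1], x))[:k]
    labels = [r[-1] for r in nearest]
    return max(set(labels), key=labels.count)

def mcc(y_true, y_pred):
    """Validation: Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a confusion-matrix margin is zero; report 0.0 then.
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    rows = clean(RECORDS)
    print("feature (mean, stdev):", describe(rows))
    train, test = split(rows)
    train_z, test_z = standardize(train, test)
    y_true = [r[-1] for r in test_z]
    y_pred = [knn_predict(train_z, r[:-1]) for r in test_z]
    print("test MCC:", mcc(y_true, y_pred))
```

Fitting the standardization on the training rows only, and then applying it unchanged to the test rows, keeps test-set information out of the model, which avoids a common pitfall in this kind of pipeline. With so few records the test MCC is not a meaningful performance estimate; a real analysis would use a larger open dataset and a more robust validation scheme such as cross-validation.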
Type: Journal article - Scientific article
Keywords: health informatics, teaching, biomedical data science
Language: English
Publication date: 14 Apr 2025
Year: 2025
Volume: 21
Issue: 4
Article number: e1012946
Access: open
Files in this record:
Chicco-2025-PLoS Computational Biology-VoR.pdf (open access)
Description: CC BY 4.0. This is an open access article distributed under the terms of the Creative Commons Attribution License.
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 2.25 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/550421