
Bianchi, V., Ceol, A., Ogier, A., de Pretis, S., Galeota, E., Kishore, K., et al. (2016). Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions. Frontiers in Genetics, 7, 75. doi:10.3389/fgene.2016.00075.

Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions

Pelizzola M (last author)
2016

Abstract

Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; nowadays, many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups lacking IT and quantitative skills. In this perspective, we identify five issues that should be carefully addressed by research groups approaching NGS technologies. In particular, the five key issues concern: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets and, when possible, using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data to standard locations, and allows complete traceability of datasets, accompanying metadata and analysis scripts.
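Issues (2) and (3) above can be illustrated with a minimal sketch. It derives a standardized output directory from a LIMS sample identifier, the analysis protocol and a hash of the full parameter set, so that several protocols (or parameter sets) applied to the same sample never overwrite each other. It also writes a provenance record next to the results. This is not HTS-flow's actual directory layout. The `AnalysisRun` class, the `output_dir` and `write_provenance` functions, the `<root>/<sample>/<protocol>/<version>_<key>` scheme and the `provenance.json` file name are all hypothetical, chosen only for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class AnalysisRun:
    """Hypothetical description of one analysis applied to one LIMS sample."""
    sample_id: str         # identifier as stored in the LIMS (format assumed)
    protocol: str          # analysis protocol name, e.g. "chipseq_peaks" (hypothetical)
    protocol_version: str  # version of the pipeline/script used
    parameters: dict       # must be JSON-serializable

    def run_key(self) -> str:
        # sort_keys=True makes the serialization independent of dict ordering,
        # so identical configurations always map to the same short key.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def output_dir(root: Path, run: AnalysisRun) -> Path:
    # Hypothetical standard layout: <root>/<sample_id>/<protocol>/<version>_<run_key>
    return root / run.sample_id / run.protocol / f"{run.protocol_version}_{run.run_key()}"


def write_provenance(directory: Path, run: AnalysisRun, script: str) -> Path:
    # Store everything needed to trace how the results in `directory` were produced.
    directory.mkdir(parents=True, exist_ok=True)
    record = {**asdict(run), "run_key": run.run_key(), "script": script}
    path = directory / "provenance.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path
```

For example, `output_dir(Path("/data/ngs"), AnalysisRun("S0001", "chipseq_peaks", "v1", {"genome": "mm10"}))` yields a path under `/data/ngs/S0001/chipseq_peaks/`. Changing any parameter yields a different final directory. Hashing the parameters, rather than encoding them in the file name, keeps names short while still separating runs. The provenance file is what makes the directory traceable after the fact.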
Type: Journal article - Scientific article
Keywords: Epigenomics; Genomics; High-throughput sequencing; Laboratory information management system; Workflow management system
Language: English
Year: 2016
Volume: 7
Issue: May
Article number: 75
Access: open
Files in this item:
File | Size | Format
Bianchi-2016-Front Genet-VoR.pdf | 923.77 kB | Adobe PDF

Access: open access
Description: Article
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/446760
Citations
  • Scopus: 36
  • Web of Science (ISI): 32