Spinelli, R., Piazza, R., Pirola, A., Valletta, S., Rostagno, R., Mogavero, A., et al. (2012). A bioinformatics procedure to identify and annotate somatic mutations in whole-exome sequencing data. In Computational Intelligence Methods for Bioinformatics and Biostatistics (pp. 73-82). Springer-Verlag Berlin Heidelberg [10.1007/978-3-642-35686-5_7].
A bioinformatics procedure to identify and annotate somatic mutations in whole-exome sequencing data
SPINELLI, ROBERTA; PIAZZA, ROCCO GIOVANNI; PIROLA, ALESSANDRA; VALLETTA, SIMONA; ROSTAGNO, ROBERTA; MOGAVERO, ANGELA; MAREGA, MANUELA; KUNDANINGATTU RAMAN, HIMA; GAMBACORTI PASSERINI, CARLO
2012
Abstract
Next-generation sequencing instruments generate a tremendous amount of sequencing data. This creates a challenging bioinformatics problem: storing, managing and analyzing terabytes of sequencing data, often generated from very different data sources. Our project focuses mainly on the sequence analysis of human cancer genomes, with the aim of identifying the genetic lesions underlying the development of tumors. However, the automated detection of somatic mutations and statistically based testing to identify genetic lesions are still open problems. We therefore propose a computational procedure to manage large-scale sequencing data in order to detect exonic somatic mutations in a tumor sample. The proposed pipeline includes several steps based on open-source software and the R language: alignment, detection of mutations, annotation, functional classification and visualization of results. We analyzed whole-exome sequencing data from 3 leukemic patients and 3 paired controls, plus 1 colon cancer sample and its paired control. The results were validated by Sanger sequencing. © Springer-Verlag 2012.
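The abstract does not give the pipeline's actual tools or filters, so the following is only a minimal sketch of the general idea behind the "detection of mutations" step with a paired control: a variant is kept as a somatic candidate when it is well supported in the tumor and is not called in the matched control. It is written in Python for illustration rather than in the R the authors mention. The `Variant` record, the `somatic_candidates` function and both thresholds (`min_depth`, `min_alt_fraction`) are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    """One called variant (hypothetical record layout, for illustration)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int  # reads supporting the alternate allele
    depth: int      # total reads covering the position


def variant_key(v: Variant) -> Tuple[str, int, str, str]:
    """Identify a variant by position and alleles, ignoring read counts."""
    return (v.chrom, v.pos, v.ref, v.alt)


def somatic_candidates(
    tumor: List[Variant],
    control: List[Variant],
    min_depth: int = 20,             # assumed threshold, not from the paper
    min_alt_fraction: float = 0.25,  # assumed threshold, not from the paper
) -> List[Variant]:
    """Keep tumor variants that pass coverage filters and are absent in the control."""
    control_keys = {variant_key(v) for v in control}
    kept = []
    for v in tumor:
        if v.depth < min_depth:
            continue  # too little coverage to trust the call
        if v.alt_reads / v.depth < min_alt_fraction:
            continue  # alternate allele too rare in the tumor reads
        if variant_key(v) in control_keys:
            continue  # also called in the paired control: likely germline
        kept.append(v)
    return kept


# Made-up example data: one germline variant, one somatic, one low-coverage.
tumor_calls = [
    Variant("chr1", 1000, "A", "G", 30, 60),
    Variant("chr2", 2000, "C", "T", 25, 50),
    Variant("chr3", 3000, "G", "A", 4, 10),
]
control_calls = [Variant("chr1", 1000, "A", "G", 28, 55)]

for v in somatic_candidates(tumor_calls, control_calls):
    print(v.chrom, v.pos, v.ref, v.alt)
```

In a real workflow the candidates would then go through the annotation and functional classification steps the abstract lists, and, as in the study, be confirmed by an independent method such as Sanger sequencing.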