The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute the string graph, based on the FMindex and Burrows-Wheeler Transform (BWT). We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the SGA assembler as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that FSG is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads.
Bonizzoni, P., DELLA VEDOVA, G., Pirola, Y., Previtali, M., Rizzi, R. (2016). FSG: Fast string graph construction for de novo assembly of reads data. In 12th International Symposium on Bioinformatics Research and Applications, ISBRA 2016; Minsk; Belarus; 5 June 2016 through 8 June 2016 (pp.27-39). Springer Verlag [10.1007/978-3-319-38782-6_3].
FSG: Fast string graph construction for de novo assembly of reads data
BONIZZONI, PAOLA;DELLA VEDOVA, GIANLUCA;PIROLA, YURI;PREVITALI, MARCO
;RIZZI, RAFFAELLA
2016
Abstract
The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute the string graph, based on the FMindex and Burrows-Wheeler Transform (BWT). We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the SGA assembler as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that FSG is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads.File | Dimensione | Formato | |
---|---|---|---|
conf-paper-16-isbra.pdf
Solo gestori archivio
Descrizione: Articolo principale
Dimensione
219.87 kB
Formato
Adobe PDF
|
219.87 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.