Bicocca Open Archive

Comparative genome analysis using sample-specific string detection in accurate long reads

Khorsand, Parsoa (co-first author); Denti, Luca (co-first author); Bonizzoni, Paola (co-last author); Chikhi, Rayan; Hormozdiari, Fereydoun

2021

Abstract

Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and the diagnosis of rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can study repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants).

Results: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('sample-specific' strings). We have developed an accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach enables comparative genome analysis without the need to map the reads, and is therefore not hindered by shortcomings of the reference genome and mapping algorithms. We show that the approach accurately finds sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data).

Subtype	Journal article - Scientific article
Keywords	Long reads, FM-index, structural variants
Content language	English
Ahead-of-print or first online publication date	31 May 2021
Publication date	2021
Journal	BIOINFORMATICS ADVANCES
Volume	1
Issue	1
Article number	vbab005
Article DOI	https://dx.doi.org/10.1093/bioadv/vbab005
Full text	open
Citation	Khorsand, P., Denti, L., Bonizzoni, P., Chikhi, R., Hormozdiari, F. (2021). Comparative genome analysis using sample-specific string detection in accurate long reads. BIOINFORMATICS ADVANCES, 1(1) [10.1093/bioadv/vbab005].
Appears in types	01 - Journal article

Files in this item:

File	Access	Attachment type	Size	Format
stringhe-specifiche.pdf	open access	Publisher’s Version (Version of Record, VoR)	507.9 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/337527
