The reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential for characterizing the genome of an individual. Here we address a successful approach to haplotype assembly, the weighted Minimum Error Correction (wMEC) problem, which consists in computing the two haplotypes that partition the sequencing reads into two disjoint subsets with the least number of corrections to the Single Nucleotide Polymorphism (SNP) values. To solve this problem we propose GenHap, a computational method based on Genetic Algorithms, which can obtain optimal solutions thanks to a global search process. To evaluate the effectiveness of GenHap, we test it on a synthetic (yet realistic) dataset based on the PacBio RS II sequencing technology and compare its performance against HapCol, an efficient state-of-the-art algorithm for haplotype assembly. We show that GenHap always obtains high-accuracy solutions (in terms of haplotype error rate) and is up to 20× faster than HapCol on this dataset.
Tangherloni, A., Spolaor, S., Rundo, L., Nobile, M., Cazzaniga, P., Mauri, G., et al. (2018). GenHap: Evolutionary Computation For Haplotype Assembly. Paper presented at: Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics, Caparica, Portugal.
GenHap: Evolutionary Computation For Haplotype Assembly
Tangherloni, A; Spolaor, S; Rundo, L; Nobile, M; Cazzaniga, P; Mauri, G; Besozzi, D; Merelli, I
2018