In diploid genomes, haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum error correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections. MEC is computationally hard to solve, but some approximation-based or fixed-parameter approaches have been proved capable of obtaining accurate results on real data. In this work, we expand the current characterization of the computational complexity of MEC from the approximation and the fixed-parameter tractability point of view. In particular, we show that MEC is not approximable within a constant factor, whereas it is approximable within a logarithmic factor in the size of the input. Furthermore, we answer open questions on the fixed-parameter tractability for parameters of classical or practical interest: the total number of corrections and the fragment length. In addition, we present a direct 2-approximation algorithm for a variant of the problem that has also been applied in the framework of clustering data. Finally, since polyploid genomes, such as those of plants and fishes, are composed of more than two copies of the chromosomes, we introduce a novel formulation of MEC, namely the k-ploid MEC problem, that extends the traditional problem to deal with polyploid genomes. We show that the novel formulation is still both computationally hard and hard to approximate. Nonetheless, from the parameterized point of view, we prove that the problem is tractable for parameters of practical interest such as the number of haplotypes and the coverage, or the number of haplotypes and the fragment length.
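To make the MEC objective described above concrete, the following is a minimal brute-force sketch in Python. It is an illustration only, not one of the algorithms from the paper. It assumes that fragments are equal-length strings over '0', '1' and '-' (where '-' marks a position the fragment does not cover), and the function names `part_cost` and `mec_brute_force` are hypothetical. The search tries every bipartition of the fragments, so it is exponential in the number of fragments and suitable only for toy instances.

```python
from itertools import product


def part_cost(fragments):
    """Return (corrections, consensus) for one group of fragments.

    For each column, the consensus takes the majority allele, and every
    minority entry counts as one correction. Gaps ('-') cost nothing.
    """
    cost = 0
    consensus = []
    for column in zip(*fragments):
        zeros = column.count("0")
        ones = column.count("1")
        cost += min(zeros, ones)
        if zeros == 0 and ones == 0:
            consensus.append("-")  # no fragment covers this position
        else:
            consensus.append("0" if zeros >= ones else "1")
    return cost, "".join(consensus)


def mec_brute_force(fragments):
    """Try every bipartition and return (min_corrections, hap_a, hap_b)."""
    best = None
    for labels in product((0, 1), repeat=len(fragments)):
        # Swapping the two groups gives the same solution, so fix the
        # first fragment in group 0 to halve the search space.
        if labels and labels[0] == 1:
            continue
        group_a = [f for f, side in zip(fragments, labels) if side == 0]
        group_b = [f for f, side in zip(fragments, labels) if side == 1]
        cost_a, hap_a = part_cost(group_a)
        cost_b, hap_b = part_cost(group_b)
        total = cost_a + cost_b
        if best is None or total < best[0]:
            best = (total, hap_a, hap_b)
    return best


if __name__ == "__main__":
    # Toy instance: two fragments each carry one conflicting entry in the
    # last column, so two corrections are needed.
    reads = ["0011", "0010", "1100", "1101", "00-1"]
    corrections, hap_a, hap_b = mec_brute_force(reads)
    print(corrections, hap_a, hap_b)
```

On this toy instance the minimum number of corrections is 2. Which consensus haplotypes are reported depends on how ties are broken, so only the correction count should be read as meaningful.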

Bonizzoni, P., Dondi, R., Klau, G., Pirola, Y., Pisanti, N., Zaccaria, S. (2016). On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes. JOURNAL OF COMPUTATIONAL BIOLOGY, 23(9), 718-736 [10.1089/cmb.2015.0220].

On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes

BONIZZONI, PAOLA; PIROLA, YURI; ZACCARIA, SIMONE
2016

Journal article - Scientific article
combinatorial optimization; graph theory; haplotypes; next-generation sequencing; Modeling and Simulation; Molecular Biology; Genetics; Computational Theory and Mathematics; Computational Mathematics
Language: English
Year: 2016
Volume: 23
Issue: 9
First page: 718
Last page: 736
Access: reserved
Files in this item:
File: journ-art-16-jcb.pdf
Access: Archive managers only
Description: Main article
Attachment type: Publisher’s Version (Version of Record, VoR)
Size: 582.11 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/132174
Citations
  • Scopus 26
  • Web of Science (ISI) 22