Reconstructing the evolutionary history of a set of species is a central task in computational biology. In real data, it is often the case that some information is missing: the Incomplete Directed Perfect Phylogeny (IDPP) problem asks, given a collection of species described by a set of binary characters with some unknown states, to complete the missing states in such a way that the result can be explained with a directed perfect phylogeny. Pe’er et al. [SICOMP 2004] proposed a solution that takes Õ(nm) time (the Õ(·) notation suppresses polylog factors) for n species and m characters. Their algorithm relies on pre-existing dynamic connectivity data structures: a computational study recently conducted by Fernández-Baca and Liu showed that, in this context, complex data structures perform worse than simpler ones with worse asymptotic bounds. This motivates us to look into the particular properties of the dynamic connectivity problem in this setting, so as to avoid using sophisticated data structures as a black box. Not only do we succeed in doing so, giving a much simpler O(nm log n)-time algorithm for the IDPP problem; our insights into the specific structure of the problem also lead to an asymptotically optimal O(nm)-time algorithm.
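To make the problem statement concrete, here is a minimal Python sketch of IDPP on tiny instances. It is not the algorithm of the paper, nor that of Pe’er et al.: it simply tries every completion, which takes exponential time in the number of unknown states. It relies on the classical characterization that a complete binary matrix admits a directed perfect phylogeny exactly when, for every pair of characters, the sets of species in state 1 are either disjoint or nested. The function names and the use of "?" for an unknown state are assumptions made for this illustration.

```python
from itertools import product


def admits_directed_perfect_phylogeny(matrix):
    """Check a complete 0/1 matrix (rows = species, columns = characters).

    Uses the classical pairwise test: the 1-sets of any two characters
    must be disjoint or one must contain the other.
    """
    if not matrix:
        return True
    m = len(matrix[0])
    # Species set of each character: the rows in which it has state 1.
    cols = [
        frozenset(i for i, row in enumerate(matrix) if row[j] == 1)
        for j in range(m)
    ]
    for a in range(m):
        for b in range(a + 1, m):
            common = cols[a] & cols[b]
            # Overlapping but not nested: no directed perfect phylogeny.
            if common and common != cols[a] and common != cols[b]:
                return False
    return True


def complete_by_brute_force(matrix):
    """Return the first completion of the "?" entries that passes the test.

    Illustrative only: enumerates all 2^k completions for k unknown states.
    Returns None if no completion works.
    """
    unknown = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == "?"
    ]
    for states in product((0, 1), repeat=len(unknown)):
        candidate = [list(row) for row in matrix]
        for (i, j), state in zip(unknown, states):
            candidate[i][j] = state
        if admits_directed_perfect_phylogeny(candidate):
            return candidate
    return None
```

For example, with three species and two characters, `complete_by_brute_force([[1, 1], [1, "?"], ["?", 1]])` finds a completion in which the second character's species set is nested inside the first, whereas the complete matrix `[[1, 1], [1, 0], [0, 1]]` has two overlapping, non-nested characters and so admits no directed perfect phylogeny.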

Bernardini, G., Bonizzoni, P., Gawrychowski, P. (2021). Incomplete Directed Perfect Phylogeny in Linear Time. In Algorithms and Data Structures (pp. 172-185). Cham, Switzerland: Springer Science and Business Media Deutschland GmbH [10.1007/978-3-030-83508-8_13].

Incomplete Directed Perfect Phylogeny in Linear Time

Bernardini G. (co-first author); Bonizzoni P. (co-first author)
2021

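Type: paper
Keywords: Perfect phylogeny, connected components, graph theory
Language: English
Conference: International Symposium on Algorithms and Data Structures, WADS 2021
Conference year: 2021
Book title: Algorithms and Data Structures
ISBN: 978-3-030-83507-1
Publication year: 2021
Volume: 12808
Pages: 172-185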
none

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/337509