Chourasia, P., Ali, S., Ciccolella, S., Della Vedova, G., Patterson, M. (2022). Clustering SARS-CoV-2 Variants from Raw High-Throughput Sequencing Reads Data. In Computational Advances in Bio and Medical Sciences. 11th International Conference, ICCABS 2021 (pp.133-148). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-17531-2_11].

Clustering SARS-CoV-2 Variants from Raw High-Throughput Sequencing Reads Data

Ciccolella, S; Della Vedova, G
2022

Abstract

The massive amount of genomic data appearing over the past two years for SARS-CoV-2 has challenged traditional methods for studying the dynamics of the COVID-19 pandemic. As a result, new methods, such as the Pangolin tool, have appeared which can scale to the millions of samples of SARS-CoV-2 currently available. Such a tool is tailored to take assembled, aligned and curated full-length sequences, such as those provided by GISAID, as input. As high-throughput sequencing technologies continue to advance, such assembly, alignment and curation may become a bottleneck, creating a need for methods which can process raw sequencing reads directly. In this paper, we propose several alignment-free embedding approaches, which can generate a fixed-length feature vector representation directly from the raw sequencing reads, without the need for assembly. Moreover, because such an embedding is a numerical representation, it can be passed to already highly optimized clustering methods such as k-mea...
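The abstract above is truncated and does not spell out the proposed embeddings, so the following is only a minimal sketch of the general idea: turn the raw reads of each sample into a fixed-length vector without assembly or alignment, then cluster the vectors. It assumes a normalized k-mer frequency spectrum as a stand-in embedding and a plain Lloyd's k-means with a deterministic farthest-point initialization. The function names (`kmer_embedding`, `kmeans`), the choice `k=3`, and the toy reads are illustrative assumptions, not taken from the paper.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_embedding(reads, k=3):
    """Map the reads of one sample to a normalized k-mer frequency vector.

    The vector always has length 4**k, regardless of how many reads there are
    or how long they are. This is an assumed stand-in embedding.
    """
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0.0] * len(index)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            j = index.get(read[i:i + k])  # k-mers containing N etc. are skipped
            if j is not None:
                counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, n_clusters, n_iter=20):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centers = [list(vectors[0])]
    while len(centers) < n_clusters:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest center.
        labels = [min(range(n_clusters), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy input: four hypothetical samples, each a small set of raw reads.
samples = [
    ["AAAAAAAA", "AAAAAAAT"],
    ["AAAAAAAA", "AAAATAAA"],
    ["GGGGGGGG", "GGGGCGGG"],
    ["GGGGGGGC", "GGGGGGGG"],
]
vectors = [kmer_embedding(reads, k=3) for reads in samples]
labels = kmeans(vectors, n_clusters=2)
print(labels)
```

Because every sample maps to a vector of the same length, any general-purpose clustering routine can be applied to the result; in practice an optimized library implementation of k-means would likely replace the small loop shown here.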
Type: paper
Keywords: Alignment-free; Assembly; Clustering; High-throughput sequencing; SARS-CoV-2
Language: English
Conference: 2021 11th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)
Conference year: 2021
Editors: Bansal, MS; Mandoiu, I; Moussa, M; Patterson, M; Rajasekaran, S; Skums, P; Zelikovsky, A
Book title: Computational Advances in Bio and Medical Sciences. 11th International Conference, ICCABS 2021
ISBN: 9783031175305
Publication year: 2022
Volume: 13254
First page: 133
Last page: 148
none

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/400922
Citations
  • Scopus 5