Predicting the taxonomic affiliation of DNA sequences collected from biological samples is a fundamental step in biodiversity assessment. This task is performed by leveraging existing databases containing reference DNA sequences endowed with a taxonomic identification. However, environmental sequences can be from organisms that are either unknown to science or for which there are no reference sequences available. Thus, taxonomic novelty of a sequence needs to be accounted for when doing classification. We propose Bayesian nonparametric taxonomic classifiers, BayesANT, which use species sampling model priors to allow unobserved taxa to be discovered at each taxonomic rank. Using a simple product multinomial likelihood with conjugate Dirichlet priors at the lowest rank, a highly flexible supervised algorithm is developed to provide a probabilistic prediction of the taxa placement of each sequence at each rank. As an illustration, we run our algorithm on a carefully annotated library of Finnish arthropods (FinBOL). To assess the ability of BayesANT to recognize novelty and to predict known taxonomic affiliations correctly, we test it on two training-test splitting scenarios, each with a different proportion of taxa unobserved in training. We show how our algorithm attains accurate predictions and reliably quantifies classification uncertainty, especially when many sequences in the test set are affiliated to taxa unknown in training. By enabling taxonomic predictions for DNA barcodes to identify unseen branches, we believe BayesANT will be of broad utility as a tool for DNA metabarcoding within bioinformatics pipelines.
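The core idea in the abstract, a product multinomial likelihood with conjugate Dirichlet priors combined with a prior that reserves mass for unseen taxa, can be illustrated with a minimal sketch. This is not the BayesANT implementation. It assumes a single taxonomic rank rather than the full hierarchy, aligned sequences of equal length over the alphabet ACGT, and a Pitman-Yor prior as one possible species sampling model. The function names (`fit_counts`, `log_predictive`, `classify`) and the hyperparameters (`alpha`, `theta`, `sigma`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"  # assumed alphabet; gaps and ambiguity codes are not handled


def fit_counts(sequences, labels):
    """Collect per-taxon, per-position nucleotide counts from aligned sequences."""
    counts = {}
    sizes = Counter(labels)  # number of training sequences per taxon
    for seq, taxon in zip(sequences, labels):
        per_pos = counts.setdefault(taxon, [Counter() for _ in seq])
        for pos, base in enumerate(seq):
            per_pos[pos][base] += 1
    return counts, sizes


def log_predictive(seq, per_pos, n_taxon, alpha=0.5):
    """Dirichlet-multinomial predictive log-probability of seq, position by position.

    With per_pos=None and n_taxon=0 this reduces to the prior predictive,
    which is what an unobserved taxon would assign to the sequence.
    """
    total_alpha = alpha * len(NUCLEOTIDES)
    logp = 0.0
    for pos, base in enumerate(seq):
        observed = per_pos[pos][base] if per_pos is not None else 0
        logp += math.log((alpha + observed) / (total_alpha + n_taxon))
    return logp


def classify(seq, counts, sizes, theta=1.0, sigma=0.2, alpha=0.5):
    """Posterior probabilities over known taxa plus a '<new>' taxon."""
    n = sum(sizes.values())
    k = len(sizes)
    log_post = {}
    for taxon, n_taxon in sizes.items():
        # Pitman-Yor weight of an already observed taxon (assumed prior)
        log_prior = math.log((n_taxon - sigma) / (theta + n))
        log_post[taxon] = log_prior + log_predictive(seq, counts[taxon], n_taxon, alpha)
    # Pitman-Yor weight reserved for a taxon not seen in training
    log_new = math.log((theta + sigma * k) / (theta + n))
    log_post["<new>"] = log_new + log_predictive(seq, None, 0, alpha)
    # Normalise in log space for numerical stability
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {t: math.exp(v - top) / norm for t, v in log_post.items()}
```

Calling `fit_counts` on a small labelled reference set and then `classify` on a query returns a probability for each known taxon and for the `<new>` label. In this sketch, a query that matches no reference well puts most of its mass on `<new>`, which mirrors the kind of novelty detection the abstract describes.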

Zito, A., Rigon, T., Dunson, D. (2023). Inferring taxonomic placement from DNA barcoding aiding in discovery of new taxa. Methods in Ecology and Evolution, 14(2), 529-542 [10.1111/2041-210X.14009].

Inferring taxonomic placement from DNA barcoding aiding in discovery of new taxa

Rigon, T;
2023

Type: Journal article - Scientific article
Keywords: Bayesian nonparametrics; DNA barcoding; species novelty; species sampling models; taxonomic classification
Language: English
Date: 30 November 2022
Year: 2023
Volume: 14
Issue: 2 (February 2023)
Pages: 529-542
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/453731