The problem of estimating discovery probabilities originated in statistical ecology, and in recent years it has become popular because it appears frequently in challenging applications in genetics, bioinformatics, linguistics, design of experiments, and machine learning. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationship between the celebrated Good-Turing approach, a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
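
For readers unfamiliar with the Good-Turing approach mentioned above, the following is a minimal sketch of the classical, unsmoothed Good-Turing estimator of a discovery probability, under its textbook definition: with m_j the number of species observed exactly j times in a sample of size n, the probability that the next observation is a species seen exactly l times is estimated by (l + 1) m_{l+1} / n. It is not the smoothed estimator or the Bayesian nonparametric estimator studied in the article, and the function name `good_turing_discovery` and the toy sample are illustrative assumptions.

```python
from collections import Counter


def good_turing_discovery(sample, l=0):
    """Classical (unsmoothed) Good-Turing estimate of the probability that the
    next observation is a species seen exactly l times in `sample`.
    l = 0 corresponds to discovering a new species.
    Hypothetical helper for illustration; not taken from the article."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    species_counts = Counter(sample)                  # frequency of each species
    freq_of_freq = Counter(species_counts.values())   # m_j: number of species seen j times
    return (l + 1) * freq_of_freq.get(l + 1, 0) / n


# Toy sample (assumed data): "a" seen 3 times, "b" twice, "c" and "d" once each.
sample = ["a", "a", "a", "b", "b", "c", "d"]
p_new = good_turing_discovery(sample, l=0)   # m_1 / n = 2/7
p_once = good_turing_discovery(sample, l=1)  # 2 * m_2 / n = 2/7
print(p_new, p_once)
```

Because the estimate for frequency l depends only on m_{l+1}, it is zero whenever no species has been seen exactly l + 1 times, which is one reason smoothed versions of the estimator are used in practice.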

Favaro, S., Nipoti, B., Teh, Y. (2016). Rediscovery of Good-Turing estimators via Bayesian nonparametrics. BIOMETRICS, 72(1), 136-145 [10.1111/biom.12366].

Rediscovery of Good-Turing estimators via Bayesian nonparametrics

Nipoti B.;
2016

Type: Journal article - Scientific article
Keywords: Asymptotic equivalence; Bayesian nonparametrics; Credible intervals; Discovery probability; Expressed Sequence Tags; Good-Toulmin estimator; Good-Turing estimator; Smoothing technique; Two-parameter Poisson-Dirichlet prior
Language: English
Year: 2016
Volume: 72
Issue: 1
Pages: 136-145
Access: reserved
Files in this item:
File: biom.12366.pdf
Access: archive managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
Size: 247.06 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/250017
Citations
  • Scopus 11
  • ISI 10