On consistent and rate optimal estimation of the missing mass

Camerlenghi, Federico
2021

Abstract

Given n samples from a population of individuals belonging to different types with unknown proportions, how do we estimate the probability of discovering a new type at the (n + 1)th draw? This is a classical problem in statistics, commonly referred to as the missing mass estimation problem. Recent results have shown: (i) the impossibility of estimating the missing mass without imposing further assumptions on the type proportions; (ii) the consistency of the Good–Turing estimator of the missing mass under the assumption that the tail of the type proportions decays to zero as a regularly varying function with parameter α ∈ (0, 1); (iii) the rate of convergence n^{-α/2} for the Good–Turing estimator over the class of regularly varying P with α ∈ (0, 1). In this paper we introduce an alternative, and remarkably shorter, proof of the impossibility of distribution-free estimation of the missing mass. Besides being of independent interest, our alternative proof suggests a natural approach to strengthen and extend the recent results on the rate of convergence of the Good–Turing estimator under regularly varying type proportions with α ∈ (0, 1). In particular, we show that the convergence rate n^{-α/2} is the best rate that any estimator can achieve, up to a slowly varying function. Furthermore, we prove that a lower bound on the minimax estimation risk must scale at least as n^{-α/2}, which leads us to conjecture that the Good–Turing estimator is a rate-optimal minimax estimator under regularly varying type proportions.
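
As a concrete illustration of the estimation problem described in the abstract, the following minimal sketch computes the Good–Turing estimate of the missing mass in its standard form, namely the number of types observed exactly once divided by n, and compares it with the true missing mass on simulated data. The function names, the truncated power-law population p_j ∝ j^(-1/α), and the values of n and α are assumptions chosen for illustration only; they are not taken from the paper, and a truncated population only approximates regular variation.

```python
import random
from collections import Counter


def good_turing_missing_mass(sample):
    """Good-Turing estimate: (number of types seen exactly once) / n."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def true_missing_mass(sample, probs):
    """Total probability of the types that do not appear in the sample."""
    seen = set(sample)
    return sum(p for t, p in probs.items() if t not in seen)


def power_law_population(num_types, alpha):
    """Hypothetical population with p_j proportional to j^(-1/alpha).

    Assumption for illustration: a truncated power law, which only
    mimics regularly varying proportions with parameter alpha.
    """
    weights = [j ** (-1.0 / alpha) for j in range(1, num_types + 1)]
    total = sum(weights)
    return {j: w / total for j, w in enumerate(weights, start=1)}


if __name__ == "__main__":
    random.seed(0)
    probs = power_law_population(num_types=100_000, alpha=0.5)
    types, weights = list(probs), list(probs.values())
    for n in (100, 1_000, 10_000):
        sample = random.choices(types, weights=weights, k=n)
        est = good_turing_missing_mass(sample)
        truth = true_missing_mass(sample, probs)
        print(f"n={n:>6}  Good-Turing={est:.4f}  true missing mass={truth:.4f}")
```

Running the script prints the estimate next to the true missing mass for increasing n, which gives a rough feel for how the two track each other under these assumed proportions.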
Type: Journal article - Scientific article
Keywords: Good–Turing estimator; Minimax rate; Missing mass; Optimal rate of convergence; Regular variation; Two-parameter Poisson–Dirichlet
Language: English
Year: 2021
Volume: 57
Issue: 3
First page: 1476
Last page: 1494
Ayed, F., Battiston, M., Camerlenghi, F., Favaro, S. (2021). On consistent and rate optimal estimation of the missing mass. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 57(3), 1476-1494. doi:10.1214/20-AIHP1126.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/321433