Bicocca Open Archive

On consistent and rate optimal estimation of the missing mass

Ayed, Fadhel; Battiston, Marco; Camerlenghi, Federico; Favaro, Stefano

2021

Abstract

Given n samples from a population of individuals belonging to different types with unknown proportions, how do we estimate the probability of discovering a new type at the (n + 1)th draw? This is a classical problem in statistics, commonly referred to as the missing mass estimation problem. Recent results have shown: (i) the impossibility of estimating the missing mass without imposing further assumptions on the types' proportions; (ii) the consistency of the Good–Turing estimator of the missing mass under the assumption that the tail of the types' proportions decays to zero as a regularly varying function with parameter α ∈ (0, 1); (iii) the rate of convergence n^{-α/2} for the Good–Turing estimator over the class of distributions P that are regularly varying with parameter α ∈ (0, 1). In this paper we introduce an alternative, and remarkably shorter, proof of the impossibility of distribution-free estimation of the missing mass. Besides being of independent interest, our alternative proof suggests a natural approach to strengthen, and expand, the recent results on the rate of convergence of the Good–Turing estimator under regularly varying types' proportions with parameter α ∈ (0, 1). In particular, we show that the convergence rate n^{-α/2} is the best rate that any estimator can achieve, up to a slowly varying function. Furthermore, we prove that a lower bound on the minimax estimation risk must scale at least as n^{-α/2}, which leads us to conjecture that the Good–Turing estimator is a rate optimal minimax estimator under regularly varying types' proportions.
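For readers unfamiliar with the quantities named in the abstract, the following is a minimal illustrative sketch, not taken from the paper. It assumes the standard form of the Good–Turing estimator of the missing mass, namely the number of types observed exactly once divided by the sample size n. The function names `good_turing_missing_mass` and `true_missing_mass`, and the toy proportions, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Good-Turing estimate of the missing mass.

    Assumed standard form: (number of types seen exactly once) / n.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    # Count the types that appear exactly once in the sample (singletons).
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def true_missing_mass(sample, proportions):
    """Total proportion of the types that do not appear in the sample.

    `proportions` maps each type to its (normally unknown) proportion;
    it is available here only because this is a toy example.
    """
    seen = set(sample)
    return sum(p for t, p in proportions.items() if t not in seen)


# Toy population (hypothetical proportions) and a sample of size n = 6.
proportions = {"a": 0.5, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1}
sample = ["a", "b", "a", "c", "d", "a"]

estimate = good_turing_missing_mass(sample)   # three singletons out of six draws
target = true_missing_mass(sample, proportions)  # only type "e" is unseen
print(estimate, target)
```

On such a small sample the estimate and the target can differ substantially; the results summarized in the abstract concern how fast this gap can shrink as n grows under regularly varying proportions.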

Subtype: Journal article - Scientific article
Keywords: Good–Turing estimator; Minimax rate; Missing mass; Optimal rate of convergence; Regular variation; Two-parameter Poisson–Dirichlet
Content language: English
Publication date: 2021
Journal: ANNALES DE L'INSTITUT HENRI POINCARE-PROBABILITES ET STATISTIQUES
Volume number: 57
Issue: 3
First page: 1476
Last page: 1494
Article DOI: https://dx.doi.org/10.1214/20-AIHP1126
Fulltext: none
Citation: Ayed, F., Battiston, M., Camerlenghi, F., Favaro, S. (2021). On consistent and rate optimal estimation of the missing mass. ANNALES DE L'INSTITUT HENRI POINCARE-PROBABILITES ET STATISTIQUES, 57(3), 1476-1494 [10.1214/20-AIHP1126].
Appears in types: 01 - Journal article

Files in this product:

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/321433
