Bicocca Open Archive

Ayed, F., Battiston, M., Camerlenghi, F., Favaro, S. (2021). Consistent estimation of small masses in feature sampling. JOURNAL OF MACHINE LEARNING RESEARCH, 22(6), 1-28.

Consistent estimation of small masses in feature sampling

Ayed, F.; Battiston, M.; Camerlenghi, F.; Favaro, S.

2021

Abstract

Consider an (observable) random sample of size $n$ from an infinite population of individuals. Each individual is endowed with a finite set of "features" from a collection of features $(F_j)_{j \geq 1}$ with unknown probabilities $(p_j)_{j \geq 1}$, i.e., $p_j$ is the probability that an individual displays feature $F_j$. Under this feature sampling framework, there has been growing interest in recent years in estimating the sum of the probability masses $p_j$ of features observed with frequency $r \geq 0$ in the sample, here denoted by $M_{n,r}$. This is the natural feature sampling counterpart of the classical problem of estimating small probabilities in the species sampling framework, where each individual is endowed with only one feature (or "species"). In this paper we study the problem of consistent estimation of the small mass $M_{n,r}$. We first show that there do not exist universally consistent estimators, in the multiplicative sense, of the missing mass $M_{n,0}$. Then, we introduce an estimator of $M_{n,r}$ and identify sufficient conditions under which the estimator is consistent. In particular, we propose a nonparametric estimator $\hat{M}_{n,r}$ of $M_{n,r}$ which has the same analytic form as the celebrated Good-Turing estimator for small probabilities, the sole difference being that the two estimators have different ranges (supports). Then, we show that $\hat{M}_{n,r}$ is strongly consistent, in the multiplicative sense, under the assumption that $(p_j)_{j \geq 1}$ has regularly varying heavy tails.
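The estimation problem described in the abstract can be illustrated with a short simulation. This is a minimal sketch, not the authors' code, and it rests on several assumptions. Each individual is taken to display feature $F_j$ independently with probability $p_j$ (a Bernoulli product model). The estimator is assumed to take the classical Good-Turing form $(r+1)K_{n,r+1}/n$, where $K_{n,r+1}$ is the number of features observed exactly $r+1$ times. The example uses a hypothetical truncated sequence $p_j = j^{-2}$ as a heavy-tailed case. All helper names are illustrative only.

```python
import numpy as np

# Hypothetical heavy-tailed example: p_j = j^(-2) for j = 1..J (truncated).
# Here #{j : p_j > x} behaves like x^(-1/2) as x -> 0, one common way to
# formalize a regularly varying tail; the paper's exact condition may differ.
J = 1_000_000
probs = np.arange(1, J + 1, dtype=float) ** -2.0


def simulate_counts(probs, n, rng):
    """Feature frequencies X_j ~ Binomial(n, p_j).

    Assumes each individual displays feature j independently with
    probability p_j, so the per-feature counts are all we need.
    """
    return rng.binomial(n, probs)


def small_mass(probs, counts, r):
    """True M_{n,r}: total probability of features observed exactly r times."""
    return probs[counts == r].sum()


def good_turing_form(counts, n, r):
    """Assumed estimator (r + 1) * K_{n,r+1} / n.

    K_{n,r+1} is the number of features observed exactly r + 1 times.
    """
    k_next = np.count_nonzero(counts == r + 1)
    return (r + 1) * k_next / n


rng = np.random.default_rng(0)
n = 10_000
counts = simulate_counts(probs, n, rng)
for r in range(3):
    m_true = small_mass(probs, counts, r)
    m_hat = good_turing_form(counts, n, r)
    print(f"r={r}: M={m_true:.5f}  estimate={m_hat:.5f}  ratio={m_hat / m_true:.3f}")

# The p_j need not sum to 1 in feature sampling (here the sum is about 1.64).
print(f"total mass sum(p_j) = {probs.sum():.4f}")
```

With `r = 0` the estimate targets the missing mass $M_{n,0}$. Printing the ratio of estimate to true mass mirrors the multiplicative notion of consistency. If the paper's consistency result applies to this example, increasing `n` should push the ratios toward 1. Drawing each feature's count directly as Binomial(n, p_j), instead of generating n individual feature sets, yields the same counts under the assumed model and keeps the simulation fast.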

Full record

Subtype: Journal article - Scientific article
Keywords: Feature sampling; Good-Turing estimator; Missing mass; Multiplicative consistency; Nonparametric inference; Regularly varying heavy-tailed distributions; Species sampling
Content language: English
Publication date: 2021
Journal: JOURNAL OF MACHINE LEARNING RESEARCH
Volume: 22
Issue: 6
First page: 1
Last page: 28
Full text: none
Appears in categories: 01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/314231
