Bicocca Open Archive

Favaro, S., Panero, F., Rigon, T. (2021). Bayesian nonparametric disclosure risk assessment. ELECTRONIC JOURNAL OF STATISTICS, 15(2), 5626-5651 [10.1214/21-EJS1933].

Bayesian nonparametric disclosure risk assessment

Favaro, Stefano; Panero, Francesca; Rigon, Tommaso

2021

Abstract

Any decision about the release of microdata for public use is supported by the estimation of measures of disclosure risk, the most popular being the number τ1 of sample uniques that are also population uniques. In such a context, parametric and nonparametric partition-based models have been shown to have: i) the strength of leading to estimators of τ1 with desirable features, including ease of implementation, computational efficiency and scalability to massive data; ii) the weakness of producing underestimates of τ1 in realistic scenarios, with the underestimation getting worse as the tail behaviour of the empirical distribution of microdata gets heavier. To fix this underestimation phenomenon, we propose a Bayesian nonparametric partition-based model that can be tuned to the tail behaviour of the empirical distribution of microdata. Our model relies on the Pitman–Yor process prior, and it leads to a novel estimator of τ1 that retains all the desirable features of partition-based estimators and, in addition, allows underestimation to be reduced by tuning a “discount” parameter. We show the effectiveness of our estimator through its application to synthetic data and real data.
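To make the definition of τ1 concrete, the following minimal sketch counts the sample uniques that are also population uniques in a toy setting where the full population is known. The function name `count_tau1`, the toy cell labels and the chosen sample indices are illustrative assumptions and do not come from the paper. In practice the population frequencies are not observed, which is why τ1 has to be estimated; this sketch does not implement the estimator proposed by the authors.

```python
from collections import Counter


def count_tau1(population, sample_indices):
    """Count sample uniques that are also population uniques.

    population: list of hashable cell labels, one per record, where a cell
        is a combination of key variables (hypothetical toy input).
    sample_indices: positions of the records included in the sample.
    """
    pop_counts = Counter(population)
    sample_counts = Counter(population[i] for i in sample_indices)
    # A cell contributes to tau1 when its frequency is 1 in the sample
    # and also 1 in the population.
    return sum(
        1
        for cell, freq in sample_counts.items()
        if freq == 1 and pop_counts[cell] == 1
    )


# Toy population: each record is reduced to its cell label (assumed data).
population = ["a", "a", "b", "c", "c", "c", "d", "e"]
sample_indices = [0, 2, 3, 6]  # records assumed to be in the sample

tau1 = count_tau1(population, sample_indices)
print(tau1)  # cells "b" and "d" are unique in both sample and population
```

Here all four sampled records are sample uniques, but only two of them are also unique in the population, so counting sample uniques alone would overstate the risk.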

Subtype: Journal article - Scientific article
Keywords: Bayesian nonparametrics; data confidentiality; Dirichlet process prior; disclosure risk assessment; empirical Bayes; Pitman–Yor process prior
Content language: English
Publication date: 2021
Journal: ELECTRONIC JOURNAL OF STATISTICS
Volume: 15
Issue: 2
First page: 5626
Last page: 5651
Article DOI: https://dx.doi.org/10.1214/21-EJS1933
Full text: open
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
10281-342516_VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	362.75 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/342516
