Bicocca Open Archive


Rigon, T., Herring, A., Dunson, D. (2023). A generalized Bayes framework for probabilistic clustering. BIOMETRIKA, 110(3), 559-578 [10.1093/biomet/asad004].

A generalized Bayes framework for probabilistic clustering

Rigon, T; Herring, AH; Dunson, DB

2023

Abstract

Loss-based clustering methods, such as k-means clustering and its variants, are standard tools for finding groups in data. However, a disadvantage is that they do not quantify uncertainty in the estimated clusters. Model-based clustering using mixture models provides an alternative approach, but such methods face computational problems and are highly sensitive to the choice of kernel. In this article we propose a generalized Bayes framework that bridges these paradigms through the use of Gibbs posteriors. In conducting Bayesian updating, the log-likelihood is replaced by a loss function for clustering, leading to a rich family of clustering methods. The Gibbs posterior represents a coherent updating of Bayesian beliefs without needing to specify a likelihood for the data, and can be used to characterize uncertainty in clustering. We consider losses based on Bregman divergences and pairwise similarities, and develop efficient deterministic algorithms for point estimation along with sampling algorithms for uncertainty quantification. Several existing clustering algorithms, including k-means, can be interpreted as generalized Bayes estimators in our framework, and thus we provide a method of uncertainty quantification for these approaches, allowing, for example, calculation of the probability that a data point is well clustered.
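To illustrate the Gibbs-posterior idea summarized in the abstract, where the log-likelihood is replaced by a clustering loss, here is a minimal sketch. It is not the authors' implementation. It assumes a fixed number of labels K, a uniform prior over labelings (the paper's keywords point to product partition model priors instead), the k-means squared-Euclidean loss, and a user-chosen loss scale `lam`. The function names `kmeans_loss` and `gibbs_posterior_sampler` are hypothetical. The sampler targets p(c | X) ∝ exp{-lam * loss(c; X)} and returns posterior co-clustering probabilities, one common way to summarize clustering uncertainty.

```python
import math
import random


def kmeans_loss(X, labels, K):
    """k-means loss: within-cluster sum of squared Euclidean distances
    to each cluster mean. Empty clusters contribute zero."""
    total = 0.0
    for k in range(K):
        members = [x for x, z in zip(X, labels) if z == k]
        if not members:
            continue
        d = len(members[0])
        mean = [sum(m[j] for m in members) / len(members) for j in range(d)]
        total += sum(sum((m[j] - mean[j]) ** 2 for j in range(d)) for m in members)
    return total


def gibbs_posterior_sampler(X, K, lam=1.0, n_iter=2000, burn_in=500, seed=0):
    """Single-site Gibbs sampler for the (assumed) Gibbs posterior
    p(labels | X) proportional to exp(-lam * kmeans_loss(X, labels, K)),
    with a uniform prior over labelings. Returns the matrix of estimated
    posterior probabilities that points i and j share a cluster."""
    rng = random.Random(seed)
    n = len(X)
    labels = [rng.randrange(K) for _ in range(n)]
    co = [[0] * n for _ in range(n)]
    kept = 0
    for it in range(n_iter):
        for i in range(n):
            # Full conditional of label i given all other labels:
            # evaluate the loss for every candidate label of point i.
            log_w = []
            for k in range(K):
                labels[i] = k
                log_w.append(-lam * kmeans_loss(X, labels, K))
            top = max(log_w)  # subtract the max for numerical stability
            weights = [math.exp(v - top) for v in log_w]
            labels[i] = rng.choices(range(K), weights=weights)[0]
        if it >= burn_in:
            kept += 1
            for i in range(n):
                for j in range(n):
                    if labels[i] == labels[j]:
                        co[i][j] += 1
    # Co-clustering probabilities are invariant to label switching.
    return [[c / kept for c in row] for row in co]


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    P = gibbs_posterior_sampler(X, K=2, lam=1.0, n_iter=300, burn_in=100)
    for row in P:
        print(" ".join(f"{p:.2f}" for p in row))
```

For two well-separated groups, the estimated co-clustering probabilities should be close to 1 within each group and close to 0 across groups. Recomputing the full loss for every candidate label keeps the sketch short but is wasteful; updating cluster sums incrementally would be the natural optimization.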


Subtype: Journal article - Scientific article
Keywords: Gibbs posterior; K-means; Loss function; Product partition model; Uncertainty quantification
Content language: English
Ahead-of-print or first online publication date: 19-Jan-2023
Publication date: 2023
Journal: BIOMETRIKA
Volume: 110
Issue: 3
First page: 559
Last page: 578
Article DOI: https://dx.doi.org/10.1093/biomet/asad004
Full text: none
Appears in types: 01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/453730
