Bicocca Open Archive

Bayesian clustering of high-dimensional data via latent repulsive mixtures

Ghilotti L.; Beraha M.; Guglielmi A.

2025

Abstract

Model-based clustering of moderate- or large-dimensional data is notoriously difficult. We propose a model for simultaneous dimensionality reduction and clustering by assuming a mixture model for a set of latent scores, which are then linked to the observations via a Gaussian latent factor model. This approach was recently investigated by Chandra et al. (2023). The authors used a factor-analytic representation and assumed a mixture model for the latent factors. However, performance can deteriorate in the presence of model misspecification. Assuming a repulsive point process prior for the component-specific means of the mixture for the latent scores is shown to yield a more robust model that outperforms the standard mixture model for the latent factors in several simulated scenarios. The repulsive point process must be anisotropic to favour well-separated clusters of data, and its density should be tractable for efficient posterior inference. We address these issues by proposing a general construction for anisotropic determinantal point processes. We illustrate our model in simulations, as well as a plant species co-occurrence dataset.
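As a rough illustration of the data structure the abstract describes, the sketch below simulates observations whose low-dimensional latent scores follow a Gaussian mixture and are mapped to the observed space through a Gaussian latent factor model. It is only a toy generator under stated assumptions, not the authors' model or sampler: the function name, the dimensions, the noise level, and the way the component means are drawn are all hypothetical, and the means are drawn independently here rather than from the anisotropic repulsive prior proposed in the paper.

```python
import numpy as np


def simulate_latent_factor_mixture(n=300, p=50, d=2, k=3, noise_sd=0.5, seed=0):
    """Toy generator: mixture on latent scores, linear Gaussian link to data.

    All names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Component-specific means of the latent mixture (independent draws,
    # i.e. NOT a repulsive point process prior).
    means = rng.normal(scale=4.0, size=(k, d))

    # Cluster labels and latent scores: eta_i ~ N(mu_{c_i}, I_d).
    labels = rng.integers(0, k, size=n)
    scores = means[labels] + rng.normal(size=(n, d))

    # Factor loadings (p x d) and idiosyncratic Gaussian noise.
    loadings = rng.normal(size=(p, d))
    noise = noise_sd * rng.normal(size=(n, p))

    # Observations: y_i = Lambda eta_i + eps_i.
    y = scores @ loadings.T + noise
    return y, scores, labels


y, scores, labels = simulate_latent_factor_mixture()
print(y.shape, scores.shape)
```

Clustering is carried by the d-dimensional scores rather than the p-dimensional observations, which is the sense in which dimensionality reduction and clustering happen simultaneously.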

Subtype: Journal article - Scientific article
Keywords: Anisotropic point process; Determinantal point process; Gaussian factor model; Markov chain Monte Carlo; Model-based clustering
Content language: English
Ahead-of-print or first online publication date: 4 Nov 2024
Publication date: 2025
Journal: BIOMETRIKA
Volume number: 112
Issue: 2
Article number: asae059
Article DOI: https://dx.doi.org/10.1093/biomet/asae059
Full text: open
Citation: Ghilotti, L., Beraha, M., Guglielmi, A. (2025). Bayesian clustering of high-dimensional data via latent repulsive mixtures. BIOMETRIKA, 112(2) [10.1093/biomet/asae059].
Appears in types: 01 - Journal article

Files in this item:

File	Access	Attachment type	License	Size	Format
Ghilotti-2025-Biometrika-VoR.pdf	Open access	Publisher’s Version (Version of Record, VoR)	Creative Commons	341.35 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/549265
