Understanding how the human gut microbiome affects host health is challenging due to the wide interindividual variability, sparsity, and high dimensionality of microbiome data. Mixed-membership models have been previously applied to these data to detect latent communities of bacterial taxa that are expected to co-occur. The most widely used mixed-membership model is latent Dirichlet allocation (LDA). However, LDA is limited by the rigidity of the Dirichlet distribution imposed on the community proportions, which hinders its ability to model dependencies and account for overdispersion. To address this limitation, a generalization of LDA is proposed that introduces greater flexibility into the covariance matrix by incorporating the flexible Dirichlet (FD), a specific identifiable mixture with Dirichlet components. In addition to identifying communities, the new model enables the detection of enterotypes, i.e., clusters of samples with similar microbe composition. For inferential purposes, a computationally efficient collapsed Gibbs sampler that exploits the conjugacy of the FD distribution with respect to the multinomial model is proposed. A simulation study demonstrates the model's ability to accurately recover true parameter values by minimizing appropriate compositional discrepancy measures between the true and estimated values. Additionally, the model correctly identifies the number of communities, as evidenced by perplexity scores. Moreover, an application to the COMBO dataset highlights its effectiveness in detecting biologically significant and coherent communities and enterotypes, revealing a broader range of correlations between community abundances. These results underscore the new model as a definite improvement over LDA.

Giampino, A., Ascari, R., Migliorati, S. (2025). A flexible mixed-membership model for community and enterotype detection for microbiome data. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 210(October 2025) [10.1016/j.csda.2025.108181].

A flexible mixed-membership model for community and enterotype detection for microbiome data

Giampino, Alice
;
Ascari, Roberto;Migliorati, Sonia
2025

Abstract

Understanding how the human gut microbiome affects host health is challenging due to the wide interindividual variability, sparsity, and high dimensionality of microbiome data. Mixed-membership models have been previously applied to these data to detect latent communities of bacterial taxa that are expected to co-occur. The most widely used mixed-membership model is latent Dirichlet allocation (LDA). However, LDA is limited by the rigidity of the Dirichlet distribution imposed on the community proportions, which hinders its ability to model dependencies and account for overdispersion. To address this limitation, a generalization of LDA is proposed that introduces greater flexibility into the covariance matrix by incorporating the flexible Dirichlet (FD), a specific identifiable mixture with Dirichlet components. In addition to identifying communities, the new model enables the detection of enterotypes, i.e., clusters of samples with similar microbe composition. For inferential purposes, a computationally efficient collapsed Gibbs sampler that exploits the conjugacy of the FD distribution with respect to the multinomial model is proposed. A simulation study demonstrates the model's ability to accurately recover true parameter values by minimizing appropriate compositional discrepancy measures between the true and estimated values. Additionally, the model correctly identifies the number of communities, as evidenced by perplexity scores. Moreover, an application to the COMBO dataset highlights its effectiveness in detecting biologically significant and coherent communities and enterotypes, revealing a broader range of correlations between community abundances. These results underscore the new model as a definite improvement over LDA.
Articolo in rivista - Articolo scientifico
Collapsed Gibbs sampling; Flexible Dirichlet distribution; Human gut microbiome; Latent variable models; Mixtures;
English
4-apr-2025
2025
210
October 2025
108181
open
Giampino, A., Ascari, R., Migliorati, S. (2025). A flexible mixed-membership model for community and enterotype detection for microbiome data. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 210(October 2025) [10.1016/j.csda.2025.108181].
File in questo prodotto:
File Dimensione Formato  
Giampino-2025-Computational Statistics and Data Analysis-VoR.pdf

accesso aperto

Descrizione: This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Tipologia di allegato: Publisher’s Version (Version of Record, VoR)
Licenza: Creative Commons
Dimensione 1.22 MB
Formato Adobe PDF
1.22 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/549243
Citazioni
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
Social impact