Over the past few decades, a broad variety of models has been developed for graphs. However, modern applications in various fields have highlighted the need to account for higher-order interactions, that is, to include information deriving from groups of three or more nodes. Simple examples include group interactions in social networks, scientific co-authorship, interactions between more than two species in ecological models, and high-order correlations between neurons in brain networks. Hypergraphs provide the most general formalization of higher-order interactions: like a graph, a hypergraph is defined as a set of nodes and a set of hyperedges, the latter specifying the nodes taking part in each interaction. We propose a stochastic block model for hypergraphs to perform model-based clustering, capturing the information deriving from higher-order interactions. A discrete latent variable with Q support points is associated with each node, identifying the latent states in the population. The model parameters are the weight of each latent state and the probability that a hyperedge occurs given the latent states of its nodes. The formulation of the model is sufficiently flexible to account for possible simplified latent structures; an example is the situation in which the conditional probability of occurrence of a hyperedge can assume only two values: one if all its nodes belong to the same latent state, and the other otherwise. Maximum likelihood estimation of the model parameters is performed through a variational expectation-maximization algorithm, which maximizes a lower bound of the log-likelihood function. Spectral clustering techniques are employed to provide a suitable initialization for the algorithm, and model selection is explored using the integrated classification likelihood (ICL) criterion.
The model is applied to both simulated and real data, and the performance of the proposal is assessed in terms of parameter estimation and the ability to recover the clusters (measured through the Adjusted Rand Index). The estimation algorithm is implemented in C++ (in both serial and parallel versions) and is made available for R.
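As an illustration of the simplified latent structure described above, the following minimal Python sketch simulates a small hypergraph in which a hyperedge occurs with one probability when all its nodes share a latent state and with another otherwise. It is not the authors' C++/R implementation: the function name `simulate_hypergraph`, the parameters `p_same`, `p_diff` and `max_size`, the restriction to hyperedges of at most three nodes, the use of the same two probabilities for every hyperedge size, and the conditional independence of hyperedges given the latent states are all assumptions made for this example.

```python
import itertools
import random


def simulate_hypergraph(n, weights, p_same, p_diff, max_size=3, seed=0):
    """Draw latent states and hyperedges under a two-valued simplification.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # One latent state per node, drawn according to the state weights.
    states = rng.choices(range(len(weights)), weights=weights, k=n)
    hyperedges = []
    # Consider every candidate node subset of size 2..max_size (assumed cap).
    for size in range(2, max_size + 1):
        for nodes in itertools.combinations(range(n), size):
            # True when all nodes of the candidate share one latent state.
            same = len({states[i] for i in nodes}) == 1
            prob = p_same if same else p_diff
            # Assumed: hyperedges occur independently given the states.
            if rng.random() < prob:
                hyperedges.append(nodes)
    return states, hyperedges


states, hyperedges = simulate_hypergraph(
    n=12, weights=[0.5, 0.5], p_same=0.6, p_diff=0.05
)
print(len(states), "nodes,", len(hyperedges), "hyperedges")
```

Enumerating all candidate subsets is only practical for small `n` and small `max_size`; the sketch is meant to make the generative assumption concrete, not to scale.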
Matias, C., Brusa, L. (2022). Model-based clustering in hypergraphs through a stochastic blockmodel. In International Conference APPLIED STATISTICS 2022 - Abstracts and Program (p. 29).
Model-based clustering in hypergraphs through a stochastic blockmodel
Brusa, L
2022
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.