The Dirichlet is the best-known distribution for compositional data, i.e. data representing vectors of proportions. The flexible Dirichlet (FD) distribution generalizes the Dirichlet while preserving its main mathematical and compositional properties. At the same time, it does not inherit the Dirichlet's lack of flexibility in modeling the dependence concepts appropriate for compositional data. The present paper introduces a new model obtained by extending the basis of positive random variables that generates the FD by normalization. Specifically, the new basis exhibits a more sophisticated mixture (latent) representation, which leads to a twofold result. On the one hand, normalizing it yields a more general distribution for compositional data, called the EFD. In particular, the EFD allows for a significantly wider differentiation among the clusters defining its mixture representation. On the other hand, the generalized basis induces a tractable model for the dependence between composition and size: the conditional distribution of the composition given the size is still an EFD, with the size affecting it in a simple fashion through the cluster weights.
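The idea of generating a composition by normalizing a basis of positive random variables can be illustrated in the plain Dirichlet case, where the basis is a vector of independent gamma variables with a common scale. The sketch below is only an illustration under that standard construction: the function name, the parameter values and the sample size are assumptions chosen for the example, and the FD and EFD bases are not reproduced because the abstract does not specify them.

```python
import numpy as np

def sample_dirichlet_by_normalization(alpha, n, rng):
    """Draw n Dirichlet(alpha) compositions by normalizing independent gammas.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    alpha = np.asarray(alpha, dtype=float)
    # Basis: independent Gamma(alpha_i, 1) variables, one column per component.
    basis = rng.gamma(shape=alpha, scale=1.0, size=(n, alpha.size))
    size = basis.sum(axis=1)              # "size": total of each basis vector
    composition = basis / size[:, None]   # normalization yields proportions
    return composition, size

rng = np.random.default_rng(0)
alpha = [2.0, 3.0, 5.0]                   # assumed example parameters
composition, size = sample_dirichlet_by_normalization(alpha, 50_000, rng)

# Empirical mean of each proportion and its sample correlation with the size.
mean_composition = composition.mean(axis=0)
corr_with_size = np.array(
    [np.corrcoef(composition[:, j], size)[0, 1] for j in range(len(alpha))]
)
```

With a gamma basis the composition is independent of the size, so the sample correlations in `corr_with_size` should be close to zero. That independence is the limitation that motivates a richer basis in which the composition can depend on the size.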
Ongaro, A., Migliorati, S. (2015). A Dirichlet mixture model for compositions allowing for dependence on the size. In M. Carpita, E. Brentari, E.M. Qannari (Eds.), Advances in Latent Variables Methods, Models and Applications (pp. 101-111). Springer [10.1007/10104_2014_13].
A Dirichlet mixture model for compositions allowing for dependence on the size
Ongaro A.; Migliorati S.
2015