Migliorati, S., Ongaro, A. (2015). Inferential issues in the Extended Flexible Dirichlet model. In C. H. Skiadas (Ed.), ASMDA 2015 Proceedings (pp. 665-678). Published by ISAST: International Society for the Advancement of Science and Technology.
Inferential issues in the Extended Flexible Dirichlet model
Migliorati, Sonia; Ongaro, Andrea
2015
Abstract
The Dirichlet is the best-known distribution for data consisting of vectors of proportions (i.e., compositions). The flexible Dirichlet (FD) distribution (Ongaro and Migliorati, 2013) generalizes the Dirichlet, preserving many of its mathematical and compositional properties without inheriting its lack of flexibility in modeling the dependence structure of the data. The Extended FD (EFD) model (Ongaro and Migliorati, 2014) is obtained by generalizing the basis of positive random variables whose normalization generates the FD. The new model has a more sophisticated mixture representation and accommodates dependence between composition (the normalized basis) and size (the sum of the variables forming the basis). The present paper investigates inferential aspects of the EFD as a model for compositional data. In particular, an analysis of the flexibility of the cluster structure and of the dependence pattern implied by the model reveals its relevance for applications. Furthermore, appropriate estimation procedures are devised based on the E-M algorithm. Specifically, an ad hoc initialization strategy is proposed to address the crucial choice of starting values for the E-M algorithm. Finally, the potential of the model is illustrated by means of applications to real data sets.
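The abstract describes these models as laws of a normalized basis of positive random variables. As an illustration only, the sketch below (Python with NumPy, our choice) shows that construction for the two models whose bases can be stated with confidence.

- **Dirichlet.** It is obtained by dividing independent Gamma(α_i, 1) variables by their sum.
- **FD.** We recall its basis from Ongaro and Migliorati (2013) as Y_i = W_i + U_i Z. Here W_i ~ Gamma(α_i, 1), Z ~ Gamma(τ, 1) and U ~ Multinomial(1, p) are mutually independent.

The function names and the (α, p, τ) parameterization are assumptions of this sketch, not code from the paper. The EFD basis is not reproduced because the abstract does not specify it.

```python
import numpy as np


def sample_dirichlet_by_normalization(alpha, n, rng):
    """Draw n Dirichlet(alpha) vectors by normalizing independent Gamma(alpha_i, 1) variables."""
    alpha = np.asarray(alpha, dtype=float)
    # Basis: independent positive Gamma(alpha_i, 1) variables, one column per part.
    y = rng.gamma(shape=alpha, scale=1.0, size=(n, alpha.size))
    # Composition = basis divided by its size (the row sum).
    return y / y.sum(axis=1, keepdims=True)


def sample_fd_by_normalization(alpha, p, tau, n, rng):
    """Draw n FD(alpha, p, tau) vectors from the basis Y_i = W_i + U_i * Z.

    Hypothetical helper; the basis is as recalled from Ongaro and Migliorati (2013),
    not taken from the paper's code. Returns the compositions and their sizes.
    """
    alpha = np.asarray(alpha, dtype=float)
    p = np.asarray(p, dtype=float)
    # W_i ~ Gamma(alpha_i, 1), independent across parts.
    w = rng.gamma(shape=alpha, scale=1.0, size=(n, alpha.size))
    # Z ~ Gamma(tau, 1), one shared draw per observation.
    z = rng.gamma(shape=tau, scale=1.0, size=(n, 1))
    # U ~ Multinomial(1, p): a one-hot row choosing which part receives Z.
    u = rng.multinomial(1, p, size=n)
    y = w + u * z
    size = y.sum(axis=1, keepdims=True)
    return y / size, size


if __name__ == "__main__":
    rng = np.random.default_rng(2015)
    alpha = np.array([2.0, 3.0, 5.0])
    p = np.array([0.2, 0.3, 0.5])
    tau = 4.0
    x, size = sample_fd_by_normalization(alpha, p, tau, 100_000, rng)
    # The mixture representation sum_j p_j Dir(alpha + tau e_j) gives
    # E[X_i] = (alpha_i + tau p_i) / (alpha^+ + tau), with alpha^+ = sum_i alpha_i.
    print("sample mean:     ", x.mean(axis=0))
    print("theoretical mean:", (alpha + tau * p) / (alpha.sum() + tau))
```

In this FD basis, the size Σ_i Y_i = Σ_i W_i + Z is Gamma(α⁺ + τ, 1) whichever part receives Z, where α⁺ = Σ_i α_i. Conditionally on U, the composition is a Dirichlet that does not depend on the size. Hence the FD composition is independent of its size. This is consistent with the abstract's point that a generalized basis is what lets the EFD accommodate dependence between composition and size.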