This paper proposes an original soft hierarchical fuzzy clustering algorithm, named Hierarchical Hyper-spherical Divisive Fuzzy C-Means (H2D-FCM). It generates a "soft" hierarchy in which a document can belong to several child clusters of a node, and the clusters at a given hierarchical level are more specific than those at the level above and more general than those at the level below. The algorithm is divisive: it is based on a modified bisecting K-Means and applies a modified probabilistic Fuzzy C-Means algorithm to divide each node into child nodes. It determines the proper number of clusters to generate at the first level using an entropy measure, and decides whether a node can be split further using a "density" measure. The paper presents the algorithm and its evaluation on two standard collections. © 2009 IEEE.
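As a rough illustration of the kind of step the abstract refers to, the sketch below runs a generic probabilistic fuzzy C-means on unit-length document vectors with cosine dissimilarity, and computes a partition entropy over the resulting memberships. It is not the H2D-FCM algorithm: the abstract does not specify the paper's modifications, its entropy measure, or its "density" criterion. The function names, the fuzzifier `m = 2`, the optional `init` argument, and the use of partition entropy as a stand-in are all assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (placing it on the hypersphere)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine_dissimilarity(a, b):
    """1 - cosine similarity, for vectors that are already unit-length."""
    return max(0.0, 1.0 - sum(x * y for x, y in zip(a, b)))


def memberships(docs, centroids, m=2.0, eps=1e-12):
    """Probabilistic memberships: each row sums to 1 across clusters."""
    u = []
    for d in docs:
        # eps avoids division by zero when a document equals a centroid.
        inv = [(cosine_dissimilarity(d, c) + eps) ** (-1.0 / (m - 1.0))
               for c in centroids]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def update_centroids(docs, u, m=2.0):
    """Membership-weighted mean of the documents, re-normalized."""
    k, dim = len(u[0]), len(docs[0])
    centroids = []
    for j in range(k):
        acc = [0.0] * dim
        for d, row in zip(docs, u):
            w = row[j] ** m
            for t in range(dim):
                acc[t] += w * d[t]
        centroids.append(normalize(acc))
    return centroids


def fuzzy_c_means(docs, k, m=2.0, iters=50, init=None, seed=0):
    """Generic fuzzy C-means with cosine dissimilarity (illustrative only)."""
    docs = [normalize(d) for d in docs]
    if init is None:
        # Hypothetical default: pick k documents at random as starting centroids.
        centroids = random.Random(seed).sample(docs, k)
    else:
        centroids = [normalize(c) for c in init]
    for _ in range(iters):
        u = memberships(docs, centroids, m)
        centroids = update_centroids(docs, u, m)
    return memberships(docs, centroids, m), centroids


def partition_entropy(u):
    """Mean entropy of the membership rows; 0 for a crisp partition."""
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / len(u)
```

Calling `fuzzy_c_means(docs, k)` on a list of term-weight vectors returns one membership row per document, so a document can carry non-negligible membership in more than one cluster. One plausible way to use an entropy score, under the same assumptions, is to run the clustering for several values of `k` and compare `partition_entropy` across them.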
Pasi, G., Bordogna, G. (2009). Hierarchical-Hyperspherical Divisive Fuzzy C-Means (H2D-FCM) Clustering for Information Retrieval. In WI-IAT '09: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (pp. 614-621). Washington, D.C.: IEEE [10.1109/WI-IAT.2009.104].
Hierarchical-Hyperspherical Divisive Fuzzy C-Means (H2D-FCM) Clustering for Information Retrieval
PASI, GABRIELLA
2009