The steady growth of information on the Web and in digital libraries, portals, databases, and local intranets has driven the development of several methods to help users with information retrieval, information organization, and browsing. Clustering algorithms are of crucial importance when no labels are associated with textual information or documents. In the text mining domain, the aim of clustering algorithms is to group documents about the same topic into the same cluster, producing a flat or hierarchical structure of clusters. In this paper we present a Knowledge Discovery System for document processing and clustering. The clustering algorithm implemented in this system, called Induced Bisecting k-Means, outperforms the Standard Bisecting k-Means and is particularly suitable for online applications where computational efficiency is crucial.
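For orientation, the baseline named in the abstract can be sketched as follows. This is a minimal illustration of standard bisecting k-means as it is commonly described, not the authors' Induced Bisecting k-Means, whose details the abstract does not give. The function names, the toy term-count vectors, the squared Euclidean distance, and the rule of always splitting the largest cluster are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points: List[Vector], rng: random.Random,
              iters: int = 20) -> Tuple[List[Vector], List[Vector]]:
    """Split one cluster in two with plain 2-means (Lloyd iterations)."""
    c0, c1 = rng.sample(points, 2)  # two random documents as initial seeds
    left: List[Vector] = []
    right: List[Vector] = []
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. duplicate seeds
        new0, new1 = centroid(left), centroid(right)
        if new0 == c0 and new1 == c1:
            break  # assignments are stable
        c0, c1 = new0, new1
    return left, right


def bisecting_kmeans(points: List[Vector], k: int,
                     seed: int = 0) -> List[List[Vector]]:
    """Repeatedly bisect the largest cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters: List[List[Vector]] = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()  # assumed heuristic: split the largest
        if len(biggest) < 2:
            clusters.append(biggest)
            break
        left, right = two_means(biggest, rng)
        if not left or not right:
            clusters.append(biggest)
            break
        clusters += [left, right]
    return clusters


# Toy "documents" as counts of two hypothetical terms.
docs = [[5, 0], [4, 1], [5, 1], [0, 5], [1, 4], [0, 4]]
for cluster in bisecting_kmeans(docs, k=2):
    print(cluster)
```

Recording each split as a parent node with two children, instead of keeping only the final list, would give the hierarchical structure the abstract mentions. Returning the list alone gives the flat one.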
Archetti, F., Fersini, E., Campanelli, P., Messina, V. (2006). A hierarchical document clustering environment based on the induced bisecting k-means. In Flexible Query Answering Systems (eds. H. L. Larsen, G. Pasi, D. O. Arroyo, T. Andreasen, H. Christiansen) (pp.257-269) [10.1007/11766254_22].
A hierarchical document clustering environment based on the induced bisecting k-means
Archetti, Francesco Antonio; Fersini, Elisabetta; Campanelli, Piero; Messina, Vincenzina
2006