Archetti, F., Fersini, E., Campanelli, P., Messina, V. (2006). A hierarchical document clustering environment based on the induced bisecting k-means. In Flexible Query Answering Systems (eds. H. L. Larsen, G. Pasi, D. O. Arroyo, T. Andreasen, H. Christiansen) (pp. 257-269) [10.1007/11766254_22].

A hierarchical document clustering environment based on the induced bisecting k-means

Archetti, Francesco Antonio; Fersini, Elisabetta; Campanelli, Piero; Messina, Vincenzina
2006

Abstract

The steady increase of information on the WWW, in digital libraries, portals, databases and local intranets has given rise to the development of several methods that help users with information retrieval, information organization and browsing. Clustering algorithms are of crucial importance when no labels are associated with textual information or documents. In the text mining domain, the aim of clustering algorithms is to group documents concerning the same topic into the same cluster, producing a flat or hierarchical structure of clusters. In this paper we present a Knowledge Discovery System for document processing and clustering. The clustering algorithm implemented in this system, called Induced Bisecting k-Means, outperforms the Standard Bisecting k-Means and is particularly suitable for online applications where computational efficiency is crucial.
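To make the baseline named in the abstract concrete, here is a minimal sketch of the Standard Bisecting k-Means applied to text. It is not the authors' Induced Bisecting k-Means, whose details are not given in this record. Every design choice below is an assumption for illustration: whitespace tokenization, smoothed TF-IDF weighting, cosine similarity, always splitting the largest cluster, and a spherical 2-means split seeded with a random document plus the document least similar to it. The helper names (`tfidf_vectors`, `split_in_two`, `bisecting_kmeans`) are hypothetical.

```python
import math
import random
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit length; zero vectors are returned as-is."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def tfidf_vectors(docs):
    """Turn each document string into a unit-length sparse TF-IDF vector (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps every weight positive, so terms shared by all documents still count.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity when both have unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Normalized sum (equivalently, normalized mean) of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return normalize(dict(total))


def split_in_two(vectors, members, rng, iters=20):
    """Spherical 2-means on the document indices in `members`; returns two non-empty index lists."""
    first = rng.choice(members)
    # Hypothetical seeding rule: the second seed is the member least similar to the first.
    second = min(members, key=lambda i: dot(vectors[i], vectors[first]))
    cents = [vectors[first], vectors[second]]
    groups = [members, []]
    for _ in range(iters):
        new_groups = [[], []]
        for i in members:
            side = 0 if dot(vectors[i], cents[0]) >= dot(vectors[i], cents[1]) else 1
            new_groups[side].append(i)
        converged = new_groups == groups
        groups = new_groups
        if converged or not all(groups):
            break
        cents = [centroid([vectors[i] for i in g]) for g in groups]
    if not all(groups):
        # Degenerate case (e.g. identical documents): fall back to an arbitrary halving.
        half = len(members) // 2
        groups = [members[:half], members[half:]]
    return groups


def bisecting_kmeans(docs, k, seed=0):
    """Repeatedly bisect the largest cluster until k clusters exist.

    Returns (clusters, splits): clusters is a list of document-index lists, and splits
    records every (parent, left, right) bisection, i.e. the hierarchy that is built.
    """
    vectors = tfidf_vectors(docs)
    rng = random.Random(seed)
    clusters = [list(range(len(docs)))]
    splits = []
    while len(clusters) < k:
        largest = max(clusters, key=len)
        if len(largest) < 2:
            break  # nothing left to split
        clusters.remove(largest)
        left, right = split_in_two(vectors, largest, rng)
        clusters.extend([left, right])
        splits.append((largest, left, right))
    return clusters, splits


docs = [
    "cat dog pet animal",
    "dog cat animal pet fur",
    "pet animal cat",
    "stock market price trade",
    "market trade stock bond",
    "price bond stock market",
]
clusters, splits = bisecting_kmeans(docs, k=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

Keeping the `splits` list is what makes the output hierarchical rather than flat: each entry is one node of the binary tree that bisecting k-means builds, and stopping at a different `k` cuts that tree at a different depth.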
Type: slide + paper
Keywords: hierarchical clustering, Induced Bisecting k-Means, document clustering
Language: English
Flexibility in Database Management and Querying, 2006
Book: Flexible Query Answering Systems (eds. H. L. Larsen, G. Pasi, D. O. Arroyo, T. Andreasen, H. Christiansen), 2006
ISBN: 978-3-540-34638-8
Volume: 4027
Pages: 257-269
URL: http://www.springerlink.com/content/f6w07qj750658r27/

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/5512
Citations
  • Scopus 26
  • ISI (Web of Science) 14