In this paper, we propose a method to automatically extract informal knowledge from a collection of documents. The method is based on an informal knowledge representation consisting of concepts (lexically indicated by words) and the links between them. We show that the links can be inferred from documents using a probabilistic topic model, while the overall parameter optimisation, based on a suitable score function, can be carried out with the Random Mutation Hill-Climbing algorithm. Experimental findings show that our method is effective and that, as a side effect, the score function can be used as a criterion for measuring the homogeneity between documents, which can serve as a prelude to a classification procedure. © 2013 Springer-Verlag GmbH.
Colace, F., De Santo, M., Napoletano, P. (2013). Informal lightweight knowledge extraction from documents. In Recent Progress in Data Engineering and Internet Technology (pp. 181-186). Springer Verlag [10.1007/978-3-642-28807-4_25].
Informal lightweight knowledge extraction from documents
NAPOLETANO, PAOLO (last author)
2013
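The abstract names Random Mutation Hill-Climbing as the optimiser but does not give the score function or the parameters being tuned. The following is a minimal sketch of the generic algorithm only, not the authors' implementation: `toy_score`, `mutate_one`, and the parameter vector are hypothetical stand-ins chosen for illustration.

```python
import random


def rmhc(score, initial, mutate, iterations=1000, seed=0):
    """Generic Random Mutation Hill-Climbing (maximisation).

    Repeatedly mutates the current best solution and keeps the
    mutant whenever its score is at least as good.
    """
    rng = random.Random(seed)
    best = list(initial)
    best_score = score(best)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        # Accept ties as well, so the search can drift across plateaus.
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(params):
    # Hypothetical score: maximal (zero) when every parameter equals 0.5.
    # The paper's actual score function is not given in the abstract.
    return -sum((p - 0.5) ** 2 for p in params)


def mutate_one(params, rng):
    # Hypothetical mutation: perturb one randomly chosen parameter
    # and clip it to the interval [0, 1].
    child = list(params)
    i = rng.randrange(len(child))
    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))
    return child


if __name__ == "__main__":
    start = [0.0, 1.0, 0.2]
    best, best_score = rmhc(toy_score, start, mutate_one, iterations=2000)
    print(best, best_score)
```

Because a mutant is accepted only when it scores at least as well as the current solution, the returned score can never be worse than that of the starting point. In the setting the abstract describes, the score would presumably be evaluated on the extracted concept and link structure rather than on a toy quadratic.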