Exploiting wikipedia to identify domain-specific key terms/phrases from a short-text collection

QURESHI, MUHAMMAD ATIF; PASI, GABRIELLA (last author)
2014

Abstract

Extracting what we call "domain-specific" key terms/phrases from a given document collection is a challenging task. By "domain-specific" key terms/phrases we mean words or expressions representative of the topical areas specific to the focus of a document collection. For example, when a collection is related to academic research (i.e., its focus is on topics dealing with academic research), the domain-specific key terms/phrases could be 'Information Retrieval', 'Marine Biology', 'Science', etc. In this contribution, a technique for identifying domain-specific key terms/phrases from a collection of documents is proposed. The proposed technique works on short textual descriptions and makes use of the titles of Wikipedia articles and of the Wikipedia category graph. We performed experiments over a document collection (HTML title text only) drawn from eight post-graduate school Web sites in five different countries. The evaluations show promising results for the identification of domain-specific key terms/phrases.
Type: paper
Keywords: Knowledge Extraction, Wikipedia, Domain-specific information extraction
Language: English
Conference: Italian Information Retrieval Workshop, IIR - January 20-21, 2014
Published in: CEUR Workshop Proceedings, 2014
Volume: 1127
Pages: 63-74
URL: http://ceur-ws.org/Vol-1127/paper9.pdf
none
Qureshi, M., O'Riordan, C., Pasi, G. (2014). Exploiting wikipedia to identify domain-specific key terms/phrases from a short-text collection. In CEUR Workshop Proceedings (pp.63-74). CEUR-WS.
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/58530
Citations
  • Scopus 4