Bicocca Open Archive

Exploiting Wikipedia to identify domain-specific key terms/phrases from a short-text collection

Qureshi, Muhammad Atif; O'Riordan, C.; Pasi, Gabriella (last author)

2014

Abstract

Extracting what we call "domain-specific" key terms/phrases from a given document collection is a challenging task. By "domain-specific" key terms/phrases we mean words/expressions representative of the topical areas specific to the focus of a document collection. For example, when a collection is related to academic research (i.e., its focus is on topics dealing with academic research), the domain-specific key terms/phrases could be 'Information Retrieval', 'Marine Biology', 'Science', etc. In this contribution, we propose a technique for identifying domain-specific key terms/phrases from a collection of documents. The proposed technique works on short textual descriptions, and it makes use of the titles of Wikipedia articles and of the Wikipedia category graph. We performed experiments on the document collection (HTML title text only) of eight postgraduate school Web sites from five different countries. The evaluations show promising results for the identification of domain-specific key terms/phrases.
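The abstract states only that the technique combines Wikipedia article titles with the Wikipedia category graph; it does not spell out the algorithm. As a rough illustration of how those two resources can be combined on short texts, here is a minimal Python sketch. It is not the authors' method. Everything in it is a hypothetical assumption: the toy `ARTICLE_TITLES`, `ARTICLE_CATEGORIES` and `CATEGORY_PARENTS` stand-ins for Wikipedia data, the greedy longest-n-gram matching, the fixed ancestor depth, and the document-count ranking.

```python
from collections import Counter

# Hypothetical toy stand-ins for Wikipedia data (not real dumps or API output):
# a set of lowercased article titles, article -> categories, category -> parents.
ARTICLE_TITLES = {"information retrieval", "marine biology", "computer science", "biology"}
ARTICLE_CATEGORIES = {
    "information retrieval": {"Information science"},
    "marine biology": {"Biology"},
    "computer science": {"Science"},
    "biology": {"Science"},
}
CATEGORY_PARENTS = {
    "Information science": {"Science"},
    "Biology": {"Science"},
    "Science": set(),
}


def match_titles(text, titles, max_n=3):
    """Greedy longest match of word n-grams in `text` against known article titles."""
    words = text.lower().split()
    found, i = [], 0
    while i < len(words):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_n, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in titles:
                found.append(phrase)
                i += n
                break
        else:
            i += 1  # no title starts here; move to the next word
    return found


def ancestors(category, parents, depth=2):
    """Return `category` plus its ancestors up to `depth` levels up the graph."""
    seen, frontier = {category}, {category}
    for _ in range(depth):
        frontier = {p for c in frontier for p in parents.get(c, set())} - seen
        seen |= frontier
    return seen


def rank_categories(docs):
    """Count, for each category, how many documents reach it via matched titles."""
    counts = Counter()
    for doc in docs:
        cats = set()
        for phrase in match_titles(doc, ARTICLE_TITLES):
            for cat in ARTICLE_CATEGORIES.get(phrase, set()):
                cats |= ancestors(cat, CATEGORY_PARENTS)
        counts.update(cats)  # each document counts at most once per category
    return counts.most_common()
```

For example, `rank_categories(["School of Marine Biology", "Information Retrieval Group", "Department of Computer Science"])` ranks `Science` first with a count of 3, because all three titles reach it through the toy category graph, while `Biology` and `Information science` are each reached by one title. Counting documents rather than raw matches is one simple way to favour categories shared across the collection; the paper may weight or prune the graph differently.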


Type of contribution: paper
Keywords: Knowledge Extraction, Wikipedia, Domain-specific information extraction
Content language: English
Conference name: Italian Information Retrieval Workshop (IIR), January 20-21
Conference year: 2014
Proceedings title: CEUR Workshop Proceedings
Series: CEUR Workshop Proceedings
Publication date: 2014
Volume number: 1127
First page: 63
Last page: 74
Alternative URL: http://ceur-ws.org/Vol-1127/paper9.pdf
Full text: none
Citation: Qureshi, M., O'Riordan, C., Pasi, G. (2014). Exploiting Wikipedia to identify domain-specific key terms/phrases from a short-text collection. In CEUR Workshop Proceedings (pp. 63-74). CEUR-WS.
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/58530
