Bicocca Open Archive

Identifying Semantic Relationships Between Research Topics Using Large Language Models in a Zero-Shot Learning Setting

Aggarwal T.; Salatino A.; Osborne F.; Motta E.

2024

Abstract

Knowledge Organization Systems (KOS), such as ontologies, taxonomies, and thesauri, play a crucial role in organising scientific knowledge. They help scientists navigate the vast landscape of research literature and are essential for building intelligent systems such as smart search engines, recommendation systems, conversational agents, and advanced analytics tools. However, the manual creation of these KOSs is costly, time-consuming, and often leads to outdated and overly broad representations. As a result, researchers have been exploring automated or semi-automated methods for generating ontologies of research topics. This paper analyses the use of large language models (LLMs) to identify semantic relationships between research topics. We specifically focus on six open and lightweight LLMs (up to 10.7 billion parameters) and use two zero-shot reasoning strategies to identify four types of relationships: broader, narrower, same-as, and other. Our preliminary analysis indicates that Dolphin2.1-OpenOrca-7B performs strongly in this task, achieving a 0.853 F1-score against a gold standard of 1,000 relationships derived from the IEEE Thesaurus. These promising results bring us one step closer to the next generation of tools for automatically curating KOSs, ultimately making the scientific literature easier to explore.
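The task described in the abstract (assigning one of four relationship types to a pair of topics, then scoring predictions against a gold standard) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the prompt wording, the function names `build_prompt`, `parse_label`, and `macro_f1`, and the choice of macro-averaged F1 are hypothetical and are not taken from the paper, whose abstract does not state its prompts or its averaging method.

```python
LABELS = ("broader", "narrower", "same-as", "other")


def build_prompt(topic_a: str, topic_b: str) -> str:
    """Build a hypothetical zero-shot prompt (not the paper's wording)."""
    return (
        "Classify the relationship between the research topics "
        f"'{topic_a}' and '{topic_b}'. "
        f"Answer with exactly one of: {', '.join(LABELS)}."
    )


def parse_label(answer: str) -> str:
    """Map a free-text model answer to one label, defaulting to 'other'."""
    text = answer.lower()
    for label in LABELS[:3]:
        if label in text:
            return label
    return "other"


def macro_f1(gold: list[str], predicted: list[str]) -> float:
    """Average the per-label F1 over the four labels (assumed averaging)."""
    scores = []
    for label in LABELS:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(build_prompt("machine learning", "deep learning"))
    gold = ["broader", "narrower", "same-as", "other"]
    pred = ["broader", "narrower", "other", "other"]
    print(round(macro_f1(gold, pred), 3))
```

In a real pipeline the string returned by `build_prompt` would be sent to a language model and the reply passed through `parse_label`; that call is deliberately left out because the abstract does not specify how the models were served.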

	Contribution type
	
				paper
			
	Keywords
	
				Large Language Models; Ontology Generation; Research Topics; Scholarly Knowledge; Scientific Knowledge Graphs; Zero-Shot Learning
			
	Content language
	
				English
			
	Conference name
	
				4th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment, Sci-K 2024 - November 12, 2024
			
	Conference year
	
				2024
			
	Proceedings title
	
				Proceedings of the 4th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment co-located with 23rd International Semantic Web Conference (ISWC 2024)
			
	Series
	
				CEUR WORKSHOP PROCEEDINGS
			
	Ahead-of-print date or date of first online publication
	
				2024
			
	Publication date
	
				2024
			
	Volume number
	
				3780
			
	Alternative URL
	
				https://ceur-ws.org/Vol-3780/
			
	Full text
	
				open
			
	Citation
	
				Aggarwal, T., Salatino, A., Osborne, F., Motta, E. (2024). Identifying Semantic Relationships Between Research Topics Using Large Language Models in a Zero-Shot Learning Setting. In Proceedings of the 4th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment co-located with 23rd International Semantic Web Conference (ISWC 2024). CEUR-WS.
			
	Appears in categories:
	
				02 - Conference contribution

Files in this item:

File	Size	Format
Aggarwal-2024-ISWC-VoR.pdf (open access). Description: This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Attachment type: Publisher's Version (Version of Record, VoR). License: Creative Commons.	503.5 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/525457
