Scientific Knowledge Graphs have recently become a powerful tool for exploring the research landscape and assisting scientific inquiry. It is crucial to generate and validate these resources to ensure they offer a comprehensive and accurate representation of specific research fields. However, manual approaches are not scalable, while automated methods often result in lower-quality resources. In this paper, we investigate novel validation techniques to improve the accuracy of automated KG generation methodologies, leveraging both a human-in-the-loop (HiL) and a large language model (LLM)-in-the-loop. Using the automated generation pipeline of the Computer Science Knowledge Graph as a case study, we demonstrate that precision can be increased by 12% (from 75% to 87%) using only LLMs. Moreover, a hybrid approach incorporating both LLMs and HiL significantly enhances both precision and recall, resulting in a 4% increase in the F1 score (from 77% to 81%).
Tsaneva, S., Dessi, D., Osborne, F., Sabou, M. (2024). Enhancing Scientific Knowledge Graph Generation Pipelines with LLMs and Human-in-the-Loop. In Proceedings of the 4th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment co-located with 23rd International Semantic Web Conference (ISWC 2024). CEUR-WS.
Enhancing Scientific Knowledge Graph Generation Pipelines with LLMs and Human-in-the-Loop
Osborne F.;
2024
Abstract
Scientific Knowledge Graphs have recently become a powerful tool for exploring the research landscape and assisting scientific inquiry. It is crucial to generate and validate these resources to ensure they offer a comprehensive and accurate representation of specific research fields. However, manual approaches are not scalable, while automated methods often result in lower-quality resources. In this paper, we investigate novel validation techniques to improve the accuracy of automated KG generation methodologies, leveraging both a human-in-the-loop (HiL) and a large language model (LLM)-in-the-loop. Using the automated generation pipeline of the Computer Science Knowledge Graph as a case study, we demonstrate that precision can be increased by 12% (from 75% to 87%) using only LLMs. Moreover, a hybrid approach incorporating both LLMs and HiL significantly enhances both precision and recall, resulting in a 4% increase in the F1 score (from 77% to 81%).File | Dimensione | Formato | |
---|---|---|---|
Tsaneva-2024-ISWC-VoR.pdf
accesso aperto
Descrizione: This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Tipologia di allegato:
Publisher’s Version (Version of Record, VoR)
Licenza:
Creative Commons
Dimensione
1.83 MB
Formato
Adobe PDF
|
1.83 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.