Chessa, A., Fenu, G., Motta, E., Osborne, F., Recupero, D., Salatino, A., et al. (2022). Enriching Data Lakes with Knowledge Graphs. In 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge, TEXT2KG 2022 and MK 2022 (pp.123-131). CEUR-WS.

Enriching Data Lakes with Knowledge Graphs

Osborne F.;
2022

Abstract

Data lakes are repositories of data stored in their natural, raw format. A data lake may include structured data from relational databases, semi-structured data (e.g., JSON, CSV), unstructured data (e.g., text), or binary data (e.g., images, audio, video). It is usually built on top of cost-efficient infrastructure such as Hadoop, Amazon S3, MongoDB, or Elasticsearch. Several organisations rely on big data lakes for crucial tasks such as reporting, visualisation, advanced analytics, machine learning, and business intelligence. A major limitation of this solution is that, without descriptive metadata and a mechanism to maintain it, such data tend to be noisy, making their management and analysis complex and time-consuming. There is therefore a need to add a semantic layer, based on a formal ontology, that describes the data, together with an efficient mechanism to represent them as a knowledge graph. In this paper, we present a methodology to add a semantic layer to a data lake and thus obtain a knowledge graph that can support structured queries and advanced data exploration. We describe a practical implementation of the methodology applied to a data lake consisting of text data describing the online marketplace for lodging and tourism activities. We report statistics about the data lake and the resulting knowledge graph.
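
As a rough illustration of the idea in the abstract, the sketch below maps a few raw records to subject-predicate-object triples through a small field-to-property mapping and then answers a structured query over the resulting graph. It is not the authors' pipeline: the records, the `ex:` prefix, the property names (`ex:locatedIn`, `ex:offeredBy`), and the helper functions are all hypothetical, and a real deployment would likely use an RDF store and a formal ontology rather than a Python set.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical raw records, standing in for entries extracted from the data lake.
RAW_RECORDS: List[Dict[str, str]] = [
    {"id": "listing-1", "kind": "Lodging", "city": "Cagliari", "host": "host-7"},
    {"id": "listing-2", "kind": "Lodging", "city": "Milan", "host": "host-9"},
    {"id": "activity-1", "kind": "TourismActivity", "city": "Cagliari"},
]

# Hypothetical ontology fragment: raw field name -> ontology property.
FIELD_TO_PROPERTY: Dict[str, str] = {
    "city": "ex:locatedIn",
    "host": "ex:offeredBy",
}


def to_triples(record: Dict[str, str]) -> Iterator[Triple]:
    """Describe one raw record with the (assumed) ontology terms."""
    subject = f"ex:{record['id']}"
    yield (subject, "rdf:type", f"ex:{record['kind']}")
    for field, prop in FIELD_TO_PROPERTY.items():
        if field in record:
            yield (subject, prop, f"ex:{record[field]}")


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def lodgings_in(graph: Set[Triple], city: str) -> List[str]:
    """Structured query: subjects typed ex:Lodging and located in `city`."""
    typed = {t[0] for t in match(graph, p="rdf:type", o="ex:Lodging")}
    located = {t[0] for t in match(graph, p="ex:locatedIn", o=f"ex:{city}")}
    return sorted(typed & located)


# Build the graph by describing every raw record.
GRAPH: Set[Triple] = {t for record in RAW_RECORDS for t in to_triples(record)}

print(lodgings_in(GRAPH, "Cagliari"))
```

The query combines two triple patterns, which is the kind of question that is awkward to ask of undescribed raw records but straightforward once each record carries typed, named relations.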
Type: paper
Keywords: Information Extraction; Knowledge Graphs; Semantic Data Lake
Language: English
Conference: 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge, TEXT2KG 2022 and MK 2022 - 30 May 2022
Conference year: 2022
Editors: Tiwari, S; Mihindukulasooriya, N; Osborne, F; Kontokostas, D; De Souza, J; Kejriwal, M; Bozzato, L; Carriero, VA; Hahmann, T; Zimmermann, A
Proceedings title: 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge, TEXT2KG 2022 and MK 2022
Publication year: 2022
Volume: 3184
Pages: 123-131
Access: open
Files in this item:
File: Chessa-2022-Ceur Workshop Proceed-VoR.pdf
Access: open access
Description: Conference paper
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 1.01 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/412076
Citations
  • Scopus: 3