The growing diffusion of IT-based services generates large amounts of data useful for supporting the activities of firms, organisations, and state agencies. In such a context, data quality tasks are frequently addressed using cleansing routines, often framed in the wider context of ETL (Extraction, Transformation, and Loading) processes. The design of these cleansing routines often relies on the experience of domain experts, which makes evaluating the quality level achieved a relevant concern for ensuring the believability of the analysed results. In this paper we describe two model-based techniques aimed, respectively, at evaluating the consistency of a dataset and at identifying the cleansing alternatives. The techniques have been applied to a real-world dataset derived from the Italian labour market domain, which we made publicly available to the community.

Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M. (2014). Data quality on KDD: A real-life scenario. In 22nd Italian Symposium on Advanced Database Systems, SEBD 2014 (pp.378-385). Universita Reggio Calabria and Centro di Competenza (ICT-SUD).

Data quality on KDD: A real-life scenario

Boselli, Roberto; Cesarini, Mirko; Mercorio, Fabio; Mezzanzanica, Mario
2014

Type: paper
Keywords: Data cleansing; Data quality; Data visualisation; Model based approach; Model checking; Software
Language: English
Conference: 22nd Italian Symposium on Advanced Database Systems, SEBD 2014
Year: 2014
Authors: Boselli, R; Cesarini, M; Mercorio, F; Mezzanzanica, M
ISBN: 9781634391450
Pages: 378-385
Access: open

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/58792