Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M. (2014). Data quality on KDD: A real-life scenario. In 22nd Italian Symposium on Advanced Database Systems, SEBD 2014 (pp.378-385). Universita Reggio Calabria and Centro di Competenza (ICT-SUD).
Data quality on KDD: A real-life scenario
BOSELLI, ROBERTO (first author); CESARINI, MIRKO (second author); MERCORIO, FABIO (penultimate author); MEZZANZANICA, MARIO (last author)
2014
Abstract
The growing diffusion of IT-based services generates a lot of data useful for supporting the activities of firms, organisations, and state agencies. In such a context, data quality tasks are frequently addressed using cleansing routines, often framed in the wider context of ETL processes (Extraction, Transformation, and Loading). The design of these cleansing routines often relies on the experience of domain experts, and this makes the evaluation of the quality level achieved a relevant concern to ensure the believability of the analysed results. In this paper we describe two model-based techniques aimed, respectively, at evaluating the consistency of a dataset and at identifying the cleansing alternatives. The techniques have been applied to a real-world dataset derived from the Italian labour market domain, which we made publicly available to the community.

File | Size | Format |
---|---|---|
SEBD2014.pdf (open access) | 475.02 kB | Adobe PDF |