Mezzanzanica, M., Boselli, R., Cesarini, M., Mercorio, F. (2011). Data Quality through Model Checking Techniques. In Advances in Intelligent Data Analysis X (pp. 270-281). Berlin: Springer [10.1007/978-3-642-24800-9_26].

Data Quality through Model Checking Techniques

Mezzanzanica, Mario; Boselli, Roberto; Cesarini, Mirko; Mercorio, Fabio
2011

Abstract

The paper focuses on exploiting model checking tools to assess data quality in the context of data warehouses (DW) and decision support systems (DSS). Data quality assessment is not an easy task, especially when dealing with very large source databases that require extensive transformation and quality improvement activities. Nevertheless, data quality is paramount for DSS and DW. The authors propose a methodology that can be used to evaluate the quality of both the data sources and the output of quality improvement processes. The methodology can be applied to datasets that can be modelled as flows of events and described by a finite state machine. It allows for a high degree of automation, making it suitable for repetitive, unattended data management processes. The paper outlines the preliminary results of the methodology applied to a real case scenario, where monthly updates from a source database system (millions of records) are Extracted, Transformed, and Loaded (ETL) into a DW. In this scenario, the data undergo very complex tasks aimed at correcting errors and improving the overall data quality. The methodology has proved successful, giving insights into the data quality levels and providing suggestions on how to improve the data quality.
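As a rough illustration of the idea described in the abstract, the sketch below treats a record history as a flow of events and checks it against a small finite state machine. It is not the authors' implementation, which relies on model checking tools; the states, the event names (`start`, `cease`), and the function `check_flow` are hypothetical and chosen only for this example.

```python
from typing import Iterable, Optional

# Hypothetical transition table: (current state, event) -> next state.
# Any (state, event) pair that is missing here is treated as inconsistent.
TRANSITIONS = {
    ("inactive", "start"): "active",
    ("active", "cease"): "inactive",
}


def check_flow(events: Iterable[str], initial: str = "inactive") -> Optional[int]:
    """Return the index of the first inconsistent event, or None if the flow is consistent."""
    state = initial
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            # The event is not allowed in the current state.
            return index
        state = next_state
    return None


# A flow that follows the machine, and one with a repeated "start".
print(check_flow(["start", "cease", "start"]))
print(check_flow(["start", "start", "cease"]))
```

Returning the position of the first offending event, rather than a plain boolean, hints at how such a check could point to the records that need cleansing.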
Type: paper
Keywords: Data Quality, Model Checking, ETL Certification, Semantic Data Analysis
Language: English
Conference: 10th International Symposium on Intelligent Data Analysis, IDA 2011
Conference year: 2011
Editors: Gama, J; Bradley, E; Hollmén, J
Book title: Advances in Intelligent Data Analysis X
ISBN: 978-3-642-24799-6
Publication year: 2011
Volume: 7014
First page: 270
Last page: 281
none

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/26566