Mezzanzanica, M., Boselli, R., Cesarini, M., Mercorio, F. (2011). Data Quality through Model Checking Techniques. In Advances in Intelligent Data Analysis X (pp. 270-281). Berlin: Springer [10.1007/978-3-642-24800-9_26].
Data Quality through Model Checking Techniques
Mezzanzanica, Mario; Boselli, Roberto; Cesarini, Mirko; Mercorio, Fabio
2011
Abstract
The paper focuses on exploiting model checking tools to assess data quality in the context of data warehouses (DW) and decision support systems (DSS). Data quality assessment is not an easy task, especially when dealing with (very) large source databases requiring (very) extensive transformation and quality improvement activities. Nevertheless, data quality is paramount for DSS and DW. The authors propose a methodology that can be used to evaluate the quality of both the data sources and the output of quality improvement processes. The methodology can be applied to datasets that can be modelled as flows of events and described by a finite state machine. It allows for a high degree of automation, making it suitable for repetitive unmanned data management processes. The paper outlines the preliminary results of applying the methodology to a real case scenario, where monthly updates from a source database system (millions of records) are Extracted, Transformed, and Loaded into a DW. In this scenario, the data undergoes very complex tasks aimed at correcting errors and improving the overall data quality. The methodology has proved successful, giving insights into the data quality levels and providing suggestions on how to improve the data quality.
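The core idea of the abstract — modelling a dataset as a flow of events whose consistency is described by a finite state machine — can be illustrated with a minimal sketch. The states, events, and transitions below are purely hypothetical (the abstract does not specify the paper's actual model or tooling); the sketch only shows how a record's event sequence can be replayed against an FSM to flag quality violations.

```python
# Minimal sketch of FSM-based event-flow checking (illustrative only).
# States and events are invented for the example, not taken from the paper.
VALID_TRANSITIONS = {
    ("closed", "open_account"): "open",
    ("open", "deposit"): "open",
    ("open", "close_account"): "closed",
}

def check_flow(events, start="closed"):
    """Replay an event sequence against the FSM.

    Returns (True, None) if every event matches a valid transition,
    otherwise (False, index) where index locates the first invalid event.
    """
    state = start
    for i, event in enumerate(events):
        next_state = VALID_TRANSITIONS.get((state, event))
        if next_state is None:
            return False, i
        state = next_state
    return True, None

# A consistent flow passes; a "deposit" on a closed account is flagged.
ok, _ = check_flow(["open_account", "deposit", "close_account"])
bad, where = check_flow(["deposit"])
```

Because the check is a pure function of each record's event sequence, it can be run unattended over millions of records in an ETL step, which matches the high degree of automation the methodology claims.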