Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M. (2014). Longitudinal data consistency verification using formal methods. International Journal of Information Quality, 3(3), 185-206 [10.1504/IJIQ.2014.064054].
Longitudinal data consistency verification using formal methods
Boselli, Roberto; Cesarini, Mirko; Mercorio, Fabio; Mezzanzanica, Mario
2014
Abstract
The longitudinal data collected by public administrations and large organisations are apt to describe social and economic phenomena whose dynamics require close attention from policy makers and civil servants. Unfortunately, the quality of the stored data is often very poor, so data cleansing is a mandatory step before the data can be exploited. This paper is driven by the idea that formal methods (specifically model checking) can contribute substantially to extracting, formalising, and refining consistency requirements from the domain knowledge, and then to verifying the real data against the elicited requirements. We developed a methodology (the Robust Data Quality Analysis) that assesses the quality of both the original data and the cleansing results. We applied the proposed approach to a real-world scenario in the labour market domain, evaluating the consistency of the careers of millions of people. The results show that our approach can contribute effectively to the improvement of data cleansing activities.
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.