A conceptual framework for the automatic discovery of dependencies between data quality dimensions is described. Dependency discovery consists of recovering the dependency structure for a set of data quality dimensions measured on the attributes of a database. This task is accomplished with a data mining methodology, by learning a Bayesian Network from the database. The Bayesian Network is then used to analyze dependencies between data quality dimensions associated with different attributes. The proposed framework is instantiated on a real-world database. Dependency discovery is presented for the case in which the following data quality dimensions are considered: accuracy, completeness, and consistency. The Bayesian Network model shows how data quality can be improved while satisfying budget constraints. © Springer-Verlag Berlin Heidelberg 2010.
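To make the idea of a dependency between quality dimensions concrete, the following minimal Python sketch scores pairwise dependence between binary quality indicators using empirical mutual information. It is only an illustration under stated assumptions: the attribute names and the toy data are hypothetical, and a pairwise score is a much simpler stand-in for the Bayesian Network structure learning described in the abstract, not the authors' method.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # Each term is p(x, y) * log2(p(x, y) / (p(x) * p(y))).
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def rank_dependencies(flags):
    """Score every pair of quality indicators, strongest dependence first."""
    scores = {
        (a, b): mutual_information(flags[a], flags[b])
        for a, b in combinations(sorted(flags), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one entry per record, 1 = check passed, 0 = check failed.
flags = {
    "completeness(address)": [1, 1, 0, 0, 1, 0, 1, 0],
    "accuracy(zip_code)": [1, 1, 0, 0, 1, 0, 1, 0],
    "consistency(birth_date)": [1, 0, 1, 0, 1, 1, 0, 0],
}

ranked = rank_dependencies(flags)
for (a, b), score in ranked:
    print(f"{a} ~ {b}: {score:.3f} bits")
```

In this toy data the address completeness flag and the ZIP code accuracy flag always agree, so that pair ranks first, while the birth date consistency flag is independent of both. A Bayesian Network goes further than such pairwise scores by representing conditional dependencies among all dimensions jointly.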
Barone, D., Stella, F., Batini, C. (2010). Dependency Discovery in Data Quality. In Advanced Information Systems Engineering, 22nd International Conference, CAiSE 2010, Hammamet, Tunisia, June 7-9, 2010, Proceedings (pp. 53-67). Springer-Verlag Berlin [10.1007/978-3-642-13094-6_6].
Dependency Discovery in Data Quality
STELLA, FABIO ANTONIO; BATINI, CARLO
2010