A conceptual framework for the automatic discovery of dependencies between data quality dimensions is described. Dependency discovery consists in recovering the dependency structure for a set of data quality dimensions measured on attributes of a database. This task is accomplished with a data mining methodology, by learning a Bayesian Network from a database. The Bayesian Network is used to analyze dependencies between data quality dimensions associated with different attributes. The proposed framework is instantiated on a real-world database. The task of dependency discovery is presented for the case in which the following data quality dimensions are considered: accuracy, completeness, and consistency. The Bayesian Network model shows how data quality can be improved while satisfying budget constraints. © Springer-Verlag Berlin Heidelberg 2010.
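As a rough illustration of what "dependency between data quality dimensions" can mean in practice, the sketch below scores pairwise dependence between per-attribute completeness flags using empirical mutual information. This is not the paper's Bayesian Network learning procedure; it is only a simple screening step of the kind that could precede structure learning. The function names, the toy records, and the choice of mutual information are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def completeness_flags(rows, attributes):
    """Map each row to {attribute: 1 if a value is present, else 0}."""
    return [
        {a: int(row.get(a) not in (None, "")) for a in attributes}
        for row in rows
    ]


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def pairwise_dependencies(flags, attributes):
    """Score every attribute pair by the mutual information of its flags."""
    scores = {}
    for a, b in combinations(attributes, 2):
        xs = [f[a] for f in flags]
        ys = [f[b] for f in flags]
        scores[(a, b)] = mutual_information(xs, ys)
    return scores


# Hypothetical toy records; None marks a missing value.
rows = [
    {"name": "Ann", "zip": "20126", "city": "Milano"},
    {"name": "Bob", "zip": None, "city": None},
    {"name": "Eve", "zip": "00100", "city": "Roma"},
    {"name": None, "zip": None, "city": None},
]
attributes = ["name", "zip", "city"]

flags = completeness_flags(rows, attributes)
scores = pairwise_dependencies(flags, attributes)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data the completeness of `zip` and `city` always coincides, so that pair receives the highest score. A Bayesian Network learner would go further by selecting a directed structure over all dimensions jointly rather than scoring pairs in isolation.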

Barone, D., Stella, F., Batini, C. (2010). Dependency Discovery in Data Quality. In Advanced Information Systems Engineering 22nd International Conference, CAiSE 2010, Hammamet, Tunisia, June 7-9, 2010, Proceedings (pp.53-67). SPRINGER-VERLAG BERLIN [10.1007/978-3-642-13094-6_6].

Dependency Discovery in Data Quality

STELLA, FABIO ANTONIO;BATINI, CARLO
2010

Type: paper
Keywords: Data quality, Bayesian networks, Data mining
Language: English
Conference: The 22nd International Conference on Advanced Information Systems Engineering (CAiSE'10)
Conference year: 2010
Proceedings: Advanced Information Systems Engineering 22nd International Conference, CAiSE 2010, Hammamet, Tunisia, June 7-9, 2010, Proceedings
ISBN: 3642130933
Publication year: 2010
Volume: 6051
Pages: 53-67

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/12965