Li, P., Luna Dong, X., Maurino, A., & Srivastava, D. (2011). Linking Temporal Records. Proceedings of the VLDB Endowment, 4(11), 956-967.
Linking Temporal Records
Li, Pei; Maurino, Andrea
2011
Abstract
Many data sets contain temporal records over a long period of time; each record is associated with a time stamp and describes some aspects of a real-world entity at that particular time (e.g., author information in DBLP). In such cases, we often wish to identify records that describe the same entity over time, so as to enable interesting longitudinal data analysis. However, existing record linkage techniques ignore the temporal information and can fall short for temporal data. This paper studies linking temporal records. First, we apply time decay to capture the effect of elapsed time on entity value evolution. Second, instead of comparing each pair of records locally, we propose clustering methods that consider the time order of the records and make global decisions. Experimental results show that our algorithms significantly outperform traditional linkage methods on various temporal data sets.
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
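The two ideas in the abstract, time decay and processing records in time order, can be illustrated with a small sketch. Everything below is an assumption for illustration, not the paper's method: the exponential `decay` function, the per-attribute weights and half-lives (`WEIGHTS`, `DISAGREE_HALF_LIFE`, `AGREE_HALF_LIFE`), the exact-match value comparison, and the hypothetical helpers `decayed_similarity` and `link_in_time_order`. The clustering pass is a simple greedy one that compares each record with the latest record of each cluster. It is much simpler than the global clustering the abstract refers to and only shows why time order matters.

```python
from dataclasses import dataclass


@dataclass
class Record:
    rid: str
    year: int
    attrs: dict[str, str]  # hypothetical attributes, e.g. "name", "affiliation"


# Hypothetical settings: base weight per attribute and decay half-lives in years.
# A disagreement on a volatile attribute (affiliation) loses weight quickly,
# because the same entity may simply have changed that value over time.
WEIGHTS = {"name": 0.6, "affiliation": 0.4}
DISAGREE_HALF_LIFE = {"name": 50.0, "affiliation": 3.0}
AGREE_HALF_LIFE = {"name": 50.0, "affiliation": 20.0}


def decay(delta_t: float, half_life: float) -> float:
    """Assumed exponential decay: 0 at delta_t = 0, approaching 1 as delta_t grows."""
    return 1.0 - 0.5 ** (delta_t / half_life)


def decayed_similarity(r1: Record, r2: Record) -> float:
    """Weighted fraction of agreeing attributes; each weight shrinks with elapsed time."""
    delta_t = abs(r1.year - r2.year)
    num = den = 0.0
    for attr, w in WEIGHTS.items():
        same = r1.attrs.get(attr) == r2.attrs.get(attr)
        half_life = AGREE_HALF_LIFE[attr] if same else DISAGREE_HALF_LIFE[attr]
        w_t = w * (1.0 - decay(delta_t, half_life))
        num += w_t if same else 0.0
        den += w_t
    return num / den if den > 0 else 0.0


def link_in_time_order(records: list[Record], threshold: float = 0.8) -> list[list[str]]:
    """Greedy pass: visit records by time stamp, compare each with the latest
    record of every existing cluster, and join the best cluster at or above threshold."""
    clusters: list[list[Record]] = []
    for rec in sorted(records, key=lambda r: r.year):  # stable sort keeps input order on ties
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = decayed_similarity(rec, cluster[-1])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([rec])  # no sufficiently similar cluster: start a new one
        else:
            best.append(rec)
    return [[r.rid for r in c] for c in clusters]


# Hypothetical author records: r1 and r2 share a name six years apart with
# different affiliations; r3 has the same name but a third affiliation in the same year as r2.
records = [
    Record("r1", 2000, {"name": "J. Smith", "affiliation": "Univ A"}),
    Record("r2", 2006, {"name": "J. Smith", "affiliation": "Univ B"}),
    Record("r3", 2006, {"name": "J. Smith", "affiliation": "Univ C"}),
]
print(link_in_time_order(records))  # → [['r1', 'r2'], ['r3']]
```

With these assumed settings, r1 and r2 score about 0.85 because the six-year-old affiliation disagreement is heavily discounted; with the decay removed they would score 0.6 and stay apart. r3 is compared with r2, the cluster's latest record, at zero elapsed time, so its affiliation disagreement keeps full weight and it starts its own cluster.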