The aim of this work is to obtain a good-quality approximation of the nearest neighbor distance (nnd) profile among the sequences of a time series. Knowing the nearest neighbor distance of every sequence provides useful information about, for example, anomalies and clusters in the time series. However, the complexity of this task grows quadratically with the number of sequences, which limits its applicability. We propose an approximate method that obtains good-quality nnd profiles 1-2 orders of magnitude faster than the brute-force approach. The method exploits the interdependence of three different topologies of a time series: one induced by the SAX clustering procedure, one induced by the position in time of each sequence, and one induced by the Euclidean distance. The quality of the approximation has been evaluated on real-life time series: more than 98% of the nnd values obtained with our approach are exact, and the average relative error of the approximated ones is usually below 10%.
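To make the quadratic cost concrete, the following is a minimal sketch of the brute-force baseline that the abstract compares against, not the authors' approximate method. The function name `brute_force_nnd_profile` and the parameter `m` (sequence length) are hypothetical. The sketch also assumes that a sequence's neighbors are restricted to sequences that do not overlap it in time, a common convention that the abstract does not state, and it uses the plain Euclidean distance without normalization.

```python
import math


def brute_force_nnd_profile(series, m):
    """Return the nearest neighbor distance (nnd) of every length-m sequence.

    Assumption (not stated in the abstract): neighbors of sequence i are
    limited to sequences j that do not overlap it in time, |i - j| >= m.
    """
    n_seq = len(series) - m + 1
    if n_seq < 1:
        raise ValueError("series is shorter than the sequence length m")

    profile = [math.inf] * n_seq
    # Every non-overlapping pair is compared once, so the number of
    # distance evaluations grows quadratically with n_seq.
    for i in range(n_seq):
        for j in range(i + m, n_seq):
            d = math.sqrt(
                sum((series[i + k] - series[j + k]) ** 2 for k in range(m))
            )
            # The distance is symmetric, so it can update both entries.
            if d < profile[i]:
                profile[i] = d
            if d < profile[j]:
                profile[j] = d
    return profile


# Toy series with a single spike; the sequences covering the spike have
# the largest nnd, which is how the profile exposes anomalies.
series = [0, 1, 0, 1, 0, 5, 0, 1, 0, 1]
profile = brute_force_nnd_profile(series, m=2)
print(profile)  # [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0]
```

In this toy case the repeated pattern yields an nnd of zero, while the two sequences that contain the spike stand out. An approximate method such as the one proposed here aims to reproduce this profile while evaluating far fewer pairs.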
Dominoni, M., Avogadro, P. (2019). Topological Approach for Finding Nearest Neighbor Sequence in Time Series. In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR (pp. 233-244). SciTePress [10.5220/0008493302330244].
Topological Approach for Finding Nearest Neighbor Sequence in Time Series
Dominoni, Matteo; Avogadro, Paolo
2019