Bernardini, G., Conte, A., Gourdel, G., Grossi, R., Loukides, G., Pisanti, N., et al. (2020). Hide and Mine in Strings: Hardness and Algorithms. In 20th IEEE International Conference on Data Mining, ICDM 2020, Sorrento, Italy, November 17-20, 2020 (pp. 924-929). IEEE [10.1109/ICDM50108.2020.00103].

Hide and Mine in Strings: Hardness and Algorithms

Bernardini, Giulia
2020

Abstract

We initiate a study of the fundamental relation between data sanitization (i.e., the process of hiding confidential information in a given dataset) and frequent pattern mining, in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns but, in doing so, introduce a number of spurious patterns that may harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution is twofold. First, we present several hardness results for different variants of this problem, essentially showing that these variants cannot be solved, or even approximated, in polynomial time. Second, we propose integer linear programming formulations for these variants, together with algorithms to solve them that run in polynomial time under certain realistic assumptions on the problem parameters.
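The tension described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: it uses a naive, hypothetical sanitizer that overwrites the last letter of every occurrence of a sensitive pattern, then compares the frequent length-k substrings before and after. The function names, the toy string, the replacement letter, and the values of k and tau are all assumptions chosen for illustration.

```python
from collections import Counter


def frequent_substrings(text: str, k: int, tau: int) -> set:
    """Return the length-k substrings that occur at least tau times in text."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return {s for s, c in counts.items() if c >= tau}


def naive_hide(text: str, sensitive: str, replacement: str) -> str:
    """Hypothetical sanitizer: overwrite the last letter of each occurrence.

    Occurrences are located in the original text. In general the replacement
    letter could create a new occurrence, so this is only a toy illustration.
    """
    chars = list(text)
    start = text.find(sensitive)
    while start != -1:
        chars[start + len(sensitive) - 1] = replacement
        start = text.find(sensitive, start + 1)
    return "".join(chars)


# Toy instance (assumed for illustration): hide "abc" using the letter "d".
text = "abcabdabcabd"
sanitized = naive_hide(text, "abc", "d")

before = frequent_substrings(text, k=3, tau=2)
after = frequent_substrings(sanitized, k=3, tau=2)

print(sanitized)               # abdabdabdabd
print(sorted(after - before))  # ['bda', 'dab']  spurious frequent patterns
print(sorted(before - after))  # ['abc', 'bca', 'cab']  no longer frequent
```

In this toy instance the sensitive pattern is hidden, but two substrings that were infrequent in the original string become frequent, and two non-sensitive frequent substrings are lost. Choosing replacements that keep such side effects small is the kind of harm the abstract refers to.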
Type: paper
Keywords: Data privacy, data sanitization, knowledge hiding, frequent pattern mining, string algorithms
Language: English
Conference: 2020 IEEE International Conference on Data Mining (ICDM)
Year: 2020
Editors: Claudia Plant, Haixun Wang, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu
Proceedings: 20th IEEE International Conference on Data Mining, ICDM 2020, Sorrento, Italy, November 17-20, 2020. IEEE 2020
ISBN: 978-1-7281-8316-9
Pages: 924-929

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/303067