This study provides insights into both addressing data confidentiality concerns and enhancing document retrieval effectiveness in Data Marketplaces, which in this specific study consist of unstructured, textual documents. Through a semi-automatic sanitization process leveraging token masking with text summarization, possibly complemented by Coreference Resolution, the proposed solution mitigates the risk of inferring confidential information while maintaining search performance. Experimental results demonstrate encouraging improvements in both aspects with respect to baseline solutions.
Cassani, L., Livraga, G., Viviani, M. (2024). Assessing Document Sanitization for Controlled Information Release and Retrieval in Data Marketplaces. In Experimental IR Meets Multilinguality, Multimodality, and Interaction 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9–12, 2024, Proceedings, Part I (pp.88-99) [10.1007/978-3-031-71736-9_4].
Assessing Document Sanitization for Controlled Information Release and Retrieval in Data Marketplaces
Viviani, Marco
2024
Abstract
This study provides insights into both addressing data confidentiality concerns and enhancing document retrieval effectiveness in Data Marketplaces, which in this specific study consist of unstructured, textual documents. Through a semi-automatic sanitization process leveraging token masking with text summarization, possibly complemented by Coreference Resolution, the proposed solution mitigates the risk of inferring confidential information while maintaining search performance. Experimental results demonstrate encouraging improvements in both aspects with respect to baseline solutions.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.