Searching for information online has become a critical task in a setting characterized by information overload and misinformation. To address these issues, users must be provided with information that is both topically relevant and truthful. Re-ranking is a strategy often used in Information Retrieval (IR) to account for multiple dimensions of relevance. However, re-rankers often analyze the full text of documents to obtain an overall relevance score at the re-ranking stage, which can lead to sub-optimal results. Some recent Transformer-based re-rankers do consider text passages rather than the entire document, but they focus only on topical relevance. Transformers are also used in non-IR solutions to assess information truthfulness, but only as a binary classification task. Therefore, in this article, we propose a re-ranking-based IR model that focuses on suitably identified text passages from documents to retrieve information that is both topically relevant and truthful. This approach significantly reduces the noise introduced by query-unrelated content in long documents and allows us to evaluate the document’s truthfulness against it, enabling more effective retrieval. We tested the effectiveness of the proposed solution in the context of the Consumer Health Search task, using publicly available datasets. Our results show that the proposed approach outperforms, with statistical significance, both full-text retrieval models for multidimensional relevance, such as those based on aggregation, and Transformer-based re-rankers for monodimensional relevance, such as BERT-based re-rankers.
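The general idea of passage-level, two-dimensional re-ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the term-overlap scorer is a toy stand-in for a real retrieval or Transformer-based scorer, `truth_score` is a hypothetical callable supplied by the caller, and the linear combination with `weight` is an assumption made only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Document:
    doc_id: str
    text: str


def split_into_passages(text: str, size: int = 1) -> List[str]:
    """Group consecutive sentences into passages of `size` sentences each."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def overlap_score(query: str, passage: str) -> float:
    """Toy topicality score: fraction of query terms that occur in the passage."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    p_terms = set(re.findall(r"\w+", passage.lower()))
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank(
    query: str,
    docs: List[Document],
    truth_score: Callable[[str], float],  # hypothetical: maps a passage to [0, 1]
    top_k: int = 1,
    weight: float = 0.5,  # assumed linear trade-off between the two dimensions
) -> List[Tuple[str, float]]:
    """Score each document on its top-k query-related passages only."""
    scored = []
    for doc in docs:
        passages = split_into_passages(doc.text)
        best = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)[:top_k]
        if not best:
            scored.append((doc.doc_id, 0.0))
            continue
        topical = sum(overlap_score(query, p) for p in best) / len(best)
        truthful = sum(truth_score(p) for p in best) / len(best)
        scored.append((doc.doc_id, weight * topical + (1 - weight) * truthful))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because only the selected passages contribute to either score, query-unrelated sentences elsewhere in a long document do not dilute the final ranking, which is the intuition the abstract appeals to.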
Upadhyay, R., Pasi, G., Viviani, M. (2023). A Passage Retrieval Transformer-Based Re-Ranking Model for Truthful Consumer Health Search. In Machine Learning and Knowledge Discovery in Databases: Research Track. European Conference, ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Proceedings, Part I (pp. 355-371). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-43412-9_21].
A Passage Retrieval Transformer-Based Re-Ranking Model for Truthful Consumer Health Search
Upadhyay, R.; Pasi, G.; Viviani, M.
2023