Sokli, E., Kasela, P., Peikos, G., Pasi, G. (2024). Investigating Mixture of Experts in Dense Retrieval [Other].

Investigating Mixture of Experts in Dense Retrieval

Effrosyni Sokli;Pranav Kasela;Georgios Peikos;Gabriella Pasi
2024

Abstract

While Dense Retrieval Models (DRMs) have advanced Information Retrieval (IR), these neural models suffer from limited generalizability and robustness. To cope with this issue, one can leverage the Mixture-of-Experts (MoE) architecture. While previous IR studies have incorporated MoE architectures within the Transformer layers of DRMs, our work investigates an architecture that integrates a single MoE block (SB-MoE) after the output of the final Transformer layer. Our empirical evaluation examines how SB-MoE compares, in terms of retrieval effectiveness, to standard fine-tuning. Specifically, we fine-tune three DRMs (TinyBERT, BERT, and Contriever) on four benchmark collections, with and without the added MoE block. Moreover, since MoE performance varies with its hyperparameters (i.e., the number of experts), we conduct additional experiments to investigate this aspect further. The findings show that SB-MoE is particularly effective for DRMs with a small number of parameters (i.e., TinyBERT), as it consistently outperforms the fine-tuned underlying model on all four benchmarks. For DRMs with a larger number of parameters (i.e., BERT and Contriever), SB-MoE requires a larger number of training samples to yield better retrieval performance.
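As a rough illustration of the SB-MoE idea described in the abstract, the sketch below shows one possible way to attach a single Mixture-of-Experts block to the pooled output of a Transformer encoder. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the class name SingleMoEBlock, the softmax gating, the two-layer experts, and the dense (non-sparse) expert combination are assumptions made for illustration only.

```python
# Minimal sketch of a single MoE block (SB-MoE) placed after the final
# Transformer layer of a dense retrieval model. Illustrative only; the
# gating design, expert structure, and dimensions are assumptions, not
# taken from the paper.
import torch
import torch.nn as nn


class SingleMoEBlock(nn.Module):
    """Applies a gated mixture of expert feed-forward networks to the
    encoder's final-layer (pooled) output embedding."""

    def __init__(self, hidden_dim: int, num_experts: int = 4):
        super().__init__()
        # One lightweight feed-forward expert per slot.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        ])
        # Gating network produces a softmax weight per expert.
        self.gate = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim) pooled output of the final Transformer layer.
        weights = torch.softmax(self.gate(x), dim=-1)                 # (batch, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, H)
        # Weighted combination of the expert outputs.
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)        # (batch, H)


# Usage sketch: wrap a fine-tuned encoder's pooled embedding.
if __name__ == "__main__":
    moe = SingleMoEBlock(hidden_dim=768, num_experts=4)
    pooled = torch.randn(2, 768)  # stand-in for a BERT/Contriever pooled output
    print(moe(pooled).shape)      # torch.Size([2, 768])
```

The number of experts is the hyperparameter whose effect on retrieval effectiveness the paper studies in its additional experiments.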
Other
Preprint
Computer Science - Information Retrieval; Computer Science - Artificial Intelligence
English
2024
http://arxiv.org/abs/2412.11864v1
Open Access
Files in this record:
File: Sokli-2024-Investigating-preprint.pdf (open access)
Description: Deposited in arXiv
Attachment type: Submitted Version (Pre-print)
License: Creative Commons
Size: 431.49 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/548722
Citations
  • Scopus: N/A
  • Web of Science: N/A