Bassani, E. (2023). Neural Approaches to Personalized Search. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2023).
Neural Approaches to Personalized Search
BASSANI, ELIAS
2023
Abstract
Recent advances in Neural Networks research have pushed forward the state of the art in many language-related tasks, including Information Retrieval, bringing new opportunities for representing and leveraging user-related information during personalization. However, their application to Personalized Search is still an open research area, with many issues and challenges to address. In this dissertation, we focus on representing user preferences from multiple perspectives, managing and selecting the user information used to personalize the current search, and improving query representations with user-specific data, proposing new approaches based on Neural Networks. Moreover, we address the lack of publicly available large-scale datasets suited for training and evaluating Neural Network-based approaches to Personalized Search. We first study the problem of leveraging user preferences represented from multiple perspectives by proposing a multi-representation re-ranking model. We show that the proposed approach achieves competitive performance while being fast, scalable, and extensible to additional representations and features. We then conduct an in-depth analysis of the Attention mechanism of Neural Networks when employed for user modeling, highlighting shortcomings caused by one of its internal components, the Softmax normalization function. We address those shortcomings by introducing a novel Attention variant, the Denoising Attention, which adopts a more robust normalization scheme and employs a filtering mechanism. Experimental evaluations show the benefits of the proposed approach over other Attention variants.
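A general property of Softmax helps explain why it can be a weak spot in query-aware user modeling: its weights are strictly positive and always sum to 1, so every item in the user history contributes to the user representation, even when none of them relates to the current query. The minimal sketch below contrasts Softmax with a filter-then-normalize scheme. The function names `filtered_attention` and `user_vector`, the `threshold` parameter, and the example scores and embeddings are hypothetical illustrations chosen for this sketch; this is not the Denoising Attention formulation from the thesis.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Standard Softmax: every weight is positive and the weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filtered_attention(scores: List[float], threshold: float = 0.0) -> List[float]:
    """Hypothetical filter-then-normalize scheme (not the thesis' formulation).

    Items scoring at or below `threshold` get weight 0 (filtering); the
    shifted scores (score - threshold) of the remaining items are divided by
    their sum. If no item survives, all weights are 0, so no history item
    leaks into the user representation.
    """
    kept = [max(s - threshold, 0.0) for s in scores]
    total = sum(kept)
    if total == 0.0:
        return [0.0] * len(scores)
    return [k / total for k in kept]


def user_vector(weights: List[float], items: List[List[float]]) -> List[float]:
    """Attention-weighted sum of the user's history item embeddings."""
    dim = len(items[0])
    return [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical history: three items with 2-d embeddings, and their
    # similarity scores w.r.t. the current query (only item 0 is related).
    history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    scores = [0.8, -0.3, -0.4]

    print(softmax(scores))             # unrelated items still get ~39% of the mass
    print(filtered_attention(scores))  # [1.0, 0.0, 0.0]
    print(user_vector(filtered_attention(scores), history))  # [1.0, 0.0]

    # When nothing in the history matches, Softmax still sums to 1,
    # whereas the filtered scheme returns all-zero weights.
    unrelated = [-0.2, -0.5, -0.1]
    print(sum(softmax(unrelated)))
    print(filtered_attention(unrelated))  # [0.0, 0.0, 0.0]
```

Returning all-zero weights in the second case is a deliberate choice in this sketch. A downstream model could then fall back on the query alone instead of mixing in unrelated user history. How the thesis actually handles this case is not stated in the abstract.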
Furthermore, we address the enhancement of query representations with user-specific data by proposing a novel Personalized Query Expansion approach designed for contextualized word embeddings, which leverages an offline clustering-based procedure to identify the user-related terms that best represent the user's interests. We show that it improves retrieval effectiveness over state-of-the-art word embedding-based Query Expansion methods while also achieving sub-millisecond expansion times, thanks to an approximation we propose. Finally, we discuss the state of Personalized Information Retrieval evaluation and the publicly available datasets, and we propose and release a novel large-scale benchmark spanning four domains, with more than 18 million documents and 1.9 million queries. We present a detailed description of the benchmark construction procedure, highlighting its characteristics and challenges, and provide baselines for future work. The solutions and findings presented in this dissertation show that Personalized Search is still an open research area. Moreover, the new opportunities brought by recent advances in Neural Networks also introduce new challenges that must be properly addressed, both to take full advantage of their potential and to make them valuable for real-world Personalized Search applications.
File | Size | Format | Access |
---|---|---|---|
phd_unimib_748203.pdf (Doctoral thesis) | 1.58 MB | Adobe PDF | Open access |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.