This paper presents a novel approach to address contextual bandit problems with partially observable, delayed feedback by introducing an approximate Thompson sampling technique. This is a common setting, with applications ranging from online marketing to vaccine trials. Leveraging Bootstrapped Thompson sampling (BTS), we obtain an approximate posterior distribution over delay distributions and conversion probabilities, thereby extending an Expectation-Maximisation (EM) model to the Bayesian domain. Unlike prior methodologies, our approach does not overlook uncertainty on delays. Within the EM framework, we employ the Kaplan-Meier estimator to place no restriction on delay distributions. Through extensive benchmarking against state-of-the-art techniques, our approach demonstrates superior performance across the majority of tested environments, with comparable performance in the remaining cases. Furthermore, our method offers practical implementation using off-the-shelf libraries, facili...
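The abstract names two building blocks: the Kaplan-Meier estimator for delay distributions and bootstrap resampling to approximate a posterior. The sketch below is a minimal illustration of those two generic techniques only, not the authors' implementation. The function names `kaplan_meier` and `bootstrap_kaplan_meier` are hypothetical. Resampling observations with replacement is assumed here as one standard way to form a bootstrap replicate. The EM step and the conversion probabilities described in the abstract are not reproduced.

```python
import random
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival curve as [(t, S(t))] at each time with an observed event.

    durations[i] is the delay until conversion when observed[i] is True;
    otherwise it is the time elapsed so far without a conversion
    (a right-censored record).
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all records that share the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events > 0:
            # Product-limit update: fraction of the risk set that survives past t.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Both converted and censored records leave the risk set after t.
        at_risk -= leaving
    return curve


def bootstrap_kaplan_meier(
    durations: List[float], observed: List[bool], rng: random.Random
) -> List[Tuple[float, float]]:
    """One bootstrap replicate (assumed scheme): resample records with
    replacement, then refit the Kaplan-Meier curve on the resample."""
    n = len(durations)
    idx = [rng.randrange(n) for _ in range(n)]
    return kaplan_meier([durations[j] for j in idx], [observed[j] for j in idx])


if __name__ == "__main__":
    # Toy data: delays in days; False marks a conversion not yet observed.
    delays = [1.0, 2.0, 2.0, 3.0, 4.0]
    seen = [True, True, False, True, False]
    print(kaplan_meier(delays, seen))
    print(bootstrap_kaplan_meier(delays, seen, random.Random(0)))
```

Each call to `bootstrap_kaplan_meier` yields a different plausible delay curve, which is the sense in which a bootstrap can stand in for sampling from a posterior. Because the Kaplan-Meier estimator is non-parametric, no particular shape is imposed on the delay distribution.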

Gigli, M., Stella, F. (2024). Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards. In Proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024) (pp. 1438-1452). ML Research Press.

Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards

Gigli, M; Stella, F
2024

Type: paper
Keywords: Multi-armed bandit; Thompson sampling; censored data
Language: English
Conference: 40th Conference on Uncertainty in Artificial Intelligence, UAI 2024 - 15 July 2024 through 19 July 2024
Conference year: 2024
Editors: Kiyavash, N; Mooij, JM
Book title: Proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)
Publication year: 2024
Volume: 244
First page: 1438
Last page: 1452
Access: open
Files in this item:
File: Stella-2024-UAI 2024-VoR.pdf
Access: open access
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Other
Size: 872.26 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/529042