Bicocca Open Archive


Peluso, S., Mira, A., Muliere, P. (2017). Learning vs earning trade-off with missing or censored observations: The two-armed Bayesian nonparametric beta-Stacy bandit problem. ELECTRONIC JOURNAL OF STATISTICS, 11(2), 3368-3406 [10.1214/17-EJS1342].

Learning vs earning trade-off with missing or censored observations: The two-armed Bayesian nonparametric beta-Stacy bandit problem

Peluso, Stefano; Mira, Antonietta; Muliere, Pietro

2017

Abstract

Existing Bayesian nonparametric methodologies for bandit problems focus on exact observations, leaving a gap in those bandit applications where censored observations are crucial. We address this gap by extending a Bayesian nonparametric two-armed bandit problem to right-censored data, where each arm is generated from a beta-Stacy process as defined by Walker and Muliere (1997). We first show some properties of the expected advantage of choosing one arm over the other, namely its monotonicity in the arm response and, in the case of a continuous state space only, its continuity in the right-censored arm response. We partially characterize optimal strategies by proving the existence of stay-with-a-winner and stay-with-a-winner/switch-on-a-loser break-even points, under non-restrictive conditions that include the special cases of the simple homogeneous process and the Dirichlet process. Numerical estimations and simulations for a variety of discrete and continuous state-space settings are presented to illustrate the performance and flexibility of our framework.
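To make the role of censored observations concrete, here is a minimal sketch, not the authors' method, of how one arm's posterior could be updated with exact and right-censored responses and how a one-step (myopic) expected advantage could be computed. It assumes a common discrete-time representation of a beta-Stacy prior on the grid {0, 1, ..., K}, with independent Beta(a_k, b_k) hazards V_k = P(X = k | X >= k), and it assumes a response censored at c means X > c. The function names (posterior_params, predictive_pmf, expected_response, myopic_advantage) are hypothetical. The paper's optimal strategies and break-even points come from a look-ahead analysis of the general process, which this sketch does not attempt.

```python
# Hypothetical sketch: discrete-time beta-Stacy arm on the grid {0, 1, ..., K}.
# Assumption: hazards V_k = P(X = k | X >= k), k < K, are independent Beta(a_k, b_k),
# and any mass surviving past K - 1 sits at the top grid point K.


def posterior_params(alpha, beta, exact, censored):
    """Conjugate update of the hazard parameters.

    exact:    exactly observed responses (integers in 0..K)
    censored: right-censoring points c, read as "response > c" (assumption)
    """
    a, b = list(alpha), list(beta)
    K = len(a)
    for x in exact:
        if x < K:
            a[x] += 1  # the hazard at x fired
        for j in range(min(x, K)):
            b[j] += 1  # the response survived every grid point below x
    for c in censored:
        for j in range(min(c + 1, K)):
            b[j] += 1  # the response survived every grid point up to and including c
    return a, b


def predictive_pmf(a, b):
    """Posterior predictive probabilities of the response on {0, ..., K}."""
    pmf, surv = [], 1.0
    for ak, bk in zip(a, b):
        h = ak / (ak + bk)  # posterior mean hazard E[V_k]
        pmf.append(surv * h)
        surv *= 1.0 - h     # multiply by E[1 - V_k]; valid because hazards are independent
    pmf.append(surv)        # remaining mass at the top grid point K
    return pmf


def expected_response(a, b):
    """Posterior expected arm response, i.e. the mean of the predictive pmf."""
    return sum(k * p for k, p in enumerate(predictive_pmf(a, b)))


def myopic_advantage(arm1, arm2):
    """One-step expected advantage of arm 1 over arm 2 (no look-ahead)."""
    return expected_response(*arm1) - expected_response(*arm2)


if __name__ == "__main__":
    prior = ([1.0] * 5, [1.0] * 5)  # K = 5, flat Beta(1, 1) hazards (illustrative choice)
    arm1 = posterior_params(*prior, exact=[], censored=[2])  # one response known to exceed 2
    arm2 = posterior_params(*prior, exact=[2], censored=[])  # one response exactly equal to 2
    print(myopic_advantage(arm1, arm2))  # 44/27 - 37/27 = 7/27, about 0.259
```

Under these assumptions, a response censored at 2 raises the expected response more than an exact response at 2, because censoring only records that the response survived past 2 and leaves the upper tail open. A full strategy would compare discounted look-ahead values rather than this myopic difference.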


	Subtype
	
				Journal article - Scientific article
			
	Keywords
	
				Bandit Problem; Bayesian Nonparametrics; Beta-Stacy Process
			
	Content language
	
				English
			
	Publication date
	
				2017
			
	Journal
	
				ELECTRONIC JOURNAL OF STATISTICS
			
	Volume
	
				11
			
	Issue
	
				2
			
	First page
	
				3368
			
	Last page
	
				3406
			
	Article DOI
	
				https://dx.doi.org/10.1214/17-EJS1342
			
	Alternative URL
	
				https://projecteuclid.org/download/pdfview_1/euclid.ejs/1507255609
			
	Fulltext
	
				none
			
	Citation
	
				Peluso, S., Mira, A., Muliere, P. (2017). Learning vs earning trade-off with missing or censored observations: The two-armed Bayesian nonparametric beta-Stacy bandit problem. ELECTRONIC JOURNAL OF STATISTICS, 11(2), 3368-3406 [10.1214/17-EJS1342].
			
	Appears in types:
	
				01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/266153
