Multi-armed Bandits: an introduction

PhD students / Postdoctoral researchers

Location:
Seminar room M3-324
Speaker:
Hassan Saber
Affiliation: 
INRIA Lille
Date:
Wednesday, 13 November 2019, 17:00–18:00
Abstract:
This presentation is about optimal sequential allocation in unknown random environments. More precisely, we consider the setting known under the conventional, if not very explicit, name of (stochastic) multi-armed bandit, in reference to the 19th-century gambling game. In the multi-armed bandit model, the emphasis is put on focusing as quickly as possible on the best available option(s), rather than on precisely estimating the efficiency of each option. These options are referred to as arms, and each of them is associated with a distribution: arms are indexed by a, and the distribution associated with arm a is denoted ν_a.

The archetypal example occurs in clinical trials, where the options (or arms) correspond to available treatments whose efficiencies are unknown a priori and patients arrive sequentially; the action consists of prescribing a particular treatment to a patient, and the observation corresponds (for instance) to the success or failure of that treatment. The goal here is clearly to achieve as many successes as possible. A strategy for doing so is said to be anytime if it does not require knowing in advance the number of patients who will participate in the experiment.

Although the term multi-armed bandit was probably coined in the late 1960s (Gittins, 1979), the origin of the problem can be traced back to fundamental questions about optimal stopping policies in the context of clinical trials raised since the 1930s (Thompson, 1933, 1935; see also Wald, 1945; Robbins, 1952).
 
Abstract taken from "Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation", by Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos and Gilles Stoltz.
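
To make the success/failure clinical-trial picture concrete, here is a minimal Python sketch (not the authors' reference implementation) of a KL-UCB-style strategy on Bernoulli arms: at each round it pulls the arm whose Kullback-Leibler upper confidence bound on the unknown success probability is highest. The arm means, horizon, and function names are assumptions made for this example, and the exploration level log(t) is a common simplification of the levels used in the paper's analysis. Note that the index depends only on the current round t, not on the total number of patients, so the strategy is anytime in the sense above.

```python
import math
import random

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """Largest q >= mean with pulls * kl(mean, q) <= log(t), by bisection.
    (Simplified exploration level; the paper uses slightly larger ones.)"""
    level = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(32):  # q -> kl(mean, q) is increasing on [mean, 1]
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def run(true_means, horizon, seed=0):
    """Simulate the strategy on Bernoulli arms; return successes and pull counts."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls, successes, total = [0] * k, [0] * k, 0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # pull each arm once to initialize
        else:
            a = max(range(k),
                    key=lambda i: kl_ucb_index(successes[i] / pulls[i], pulls[i], t))
        reward = 1 if rng.random() < true_means[a] else 0  # success or failure
        pulls[a] += 1
        successes[a] += reward
        total += reward
    return total, pulls

if __name__ == "__main__":
    total, pulls = run([0.3, 0.5, 0.7], horizon=5000)
    print("successes:", total, "pulls per arm:", pulls)
```

With these (assumed) means, the strategy quickly concentrates most pulls on the 0.7 arm while only occasionally re-sampling the others, which is exactly the behavior the abstract describes: focusing as fast as possible on the best option rather than estimating every option precisely.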