Multi-armed Bandits: an introduction
Doctorants/Post-Doctorants
Lieu:
Salle séminaire M3-324
Orateur:
Hassan Saber
Affiliation:
INRIA Lille
Dates:
Mercredi, 13 Novembre, 2019 - 17:00 - 18:00
Résumé:
This presentation is about optimal sequential allocation in unknown random environments. More precisely, we consider the setting known under the conventional, if not very explicit, name of (stochastic) multi-armed bandit, in reference to the 19th century gambling game. In the multi-armed bandit model, the emphasis is put on focussing as quickly as possible on the best available option(s) rather than on estimating precisely the efficiency of each option. These options are referred to as arms and each of them is associated with a distribution; arms are indexed by a and associated distributions are denoted by ν a .The archetypal example occurs in clinical trials where the options (or arms) correspond to available treatments whose efficiencies are unknown a priori and patients arrive sequentially; the action consists of prescribing a particular treatment to the patient and the observation corresponds (for instance) to the success or failure of the treatment. The goal is clearly here to achieve as many successes as possible. A strategy for doing so is said to be anytime if it does not require to know in advance the number of patients that will participate to the experiment. Although the term multi-armed bandit was probably coined in the late 1960’s (Gittins, 1979), the origin of the problem can be traced back to fundamental questions about optimal stopping policies in the context of clinical trials (see Thompson, 1933, 1935) raised since the 1930’s (see also Wald, 1945; Robbins, 1952).
Abstract tiré de "KULLBACK-LEIBLER UPPER CONFIDENCE BOUNDS
FOR OPTIMAL SEQUENTIAL ALLOCATION", By Olivier Cappé, Aurélien Garivier, Odalric-Ambrym
Maillard, Rémi Munos and Gilles Stoltz
- Accueil
- Annuaire
- Equipes
- Evènements
- Congrès
- Invités
- Séminaires, Groupes de Travail et Colloquium
- Séminaires
- Analyse Complexe et Equations Différentielles
- Analyse Fonctionnelle
- Analyse Numérique et Equations Aux Dérivées Partielles
- Arithmétique
- Formes Automorphes
- Géométrie Algébrique
- Géométrie des espaces singuliers
- Géométrie Dynamique
- Histoire des Mathématiques
- Physique Mathématique
- Probabilités et Statistique
- Singularités et Applications
- Théorie Analytique et Analyse Harmonique
- Topologie
- Colloquium
- Groupes de Travail
- Analyse harmonique et théorie analytique
- Autour des fractales
- Calcul de Malliavin et processus fractionnaires
- Déformations des singularités de surfaces
- Equations aux dérivées partielles
- Extraction du signal
- Fondements mathématiques du deep learning
- Géométrie Non-Archimédienne
- Géométrie Stochastique
- Idéaux de Hodge
- Leçons d'Analyse
- Matrices Aléatoires
- Probabilités
- Statistique et Grande Dimension
- Systèmes Dynamiques
- Topologie
- W-algèbres
- Doctorants et Post-doctorants
- Séminaires
- Soutenances
- Anciens Séminaires et Groupes de Travail
- Formation par la Recherche
- Laboratoire
- Liens utiles
- Projets
- Recrutements
- Services