Giter Site home page Giter Site logo

yzj527415955 / bandit-algorithms Goto Github PK

View Code? Open in Web Editor NEW

This project forked from mgpopinjay/bandit-algorithms

0.0 1.0 0.0 458 KB

A small collection of Bandit Algorithms (ETC, E-Greedy, Elimination, UCB, Exp3, LinearUCB, and Thompson Sampling)

Python 100.00%

bandit-algorithms's Introduction

Bandit Algorithms

Multi-arm bandit algorithms are fundamental to enabling recommendation systems, online advertising, clinical trials, online resource allocation and online search. Here is a small collection of them in Python Implementation: ETC, E-Greedy, Elimination, UCB, Exp3, LinearUCB, and Thompson Sampling.

Vis-a-vis Bandits used in Reinforcement Learning settings, three key assumptions apply:

  • Reward observed only conrrespond to the action taken (feedback).
  • Action does not alter the environment.
  • Taking an action does not restrict future actions.

Furthermore, these algorithms differ in their requirement of the time horizon n and suboptimality gap \Delta, and in their scaling of regret bounds.

Algorithm Environment Requirement Regret Scaling
Explore-then-Commit (ETC) Unstructured / Stochastic \Delta, n R_{n} = \frac{4}{n}  ln(n) + C
Epsilon-Greedy (E-Greedy) Unstructured / Stochastic \Delta R_{n} = \frac{1}{\Delta }  ln(n) + C
Elimination Unstructured / Stochastic n R_{n} = \frac{C_{1}}{\Delta }  ln(n) + C_{2}
Upper-Confidence Bound (UCB) - Infinite Horizon Unstructured / Stochastic --- R_{n} = \frac{2}{\Delta }  ln(n)
Exponantiate-Explore-Exploit (Exp3) - Partial feedback Adversarial / Non-stochastic --- R_{n} = \sqrt{2nk  ln(k)}
Linear UCB (LinUCB) Contextual / Linear --- R_{n} = Cd \sqrt{n} ln(nL)
Thompson Sampling Structured --- R_{n} = ln(n)

For detailed background of each algorithm, refer to Bandit Algorithms (2020) by Lattimore & Szepesvari.

bandit-algorithms's People

Contributors

mgpopinjay avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.