Multi-Armed Bandits

This module implements Multi-Armed Bandit algorithms as described in John Myles White's book, Bandit Algorithms for Website Optimization

Bandit algorithms help when determining the highest average reward amongst multiple options without wasting time on bad choices.

Traditional A/B testing uses a statistical sampling of results after the fact; this is an exploratory approach that tries different options. Unfortunately, this approach ignores exploiting known good choices. These algorithms balance the desire to explore for new, better options while still earning as much reward as possible by using known good options.

This code is derived from code samples provided in Bandit Algorithms for Website Optimization. Improvements include documentation and simplified updates to make code more "pythonic".

Tests are included to generate test data using each of the implemented algorithms. The data is saved in this directory as tab-separated-values.

For convenience(?), the generate_plots.sh script will execute the included R scripts to provided comparative graphs of each algorithm's performance.

A minimal R setup should install the ggplot2 and plyr libraries.

# Generate data and graphs
$ ./generate_plots.sh

A comparison of the accuracy of the algorithms:

Usage

How can I use this?

The only way to know whether an algorithm is suitable is to exercise it under a variety of circumstances. Algorithms vary in performance depending upon:

How long you intend to run the experiment
How many options (arms) are available
Probabilities of reward from a particular option (arm)
Initial expectations of the success of a particular option

The testing framework includes 3 basic tests that can be used as a starting point to develop tests that simulate the types of rewards you expect to see.

What to test?

Click-through rates or user signups are good examples. These are binary "rewards", meaning a user either clicked-through or did not, signed up, or did not. An "arm" is an option in the software that is explored or exploited and the reward is a successful result. If a particular arm yields consistent rewards, a good algorithm should exploit that more frequently than a random coin flip.

A concrete strategy might include creating a mapping of options and assigning expected reward values. In this example, there are 2 options, and we initialize the 'green-logo' with a value of 0.9, "seeding" the algorithm with the expectation that the green logo performs a little better than the unknown red logo. An exploitative algorithm will choose the green-logo more frequently than the red (at least until/if the red logo begins to perform better than the green).

arms = {
    'red-logo': 0.0,
    'green-logo': 0.9
}
Algo(values=arms.values())

How long

Each simulation selects an arm 500 times by default. You can alter this value to see how an algorithm performs the more opportunities you provide to select an option.

How many options (arms) and expected rewards

The default is to consider 5 arms. 4 arms output a reward 10% of the time, while 1 arm (the best arm) outputs a reward 90% of the time.

A test can pass a list of arm probabilities to the simulator to tweak bot the number of arms and the relative probabilities that any specific arm will return a reward.

Initial expectations

The algorithms are initialized with a pessimistic expectation that all arms are unfamiliar. The algorithm can be constructed with options that may be initialized with higher expectations of reward for some arms

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
r		r
README.md		README.md
__init__.py		__init__.py
bandits.py		bandits.py
generate_plots.sh		generate_plots.sh
test_bandits.py		test_bandits.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Multi-Armed Bandits

Usage

What to test?

How long

How many options (arms) and expected rewards

Initial expectations

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Multi-Armed Bandits

Usage

What to test?

How long

How many options (arms) and expected rewards

Initial expectations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages