
Introduction


Status: can replicate the ordering of the variances, but the numbers don't quite match yet.

This is a replication of the pre-print paper Variational Dropout and the Local Reparameterization Trick by Diederik Kingma, Tim Salimans and Max Welling.

The code is written using Theano and Lasagne, following Lasagne layer conventions so that it should be modular enough to use elsewhere. Instructions for how to replicate results are below.
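
As a rough illustration of what following Lasagne layer conventions looks like, here is a minimal sketch of a multiplicative Gaussian noise layer in that style (the class name and fixed alpha are hypothetical; the layers in this repository are more involved):

import numpy as np
import lasagne
from theano.sandbox.rng_mrg import MRG_RandomStreams

class GaussianNoiseSketch(lasagne.layers.Layer):
    # hypothetical sketch, not the layer implemented in this repository
    def __init__(self, incoming, alpha=0.5, **kwargs):
        super(GaussianNoiseSketch, self).__init__(incoming, **kwargs)
        self.alpha = alpha
        self._srng = MRG_RandomStreams(seed=42)

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic:
            return input
        # multiplicative noise xi ~ N(1, alpha), as in Gaussian dropout
        noise = self._srng.normal(input.shape, avg=1.0,
                                  std=np.sqrt(self.alpha))
        return input * noise

A layer written this way can be stacked with any built-in Lasagne layer and switched off at test time via lasagne.layers.get_output(..., deterministic=True).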

Installation

The requirements listed in requirements.txt are only what's needed to install this package so you can use it as a module; they aren't sufficient to run all of the scripts and notebooks. In addition, you will need:

ipython[notebook]
holoviews
holo-nets
pandas
seaborn
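
Assuming these are all available under the same names on PyPI (holo-nets in particular may need to be installed from its own repository if not), they can be installed with pip:

pip install "ipython[notebook]" holoviews holo-nets pandas seaborn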

There is a Dockerfile accompanying this repository, which can be pulled from Docker Hub. It's based on the IPython Scipyserver image. To run the image with a self-signed certificate, first pull the image:

docker pull gngdb/variational-dropout

Then clone this repository so it can be mounted in the container, and cd into it:

git clone https://github.com/gngdb/variational-dropout.git
cd variational-dropout

Then run the container with the following command (choosing a password):

docker run -d -p 443:8888 -e PASSWORD=<CHOOSE A PASSWORD> -v $PWD:/variational-dropout gngdb/variational-dropout

Now you can navigate to https://localhost to use your notebook. Unfortunately, this image has no support for CUDA or GPUs (although it is possible to use a GPU inside a container), so the experiment scripts will take a very long time to run. They're not completely unworkable on a reasonable desktop, though.

Finally, in order to run scripts or use most of the notebooks you must install the package in develop mode. Open a terminal on the Jupyter server (or otherwise get a shell inside the container) and run:

python setup.py develop
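
As a quick sanity check that the develop install worked, try importing the package from the same shell (the module name here is our assumption from the repository layout):

python -c "import variational_dropout"  # module name assumed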

Replicating Results

There are essentially just two parts of the paper we'd like to be able to reproduce:

  • Table 1 - showing empirical variance estimates of the method versus other methods.
  • Figure 1 - showing performance in terms of percentage error on the test set for the following (the A/B distinction is sketched in code after this list):
    • No dropout
    • Regular binary dropout
    • Gaussian dropout A (Srivastava et al.)
    • Variational dropout A
    • Variational dropout A2
    • Gaussian dropout B (Wang et al.)
    • Variational dropout B
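
The following is a minimal NumPy sketch of how we read the A/B distinction (an illustration only, not the repo's Theano/Lasagne code): type A multiplies the input activations by N(1, alpha) noise, while type B samples the pre-activations from a Gaussian whose moments match independent noise on each weight.

import numpy as np

rng = np.random.RandomState(0)

def gaussian_dropout_a(x, alpha):
    # type A (Srivastava et al.): multiplicative N(1, alpha) noise on the inputs
    return x * rng.normal(1.0, np.sqrt(alpha), size=x.shape)

def gaussian_dropout_b(x, W, alpha):
    # type B (Wang et al.): sample the pre-activations xW from a Gaussian whose
    # mean and variance match independent N(1, alpha) noise on each weight
    mean = x.dot(W)
    var = alpha * (x ** 2).dot(W ** 2)
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)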

Once this is done, we'd like to look at the adaptive gradients in a bit more detail (there doesn't appear to have been space in the paper to discuss them more) and see what kind of properties they have.

The following graphs attempt to reproduce Figure 1. We see a similar progression for Variational Dropout A and A2, with both improving performance; however, in this case A has performed better than A2, which is not what we see in the paper.

[figure1a: test-set error curves]

[figure1b: test-set error curves]

These graphs are produced in the notebook called Opening Results; the results themselves are generated by running the scripts in the experiments directory.

The following are the reproduced results for Table 1 in the paper. The ordering of the variances is approximately correct, but the variances increase after training for 100 epochs, which is likely a bug. Also, the differences between the estimators are not as large as in the paper:

stochastic gradient estimator    top 10     bottom 10    top 100    bottom 100
local reparameterization         2.4e+04    6.1e+02      2.5e+05    3.0e+03
separate weight samples          4.8e+04    1.2e+03      4.9e+05    8.2e+03
single weight sample             5.8e+04    1.5e+03      4.7e+05    6.8e+03
no dropout                       1.5e+04    5.5e+02      1.4e+05    2.7e+03

These are produced in the notebook Comparing Empirical Variance.
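
For reference, here is a minimal NumPy sketch (our illustration, not the repo's Theano code) contrasting two of the estimators in the table: drawing a separate weight sample per example versus sampling the pre-activations directly with the local reparameterization trick.

import numpy as np

rng = np.random.RandomState(0)

def separate_weight_samples(x, mu, sigma2):
    # draw an independent weight matrix W ~ N(mu, sigma2) for each example
    out = np.empty((x.shape[0], mu.shape[1]))
    for i in range(x.shape[0]):
        W = mu + np.sqrt(sigma2) * rng.standard_normal(mu.shape)
        out[i] = x[i].dot(W)
    return out

def local_reparameterization(x, mu, sigma2):
    # sample the pre-activations directly: b ~ N(x.mu, (x**2).sigma2)
    mean = x.dot(mu)
    var = (x ** 2).dot(sigma2)
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)

Both produce samples with the same marginal distribution over pre-activations; the point of the paper is that the local version gives lower-variance gradient estimates at lower computational cost.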

Finally, there is the notebook Investigating Adaptive Properties, which includes the following image showing the alpha parameters (the learned variances of the multiplicative noise) over the MNIST dataset. It's nice to see that the network learns to ignore the edges:

[image: learned alpha over the MNIST input pixels]
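
An image like this can be produced by reshaping the first layer's per-input alpha to the MNIST image shape. A sketch, with a random stand-in for the learned parameters (reading them out depends on the layer implementation):

import numpy as np
import matplotlib.pyplot as plt

# stand-in values: in practice log_alpha would be read from the trained
# first layer's parameters, one value per input pixel
log_alpha = np.random.RandomState(0).randn(784) - 2.0
alpha = np.exp(log_alpha).reshape(28, 28)

plt.imshow(alpha, cmap="viridis")
plt.colorbar(label="alpha (noise variance)")
plt.title("learned alpha per MNIST pixel")
plt.show()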
