Forked from akshaychawla/binary-neural-networks

Exploring "Binary Neural Networks" (https://arxiv.org/abs/1602.02830) in Theano. A set of experiments that use binarized weights and/or activations to reduce the computational load of convolutional neural networks.


Note: This is still an ongoing project; changes and results will keep appearing as experiments are run.

Binary Neural Networks

An attempt to recreate neural networks in which the weights and activations are binary variables. This is the common underlying theme of the papers BinaryConnect, Binarized Neural Networks and XNOR-Net. The goal is to free these high-performing deep learning models from the shackles of a supercomputer (read: GPUs) and bring them to edge devices, which typically have far less memory and limited compute capability.

Quantize

NOTE TO SELF: With the shallow CIFAR model, test accuracy was 63%.

Requirements

  1. Theano (0.9.0 or higher)
  2. Python 2.7
  3. numpy
  4. tqdm (awesome progbars)
  5. tensorboard (logging)

Idea

  1. Regularization: Binarizing the weights injects noise into the system, so, like dropout, binarization can act as a regularizer.
  2. Discretization is a form of corruption: each weight incurs a discretization error, but because these errors are roughly random across weights, they tend to cancel out.
  3. Perform the forward and backward passes using the binarized weights, but keep a second, full-precision (fp32) copy of the weights for the gradient update. SGD makes infinitesimal changes that would otherwise be lost to binarization. For the forward pass, sample a set of binary weights from the full-precision weights using deterministic or stochastic binarization.
  4. Deterministic binarization: Wb = +1 if W >= 0; else -1.
  5. Stochastic binarization: (TODO)
  6. Clip the full-precision weights to [-1, +1] after each update, since magnitudes beyond that range have no effect once binarized.
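The binarization and clipping rules above can be sketched in a few lines of numpy. This is an illustration, not the project's Theano code; the stochastic rule follows the hard-sigmoid probability used in the BinaryConnect paper.

```python
import numpy as np

def hard_sigmoid(w):
    """clip((w + 1) / 2, 0, 1): the probability of +1 in BinaryConnect's stochastic rule."""
    return np.clip((w + 1.0) / 2.0, 0.0, 1.0)

def binarize(w, stochastic=False, rng=None):
    """Return binary weights Wb in {-1, +1} from real-valued weights w.

    Deterministic: Wb = +1 where w >= 0, else -1.
    Stochastic:    Wb = +1 with probability hard_sigmoid(w), else -1.
    """
    if stochastic:
        rng = rng or np.random.default_rng(0)
        return np.where(rng.random(w.shape) < hard_sigmoid(w), 1.0, -1.0)
    return np.where(w >= 0.0, 1.0, -1.0)

def clip_weights(w):
    """Keep the full-precision weights in [-1, +1] after each gradient update."""
    return np.clip(w, -1.0, 1.0)
```

For example, `binarize(np.array([-0.5, 0.0, 0.5]))` gives `[-1, 1, 1]`, while `clip_weights` leaves in-range values untouched and saturates the rest.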

Experiment 1: MNIST

In this experiment, a baseline and a binarized version of a standard MLP are trained on the MNIST dataset. To keep the comparison fair, the data is not augmented, and dropout is applied to the baseline network so that it has a regularizer comparable to the binary network (since binarization itself acts as one). The learning rate is held constant at 0.001, and both networks are trained for 200 epochs with a batch size of 256. The training loss and validation accuracy are visualized in tensorboard using this gist. The basic architecture is as follows:

[architecture diagram]
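A single training step of the binarized network can be sketched as follows. This is a minimal numpy illustration of the update rule from the Idea section, using a one-layer softmax classifier rather than the full MLP, and hypothetical shapes; the actual implementation is in Theano.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-layer setup (784 -> 10); the real architecture is shown above.
W_real = rng.uniform(-1, 1, size=(784, 10))   # full-precision "shadow" weights
x = rng.random((256, 784))                    # one batch (batch size 256, as above)
y = rng.integers(0, 10, size=256)             # integer class labels

def binarize(w):
    return np.where(w >= 0.0, 1.0, -1.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Forward pass uses the binary weights.
Wb = binarize(W_real)
probs = softmax(x @ Wb)

# Backward pass: cross-entropy gradient w.r.t. the logits, then a
# straight-through gradient for the weights (as if Wb were W_real).
grad_logits = probs.copy()
grad_logits[np.arange(len(y)), y] -= 1.0
grad_logits /= len(y)
grad_W = x.T @ grad_logits

# Update the full-precision copy, then clip it to [-1, +1] (Idea, item 6).
lr = 0.001
W_real = np.clip(W_real - lr * grad_W, -1.0, 1.0)
```

The key point is that `Wb` is recomputed from `W_real` every step, so the small SGD updates accumulate in the full-precision copy even though the forward pass only ever sees +1/-1.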

Results

1. Baseline

[training loss / validation accuracy curves]

Test accuracy: 0.9687

2. Binarized

[training loss / validation accuracy curves]

Test accuracy: 0.9372

Experiment 2: CIFAR 10

Results

(TODO)

Why?

This repository is the result of my curiosity about fixing the computation problems that arise when deploying (very big) neural networks. I am fascinated by recent research that has come up with effective ways to compress, prune, and/or quantize deep networks so that they run in resource-constrained environments like ARM chips. This is a (tiny) step in that direction.
