digit-classifier's Introduction

MNIST Handwritten Digit Classifier

An implementation of a multilayer neural network using the numpy library. It is a modified version of Michael Nielsen's implementation from the book Neural Networks and Deep Learning.

Brief Background:

If you are familiar with the basics of neural networks, feel free to skip this section. For total beginners who landed here before reading anything about neural networks:

Sigmoid Neuron

  • Neural networks are made up of building blocks known as sigmoid neurons, named so because their output follows the sigmoid function σ(z) = 1 / (1 + e^-z).
  • The x_j are inputs, each weighted by a corresponding weight w_j, and the neuron has an intrinsic bias b. The output of the neuron, a = σ(Σ_j w_j x_j + b), is known as its activation (a).

Note: activation functions other than the sigmoid are also in common use, but this much is sufficient background for beginners; a minimal numpy sketch of a single sigmoid neuron follows.
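To make the above concrete, here is a minimal sketch of a single sigmoid neuron in numpy (the array values are arbitrary illustrations, not taken from this repository):

import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary example values: three inputs, three weights, one bias.
x = np.array([0.5, -1.2, 3.0])   # inputs x_j
w = np.array([0.8, 0.1, -0.4])   # weights w_j
b = 0.2                          # intrinsic bias b

a = sigmoid(np.dot(w, x) + b)    # activation a = sigmoid(w . x + b)
print(a)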

  • A neural network is built by stacking layers of such neurons, and is fully defined by the weights of its connections and the biases of its neurons. The activations are then determined by whatever input is fed to the network.

Why a modified implementation?

The book above and Stanford's Machine Learning course by Prof. Andrew Ng are both recommended resources for beginners. However, referring to the two side by side occasionally got confusing:

MATLAB data structures are 1-indexed, while numpy's are 0-indexed. Some parameters of a neural network are not defined for the input layer, which creates a mismatch between the mathematical equations in the book and the indices in code. For example, in the book's code the bias vector of the second layer is referred to as biases[0], because the input layer (first layer) has no bias vector. I found that inconvenient to work with.

I am fond of scikit-learn's API style, so my class follows a similar structure. While the theory matches the book and Stanford's course, the interface exposes simple methods such as fit, predict, and validate to train, test, and validate the model respectively, as sketched below.
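As an illustration of that API style, a hypothetical skeleton could look as follows (the method names fit, predict and validate come from the description above; the constructor signature and hyperparameters are assumptions for illustration, not the repository's actual code):

class NeuralNetwork:
    # Hypothetical scikit-learn-style interface around the network.

    def __init__(self, sizes, learning_rate=3.0, mini_batch_size=16):
        self.sizes = sizes                    # neurons per layer, input to output
        self.learning_rate = learning_rate
        self.mini_batch_size = mini_batch_size

    def fit(self, training_data, epochs=10):
        # Train the model with stochastic gradient descent (body omitted).
        ...

    def predict(self, x):
        # Return the network's output for a single input vector x.
        ...

    def validate(self, validation_data):
        # Return an accuracy measure over the validation set.
        ...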

Naming and Indexing Convention:

I have followed a particular convention for indexing quantities. The dimensions of the quantities below refer to this figure:

[Figure: Small Labelled Neural Network]

Layers

  • The input layer is the 0th layer and the output layer is the Lth layer, so the number of layers is N_L = L + 1.
sizes = [2, 3, 1]  # 2 input neurons, a hidden layer of 3, 1 output neuron

Weights

  • Weights in this neural network implementation are a list of matrices (numpy.ndarrays). weights[l] is the matrix of weights entering the lth layer of the network (denoted w^l).
  • An element of this matrix is denoted w^l_jk. It sits in the jth row, which collects all the weights entering the jth neuron of layer l; the column index k runs over the neurons of the (l-1)th layer.
  • No weights enter the input layer, so weights[0] is a redundant placeholder; weights[1] is then the collection of weights entering layer 1, and so on.
weights = [ [[]],          # weights[0]: placeholder, no weights enter the input layer
            [[a, b],       # weights[1]: shape (3, 2), one row per neuron of layer 1,
             [c, d],       #             one column per input neuron
             [e, f]],
            [[p, q, r]] ]  # weights[2]: shape (1, 3), one row for the single output
                           #             neuron, one column per neuron of layer 1

Biases

  • Biases in this neural network implementation are a list of vectors (numpy.ndarrays). biases[l] is the vector of biases of the neurons in the lth layer of the network (denoted b^l).
  • An element of this vector is denoted b^l_j: the jth entry, the bias of the jth neuron in layer l.
  • The input layer has no biases, so biases[0] is a redundant placeholder; biases[1] then holds the biases of the neurons of layer 1, and so on. (An initialization sketch follows the example below.)
biases = [ [[], []],       # biases[0]: placeholder, the input layer has no biases
           [[0],           # biases[1]: one bias per neuron of layer 1
            [1],
            [2]],
           [[0]] ]         # biases[2]: bias of the single output neuron
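A minimal sketch of how both lists could be initialized from sizes (the Gaussian random initialization here is an assumption for illustration; the repository may use a different scheme):

import numpy as np

sizes = [2, 3, 1]

# Index 0 is a deliberate placeholder so that weights[l] and biases[l]
# line up with layer l, matching the convention described above.
weights = [np.array([[]])] + [np.random.randn(sizes[l], sizes[l - 1])
                              for l in range(1, len(sizes))]
biases = [np.array([[]])] + [np.random.randn(sizes[l], 1)
                             for l in range(1, len(sizes))]

for l in range(1, len(sizes)):
    print(l, weights[l].shape, biases[l].shape)
# prints: 1 (3, 2) (3, 1)
#         2 (1, 3) (1, 1)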

'Z's

  • For an input vector x to layer l, z is defined as: z^l = w^l · x + b^l (in numpy terms, zs[l] = np.dot(weights[l], x) + biases[l]).
  • The input layer provides the vector x as input to layer 1 and itself has no input, weights or bias, hence zs[0] is redundant.
  • The dimensions of zs are the same as those of biases.

Activations

  • Activations of the lth layer are the outputs of the neurons of the lth layer, which serve as input to the (l+1)th layer. The dimensions of biases, zs and activations all match.
  • The input layer provides the vector x as input to layer 1, hence activations[0] can be identified with x, the input training example. The whole forward pass is sketched below.
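Putting these conventions together, a minimal sketch of the forward pass (a hypothetical helper written for this README, not the repository's actual feedforward code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    # x is a column vector of shape (sizes[0], 1); weights[0], biases[0]
    # and zs[0] are unused placeholders, per the convention above.
    zs = [np.array([[]])]
    activations = [x]                # activations[0] is the input itself
    for l in range(1, len(weights)):
        z = np.dot(weights[l], activations[l - 1]) + biases[l]
        zs.append(z)
        activations.append(sigmoid(z))
    return zs, activations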

Execution of the Neural Network

# To train and test the neural network, run:
python main.py

digit-classifier's People

Contributors

chrisantaki, kdexd, manojbalaji1, radarhere


digit-classifier's Issues

Dataset

Hi, which dataset did you use? I am getting this error:

  File "main.py", line 20, in <module>
    training_data, validation_data, test_data = load_mnist()
  File "D:\Downloads\digit-classifier-master\digit-classifier-master\collect.py", line 15, in load_mnist
    training_data, validation_data, test_data = pickle.load(data_file)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

I downloaded the dataset included in the repo, the one from the book.
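A likely cause, assuming the pickled MNIST file was written under Python 2: Python 3's pickle defaults to ASCII when decoding Python 2 strings, and passing an explicit encoding usually resolves this. A hedged sketch (mnist.pkl.gz is the file name used by the book's dataset):

import gzip
import pickle

with gzip.open('mnist.pkl.gz', 'rb') as data_file:
    # encoding='latin1' lets Python 3 unpickle numpy arrays pickled by Python 2.
    training_data, validation_data, test_data = pickle.load(data_file, encoding='latin1')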

What kind of accuracy do you get?

After 10 epochs, I have a very poor accuracy, which is not far from randomly guessing the number ;)

Epoch 1, accuracy 8.62 %.
Epoch 2, accuracy 11.36 %.
Epoch 3, accuracy 11.88 %.
Epoch 4, accuracy 12.1 %.
Epoch 5, accuracy 12.77 %.
Epoch 6, accuracy 13.16 %.
Epoch 7, accuracy 13.38 %.
Epoch 8, accuracy 13.56 %.
Epoch 9, accuracy 13.79 %.
Epoch 10, accuracy 13.97 %.
Test Accuracy: 14.6%

Isn't it because the hidden layers are too small?

layers = [784, 3, 4, 10]

I have seen examples with layer sizes [784,256,256,10], and the accuracy was closer to 90%.

(But probably this would be too slow? Is it the reason why you chose so few hidden nodes?)
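For comparison, a hedged middle ground: Nielsen's book trains a single hidden layer of 30 neurons to roughly 95 % on MNIST, far cheaper than two 256-wide layers:

# Baseline architecture from the book: one hidden layer of 30 neurons.
layers = [784, 30, 10]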

License

Hey,

Could you please specify the license of the code?

Maciek
