
Classification using Logistic Regression and Single-Layer Neural Network

University at Buffalo


Identification of handwritten digit images using different classification algorithms: Multi-class Logistic Regression, a Single Hidden Layer Neural Network, and a Convolutional Neural Network.

Project Summary


We trained our classification models on the MNIST data using Multi-class Logistic Regression, a Single Hidden Layer Neural Network, and a Convolutional Neural Network, and predicted the labels of the digit images in both the MNIST and USPS digit datasets.

Logistic Regression:

  • Using our own implementation of logistic regression, we trained the model and tuned the learning rate hyperparameter, achieving an accuracy of 91.56% on MNIST test images and 45.15% on USPS test images at a learning rate of 0.14 and a lambda (regulariser) value of 0.
  • Using TensorFlow, we achieved an accuracy of 92.41% on MNIST test images and 48.32% on USPS test images.

Single hidden layer neural network:

  • We trained the model and tuned the hyperparameters, namely the learning rate and the number of units in the hidden layer, achieving an accuracy of 97.76% on MNIST test images and 64.6% on USPS test images.

Convolutional Neural Network:

  • On training the model using a CNN, we achieved an accuracy of 99.18% on MNIST test images and 75% on USPS test images.

The No Free Lunch theorem remains valid:

  • The accuracies of the models on the USPS data were far lower than their accuracies on the MNIST dataset. Given this performance of all three models, which were trained on the MNIST dataset, on the USPS dataset, we conclude that our models do not generalize universally; they perform well only on the dataset on which they were trained. A model needs training knowledge of the USPS data in order to perform well on the USPS test set.

Approach


Logistic Regression

  • MNIST dataset files, downloaded from the website mentioned in main.pdf, were read using Python's gzip library, following the directions mentioned in this website.
  • Train and test images were then flattened into 2D numpy arrays of size N x 784.
  • Train and test image data were then standardised to zero mean and unit standard deviation.
  • The previous two steps were performed similarly on the USPS test data.
  • A weight numpy array was generated with values of 1s and dimensions K x D, where K is the number of classes, i.e. 10, and D is the number of data features plus one bias feature, i.e. 784 + 1.
  • A column of 1s was added as the first column to both the train and test sets of the MNIST and USPS data, making the total feature dimension 785.
  • One-hot vectors of the MNIST and USPS data labels were created.
  • Separate functions for calculating the cross-entropy error, gradient descent, softmax, and label prediction were created in the LRlibs.py file (see the sketch after this list):
    • The cross-entropy error function was implemented as per the standard multi-class formula, E(W) = −Σ_n Σ_k t_nk ln y_nk, plus the L2 regularisation term.
    • The gradient descent function was implemented using the pseudocode mentioned in the document provided along with Project 3.
    • The softmax function was implemented using the numerically stable formula exp(x − max(x)) / Σ exp(x − max(x)).
    • The predict function returns, for each row of W.dot(X), the column index with the maximum value.
  • The model was trained on the training images with an epoch count of 200 and learning rates varying from 0.01 to 0.14. The accuracies and the cross-entropy errors are tabulated in Tables 1.1 and 1.2 for both the MNIST and USPS datasets.
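
A minimal NumPy sketch of these helpers, assuming the shapes described above (X is N x 785 with the bias column, W is 10 x 785, T is the N x 10 one-hot targets); the names mirror the LRlibs.py functions listed in the code outline, but the exact signatures in the repo may differ:

```python
import numpy as np

def softmax(a):
    # Row-wise softmax with max(x) subtracted for numerical stability:
    # exp(x - max(x)) / sum(exp(x - max(x)))
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, T, L2_lambda):
    # E(W) = -(1/N) * sum_n sum_k t_nk * ln(y_nk) + (lambda/2) * ||W||^2
    Y = softmax(X.dot(W.T))                     # N x K class probabilities
    return (-np.sum(T * np.log(Y + 1e-12)) / X.shape[0]
            + 0.5 * L2_lambda * np.sum(W * W))

def gradient_step(W, X, T, L2_lambda, learning_rate):
    # One gradient-descent update: grad = (Y - T)^T X / N + lambda * W
    Y = softmax(X.dot(W.T))
    grad = (Y - T).T.dot(X) / X.shape[0] + L2_lambda * W
    return W - learning_rate * grad

def predict(W, X):
    # Column index of the maximum activation in each row of X.dot(W.T)
    return np.argmax(X.dot(W.T), axis=1)
```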

Single Hidden Layer Neural Network

  • This has been implemented using TensorFlow.
  • The model consists of one hidden layer and one output layer.
  • In the hidden layer, we initialised 784x1024 weights and 1024 biases (for the 1024 units) with random values, multiplied the input by the weights, added the bias, and applied the ReLU activation function to the result.
  • The output of the hidden layer is fed to the output layer, which consists of randomly initialised 1024x10 weights and 10 biases for its 10 units.
  • The output of the model is obtained by multiplying the hidden-layer output by these weights and adding the output-layer bias. The output is a one-hot representation of the predicted label. (A sketch of this architecture follows the list.)
  • We trained the above model using AdamOptimizer with 20000 epochs and an input batch size of 50 in each epoch, and chose the model with the minimum cross-entropy error.
  • We tuned the hyperparameters, namely the number of units in the hidden layer (784, 864, 944, 1024) and the learning rate (0.01 to 0.05), and chose the model with the maximum accuracy on the validation set.
  • Then we ran the model on the MNIST and USPS test data to get the test accuracies.
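
A condensed TensorFlow 1.x sketch of this architecture, shown with 1024 hidden units; the truncated-normal initialisation is an assumption (the repo's single_layer_NN_lib.py may initialise differently):

```python
import tensorflow as tf  # TensorFlow 1.x API, as used in this project

n_hidden = 1024                                  # also tried: 784, 864, 944
x = tf.placeholder(tf.float32, [None, 784])      # flattened input images
t = tf.placeholder(tf.float32, [None, 10])       # one-hot target labels

# Hidden layer: 784 x n_hidden weights and n_hidden biases, random init, ReLU
W1 = tf.Variable(tf.truncated_normal([784, n_hidden], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([n_hidden], stddev=0.1))
h = tf.nn.relu(tf.matmul(x, W1) + b1)

# Output layer: n_hidden x 10 weights and 10 biases, random init
W2 = tf.Variable(tf.truncated_normal([n_hidden, 10], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([10], stddev=0.1))
logits = tf.matmul(h, W2) + b2

# Cross-entropy loss, minimised with AdamOptimizer as described above
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=t, logits=logits))
train_step = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
```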

Convolutional Neural Network

  • This is also implemented using TensorFlow.
  • The model consists of 4 layers: convolution layer 1 (convolution with 32 features applied to 5x5 patches of the image + 2D max pooling), convolution layer 2 (convolution with 64 features applied to 5x5 patches of the image + 2D max pooling), a fully connected layer with 1024 neurons and the ReLU activation function, and a logit layer with 10 neurons corresponding to the 10 labels. (A sketch follows the list.)
  • In the fully connected layer, a few neuron outputs are dropped out to prevent overfitting. The no_drop_prob placeholder ensures that dropout occurs only during training and not during testing.
  • We trained the model using AdamOptimizer with the learning rate set to 1e-4 and 20000 epochs on input batches of size 50, and chose the model with the minimum cross-entropy error.
  • Then we ran the model on the MNIST and USPS test data to get the test accuracies. The USPS test set was divided into batches for this step.
  • No hyperparameter tuning was needed for the CNN.
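
A sketch of this architecture in TensorFlow 1.x, built from helpers named after the cnn_lib.py functions listed in the code outline (the initialisation constants are assumptions):

```python
import tensorflow as tf  # TensorFlow 1.x API, as used in this project

def weight_init(shape):
    # Initialise weight variables with small random values
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_init(shape):
    # Initialise bias variables with a small constant
    return tf.Variable(tf.constant(0.1, shape=shape))

def convolution(x, W):
    # Convolution with stride 1 and zero ("SAME") padding
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def maxpool(x):
    # Max pooling over 2x2 windows with stride 2 and zero padding
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])

# Conv layer 1: 32 features on 5x5 patches + max pooling -> 14x14x32
h1 = maxpool(tf.nn.relu(convolution(x_image, weight_init([5, 5, 1, 32])) + bias_init([32])))
# Conv layer 2: 64 features on 5x5 patches + max pooling -> 7x7x64
h2 = maxpool(tf.nn.relu(convolution(h1, weight_init([5, 5, 32, 64])) + bias_init([64])))

# Fully connected layer: 1024 neurons with ReLU, dropout during training only
flat = tf.reshape(h2, [-1, 7 * 7 * 64])
fc = tf.nn.relu(tf.matmul(flat, weight_init([7 * 7 * 64, 1024])) + bias_init([1024]))
no_drop_prob = tf.placeholder(tf.float32)  # keep probability: < 1.0 in training, 1.0 in testing
fc_drop = tf.nn.dropout(fc, keep_prob=no_drop_prob)

# Logit layer: 10 neurons, one per digit label
logits = tf.matmul(fc_drop, weight_init([1024, 10])) + bias_init([10])
```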

USPS data extraction:

  • While extracting the USPS image features, we have to make them resemble the MNIST image features as closely as possible, so that our trained models can classify the images correctly.
  • In order to do that, we followed the steps below (a sketch follows the list):
    • Resized the image to a square shape, i.e. width = height = max(width, height), in such a way that the aspect ratio of the digit was not skewed.
    • Converted the image to grayscale.
    • Inverted the image pixel values, i.e. 255 − image, so black became white and vice versa, to follow the same pixel-value convention as the MNIST images.
    • Resized each image to 28x28.
    • Normalised the image pixels with the (value − min) / (max − min) formula.
    • Flattened each image into a 1x784 numpy array.
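
A minimal sketch of this preprocessing with Pillow and NumPy; make_square mirrors the helper listed under USPS_data_extraction.py, while the white padding colour and the file-based entry point are assumptions:

```python
import numpy as np
from PIL import Image

def make_square(im):
    # Pad (rather than stretch) to width = height = max(width, height)
    # so the digit's aspect ratio is not skewed; white page background assumed.
    side = max(im.size)
    squared = Image.new('L', (side, side), 255)
    squared.paste(im, ((side - im.size[0]) // 2, (side - im.size[1]) // 2))
    return squared

def preprocess_usps_image(path):
    im = Image.open(path).convert('L')          # grayscale
    im = make_square(im)
    arr = np.asarray(im.resize((28, 28)), dtype=np.float32)
    arr = 255.0 - arr                           # invert: black <-> white, as in MNIST
    arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-12)  # (value - min) / (max - min)
    return arr.reshape(1, 784)                  # flatten to 1 x 784
```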

Results


Logistic Regression

Hyperparameter tuned: learning_rate

At epoch count = 200 (Training/Val/Test columns are on MNIST; the last column is the USPS test set):

| Sr. | Learning Rate | Regulariser | Training (%) | Val (%) | Test (%) | USPS Test (%) |
|----:|--------------:|------------:|-------------:|--------:|---------:|--------------:|
|   1 | 0.01 | 1 | 84.87  | 88.48 | 85.94 | 40.33 |
|   2 | 0.02 | 1 | 84.925 | 88.54 | 85.98 | 40.39 |
|   3 | 0.01 | 0 | 87.16  | 90.06 | 87.85 | 41.26 |
|   4 | 0.02 | 0 | 88.87  | 91.36 | 89.3  | 41.95 |
|   5 | 0.03 | 0 | 89.47  | 91.96 | 89.94 | 42.56 |
|   6 | 0.04 | 0 | 89.94  | 92.32 | 90.39 | 42.96 |
|   7 | 0.05 | 0 | 90.2   | 92.58 | 90.59 | 43.35 |
|   8 | 0.06 | 0 | 90.49  | 92.68 | 90.82 | 43.6  |
|   9 | 0.07 | 0 | 90.7   | 92.7  | 90.88 | 43.94 |
|  10 | 0.08 | 0 | 90.88  | 92.78 | 90.98 | 44.18 |
|  11 | 0.09 | 0 | 91.02  | 92.86 | 91.12 | 44.4  |
|  12 | 0.1  | 0 | 91.15  | 92.9  | 91.22 | 44.51 |
|  13 | 0.11 | 0 | 91.25  | 92.98 | 91.35 | 44.65 |
|  14 | 0.12 | 0 | 91.37  | 93.1  | 91.4  | 44.84 |
|  15 | 0.13 | 0 | 91.48  | 93.22 | 91.5  | 45.05 |
|  16 | 0.14 | 0 | 91.54  | 93.3  | 91.56 | 45.15 |
|  17 | 0.15 | 0 | 91.62  | 93.38 | 91.55 | 45.32 |

Single Hidden Layer Neural Network

Hyperparameters tuned: number of units in hidden layer, learning_rate

With epoch_count = 20000 and batch_size = 50 (Training/Val/Test columns are on MNIST; the last column is the USPS test set):

| Sr. | Units in hidden layer | Learning Rate | Training (%) | Val (%) | Test (%) | USPS Test (%) |
|----:|----------------------:|--------------:|-------------:|--------:|---------:|--------------:|
|   1 |  784 | 0.01 | 99.27 | 97.42 | 97.35 | 63.18 |
|   2 |  784 | 0.02 | 97.93 | 96.58 | 96.01 | 61.63 |
|   3 |  784 | 0.03 | 95.08 | 94.32 | 93.87 | 56.88 |
|   4 |  784 | 0.04 | 93.37 | 93.08 | 92.4  | 54.64 |
|   5 |  784 | 0.05 | 88.87 | 88.42 | 88.37 | 47.09 |
|   6 |  864 | 0.01 | 99.33 | 97.6  | 97.46 | 65.25 |
|   7 |  864 | 0.02 | 97.36 | 95.32 | 95.77 | 59.63 |
|   8 |  864 | 0.03 | 95.72 | 94.82 | 94.09 | 56.85 |
|   9 |  864 | 0.04 | 92.08 | 91.16 | 90.94 | 52.24 |
|  10 |  864 | 0.05 | 91.07 | 90.2  | 90.46 | 49.24 |
|  11 |  944 | 0.01 | 99.13 | 97.74 | 97.47 | 64.2  |
|  12 |  944 | 0.02 | 97.9  | 96.74 | 96.12 | 61.74 |
|  13 |  944 | 0.03 | 95    | 94.16 | 93.8  | 56.89 |
|  14 |  944 | 0.04 | 92.39 | 92.12 | 91.07 | 50.29 |
|  15 |  944 | 0.05 | 91.67 | 91.3  | 91.09 | 50.27 |
|  16 | 1024 | 0.01 | 99.25 | 97.52 | 97.81 | 64.6  |
|  17 | 1024 | 0.02 | 97.6  | 96.2  | 95.57 | 59.59 |
|  18 | 1024 | 0.03 | 95.46 | 94.2  | 94.16 | 54.95 |
|  19 | 1024 | 0.04 | 90.61 | 90.38 | 89.97 | 52.33 |
|  20 | 1024 | 0.05 | 89.56 | 89.08 | 88.83 | 49.46 |

Logistic Regression

Output for learning rate 0.07:

Current learning rate is 0.070000

iteration 0/200: loss 2.303
iteration 10/200: loss 0.759
iteration 20/200: loss 0.583
iteration 30/200: loss 0.509
iteration 40/200: loss 0.468
iteration 50/200: loss 0.440
iteration 60/200: loss 0.421
iteration 70/200: loss 0.406
iteration 80/200: loss 0.394
iteration 90/200: loss 0.384
iteration 100/200: loss 0.376
iteration 110/200: loss 0.369
iteration 120/200: loss 0.362
iteration 130/200: loss 0.357
iteration 140/200: loss 0.352
iteration 150/200: loss 0.348
iteration 160/200: loss 0.344
iteration 170/200: loss 0.341
iteration 180/200: loss 0.338
iteration 190/200: loss 0.335
training set Accuracy is 0.907055
validation set Accuracy is 0.927000
Test set Accuracy is 0.908800
USPS set Accuracy is 0.439422

Logistic Regression (using TensorFlow):

Output for learning rate 0.5, number of epochs: 10000

The accuracy on MNIST test set: 92.41
The accuracy on USPS test set: 48.32

Single Hidden Layer Neural Network:

Output for learning rate 0.01, number of epochs: 20000, number of units in hidden layer: 784

MNIST validation accuracy: 97.42
MNIST test accuracy: 97.35
The accuracy on USPS test set: 63.18

Convolutional Neural Network:

Output for learning rate 1e-4, number of epochs: 20000

MNIST test accuracy: 99.18
The accuracy on USPS test set: 75.13

Documentation


The report and documentation can be found at this Documentation link

Folder Tree


  • Report contains the summary report detailing our implementation and results.
  • code contains the source code of our machine learning algorithms.
  • Materials contains project-related reference materials.
  • Bonus contains the source code of our machine learning algorithm using back-propagation.
  • proj3_images contains the image data for training, validation and testing.

Code outline

code:

  • logistic_main.py: Run this file to execute the logistic regression model without using TensorFlow.
  • logistic_tensorflow_main.py: Run this file to execute the logistic regression model using TensorFlow. It creates the model, trains it, and tests it on the MNIST validation and test sets and the USPS test set.
  • single_layer_NN_main.py: Run this file to execute the single hidden layer NN model using TensorFlow. It creates the model, trains it, and tests it on the MNIST validation and test sets and the USPS test set.
  • cnn_main2.py: Run this file to execute the CNN model using TensorFlow. It creates the model, trains it, and tests it on the MNIST validation and test sets and the USPS test set.
  • libs.py:
    • read_gz(images, labels): reads the MNIST gz data
    • view_image(image, label=""): views a single image from the MNIST data
    • yDash(trains_images, W): performs W.dot(X)
    • softmax(x): calculates the softmax of each row of W.dot(X)
    • sgd(W, train_images, T, L2_lambda, epochNo, learning_rate): gradient descent for optimising the weights
    • cross_entropy(W, X, T, L2_lambda): calculates the loss of the model
    • predict(W, X): predicts the labels from the output of the model
  • single_layer_NN_lib.py:
    • create_single_hidden_layer_nn(number_hidden_units): creates the input layer, one hidden layer with the specified number of neurons, and the output layer
  • cnn_lib.py:
    • weight_init(shape): initialises weight variables
    • bias_init(shape): initialises bias variables
    • convolution(x, W): convolves the input with the given weights, with stride 1 and zero padding
    • maxpool(x): performs max pooling with a 2x2 window, stride 2 and zero padding
  • USPS_data_extraction.py:
    • make_square(im): makes the image have equal height and width
    • extract_usps_data(): gets the USPS test images and labels; Usps_test_images is an Nx784 numpy array and Usps_test_labels is an Nx10 numpy array (one-hot representation)
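
A hypothetical end-to-end call sequence tying the libs.py helpers together; the MNIST file names, the assumption that read_gz returns flattened images (with the bias column) plus integer labels, and the assumption that sgd returns the optimised weights are all for illustration only (logistic_main.py is the authoritative driver):

```python
import numpy as np
import libs

# Assumed: read_gz returns N x 785 images (bias column included) and integer labels
train_images, train_labels = libs.read_gz('train-images-idx3-ubyte.gz',
                                          'train-labels-idx1-ubyte.gz')

T = np.eye(10)[train_labels]   # one-hot targets, N x 10
W = np.ones((10, 785))         # K x D weights initialised to 1s

# Optimise the weights with gradient descent at the best settings reported
# above (learning rate 0.14, lambda 0, 200 epochs)
W = libs.sgd(W, train_images, T, L2_lambda=0, epochNo=200, learning_rate=0.14)

print('loss:', libs.cross_entropy(W, train_images, T, L2_lambda=0))
print('train accuracy:', np.mean(libs.predict(W, train_images) == train_labels))
```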

Bonus: This zip folder contains the implementation of the Single Hidden Layer NN model using backpropagation.

  • main.py: Run this file to execute the single hidden layer NN model using backpropagation. It creates the model, trains it, and tests it on the MNIST validation and test sets and the USPS test set.
  • SNlibs.py:
    • read_gz(images, labels): reads the MNIST gz data
    • view_image(image, label=""): views a single image from the MNIST data
    • softmax(x): calculates the softmax of each row of W.dot(X)
    • calculate_loss(model, X, y, reg_lambda): helper function to evaluate the total loss on the dataset
    • cross_entropy(W, X, T, L2_lambda): calculates the loss of the model
    • predict(W, X): predicts the labels from the output of the model
    • build_model(nn_hdim, num_passes, X, y, reg_lambda, learning_rate, T): learns the parameters of the neural network and returns the model
  • USPS_data_extraction.py:
    • make_square(im): makes the image have equal height and width
    • extract_usps_data(): gets the USPS test images and labels; Usps_test_images is an Nx784 numpy array and Usps_test_labels is an Nx10 numpy array (one-hot representation)

Contributors


  • jayantsolanki
  • swatishr

Instructor


  • Prof. Sargur N. Srihari

Teaching Assistants


  • Jun Chu
  • Tianhang Zheng
  • Mengdi Huai

License


This project is open-sourced under the MIT License

