
An Architecture Combining Convolutional Neural Network (CNN) and Linear Support Vector Machine (SVM) for Image Classification

Home Page: https://arxiv.org/abs/1712.03541

License: Apache License 2.0


An Architecture Combining Convolutional Neural Network (CNN) and Linear Support Vector Machine (SVM) for Image Classification


This project was inspired by Y. Tang's Deep Learning using Linear Support Vector Machines (2013).

The full paper on this project may be read at arXiv.org.

Abstract

Convolutional neural networks (CNNs) are similar to "ordinary" neural networks in the sense that they are made up of hidden layers consisting of neurons with "learnable" parameters. These neurons receive inputs, perform a dot product, and follow it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the usage of a linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by Tang (2013). Empirical data shows that the CNN-SVM model was able to achieve a test accuracy of ~99.04% on the MNIST dataset (LeCun, Cortes, and Burges, 2010), while CNN-Softmax was able to achieve a test accuracy of ~99.23% on the same dataset. Both models were also tested on the recently published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is supposed to be a more difficult image classification dataset than MNIST (Zalando Research, 2017). This proved to be the case, as CNN-SVM reached a test accuracy of ~90.72%, while CNN-Softmax reached ~91.86%. These results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model were more sophisticated than the one used in this study.
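In a CNN-SVM model, the usual cross-entropy objective on softmax probabilities is replaced by an SVM loss on the raw class scores. As a rough, framework-agnostic sketch (the function name here is illustrative, not taken from this repository), the L2-SVM (squared hinge) loss used by Tang (2013) can be written as:

```python
import numpy as np

def l2_svm_loss(logits, targets, C=1.0):
    """Squared hinge (L2-SVM) loss for a one-vs-rest multiclass SVM.

    logits:  (batch, classes) raw scores from the last layer
    targets: (batch, classes) labels encoded as +1 / -1
    C:       penalty parameter (the -p/--penalty_parameter flag)
    """
    margins = np.maximum(0.0, 1.0 - targets * logits)
    return C * np.mean(np.sum(margins ** 2, axis=1))

# a sample classified with margin > 1 for every binary SVM incurs zero loss
logits = np.array([[2.0, -2.0]])
targets = np.array([[1.0, -1.0]])
print(l2_svm_loss(logits, targets))  # 0.0
```

Minimizing this loss pushes each correct-class score above +1 and each wrong-class score below -1, instead of maximizing a log-likelihood as softmax does.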

Usage

First, clone the project.

git clone https://github.com/AFAgarap/cnn-svm.git

Run setup.sh to ensure that the prerequisite libraries are installed in the environment.

sudo chmod +x setup.sh
./setup.sh

Program parameters.

usage: main.py [-h] -m MODEL -d DATASET [-p PENALTY_PARAMETER] -c
               CHECKPOINT_PATH -l LOG_PATH

CNN & CNN-SVM for Image Classification

optional arguments:
  -h, --help            show this help message and exit

Arguments:
  -m MODEL, --model MODEL
                        [1] CNN-Softmax, [2] CNN-SVM
  -d DATASET, --dataset DATASET
                        path of the MNIST dataset
  -p PENALTY_PARAMETER, --penalty_parameter PENALTY_PARAMETER
                        the SVM C penalty parameter
  -c CHECKPOINT_PATH, --checkpoint_path CHECKPOINT_PATH
                        path where to save the trained model
  -l LOG_PATH, --log_path LOG_PATH
                        path where to save the TensorBoard logs

Then go to the repository's directory, and run the main.py module with the desired parameters.

cd cnn-svm
python3 main.py --model 2 --dataset ./MNIST_data --penalty_parameter 1 --checkpoint_path ./checkpoint --log_path ./logs

Results

The hyperparameters used in this project were assigned manually, not through optimization.

| Hyperparameters | CNN-Softmax | CNN-SVM |
|-----------------|-------------|---------|
| Batch size      | 128         | 128     |
| Learning rate   | 1e-3        | 1e-3    |
| Steps           | 10000       | 10000   |
| SVM C           | N/A         | 1       |

The experiments were conducted on a laptop computer with Intel Core(TM) i5-6300HQ CPU @ 2.30GHz x 4, 16GB of DDR3 RAM, and NVIDIA GeForce GTX 960M 4GB DDR5 GPU.

Figure 1. Training accuracy (left) and loss (right) of CNN-Softmax and CNN-SVM on image classification using MNIST.

The orange plot refers to the training accuracy and loss of CNN-Softmax, with a test accuracy of ~99.23%. On the other hand, the blue plot refers to the training accuracy and loss of CNN-SVM, with a test accuracy of ~99.04%. The results do not corroborate the findings of Tang (2013) for MNIST handwritten digit classification. This may be attributed to the fact that no data preprocessing or dimensionality reduction was done on the dataset for this project.

Figure 2. Training accuracy (left) and loss (right) of CNN-Softmax and CNN-SVM on image classification using Fashion-MNIST.

The red plot refers to the training accuracy and loss of CNN-Softmax, with a test accuracy of ~91.86%. On the other hand, the light blue plot refers to the training accuracy and loss of CNN-SVM, with a test accuracy of ~90.72%. The result on CNN-Softmax corroborates the finding by Zalando Research on Fashion-MNIST.

Citation

To cite the paper, kindly use the following BibTeX entry:

@article{agarap2017architecture,
  title={An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification},
  author={Agarap, Abien Fred},
  journal={arXiv preprint arXiv:1712.03541},
  year={2017}
}

To cite the repository/software, kindly use the following BibTeX entry:

@misc{abien_fred_agarap_2017_1098369,
  author       = {Abien Fred Agarap},
  title        = {AFAgarap/cnn-svm v0.1.0-alpha},
  month        = dec,
  year         = 2017,
  doi          = {10.5281/zenodo.1098369},
  url          = {https://doi.org/10.5281/zenodo.1098369}
}

License

Copyright 2017-2020 Abien Fred Agarap

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


cnn-svm's Issues

ModuleNotFoundError: No module named 'tensorflow.examples'

Hi,

I can't run the code with the following command:
python3 main.py --model 2 --dataset ./MNIST_data --penalty_parameter 1 --checkpoint_path ./checkpoint --log_path ./logs
I get the error below:
"ModuleNotFoundError: No module named 'tensorflow.examples'"

How can I solve this issue?

Max margin decision problem in code

In cnn-svm/model/cnn_svm.py, the output of the one-vs-rest SVM is written as:
output = tf.identity(tf.sign(output), name='prediction')
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_input, 1))

tf.sign(output) seems incorrect and should be removed. The SVM here uses a max-margin decision. When the output margins of the multiple binary linear SVMs are converted to 1 and -1, tf.argmax(output, 1) may return a wrong value, because more than one binary linear SVM may output a positive margin, and all of those margins are converted to 1.
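The effect described above can be reproduced with a small NumPy sketch (the margin values are illustrative, not taken from the repository): applying sign before argmax discards the margin magnitudes, so argmax falls back to the first positive entry.

```python
import numpy as np

# raw one-vs-rest margins for three classes; class 1 has the largest margin
margins = np.array([0.7, 3.2, -1.5])

max_margin_decision = np.argmax(margins)        # 1: the max-margin class
sign_then_argmax = np.argmax(np.sign(margins))  # 0: both positive margins
                                                # become 1.0, and argmax
                                                # returns the first of them

print(max_margin_decision, sign_then_argmax)  # 1 0
```

Dropping tf.sign and taking argmax over the raw margins keeps the max-margin decision intact.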

pt_cnn_svm

Why didn't you provide the hinge loss version?

Not able to create directory

Hello, I have tried to run your code but it is not able to create the log directory. I created it manually and passed its path, but it still shows the same error:
"tensorflow.python.framework.errors_impl.InvalidArgumentError: Failed to create a directory: C:/Users/Anamika/Downloads/cnn-svm-master/cnn-svm-master/logsThu Aug 9 18:32:45 2018-training; Invalid argument"

Can't run the training on the model for CNN-SVM

Hi, I am trying to run the code however I am getting this error.

InvalidArgumentError: You must feed a value for placeholder tensor 'input_4/x_input' with dtype float and shape [?,784]
[[{{node input_4/x_input}}]]

Some questions about the code of the SVM prediction part

This is the prediction part of the SVM network:
output = tf.identity(tf.sign(output), name='prediction')
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_input, 1))

After the output is processed by tf.sign(), it becomes a two-class label in {1, -1}. This cannot produce predictions for a multi-class handwritten digits dataset.
I would like to ask you about this.
