cifar10_image_recognition's Introduction

EVA6_Session7_Advanced_Concepts

Time to try our hands at something more than just digits. How about some cars ... planes ... maybe a few animals here and there? Welcome to our experiments with advanced concepts on the CIFAR-10 dataset.

Topics

Understanding the CIFAR-10 dataset

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

Here are the classes in the dataset, as well as 10 random images from each:

image

The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

Source: https://www.cs.toronto.edu/~kriz/cifar.html

Concept Time!

Dilated Convolution

dilated_convolution

Source: Rohan Shravan

Dilated convolution is a way of growing the receptive field (global view) of the network exponentially while the number of parameters grows only linearly. This makes it useful in applications that care about integrating knowledge of the wider context at low cost.
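To make the idea concrete, here is a minimal pure-Python sketch of a 1D dilated convolution. It is only an illustration: the function names `effective_kernel_size` and `dilated_conv1d` are hypothetical and are not part of this repository, and the real model presumably uses a deep learning framework's 2D convolution instead. The point is that the same three weights cover a wider span as the dilation grows.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding), stride-1 dilated cross-correlation over a 1D list."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


signal = list(range(10))   # 0, 1, ..., 9
kernel = [1, 1, 1]         # three weights, whatever the dilation

for d in (1, 2, 4):
    print(d, effective_kernel_size(len(kernel), d), dilated_conv1d(signal, kernel, d))
# With d=4 the three taps span 9 samples, so only two outputs fit: [12, 15]
```

With dilation 1 the kernel sees 3 neighbouring samples; with dilation 4 it sees samples 0, 4 and 8, a span of 9, using the same number of weights.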

The key application the dilated convolution authors have in mind is dense prediction: vision applications where the predicted object has a similar size and structure to the input image. Examples include semantic segmentation with one label per pixel, image super-resolution, denoising, demosaicing, bottom-up saliency, keypoint detection, etc.

In many such applications one wants to integrate information from different spatial scales and balance two properties:

  • local, pixel-level accuracy, such as precise detection of edges, and

  • integrating the knowledge of the wider, global context

image

Source: Rohan Shravan

image

Source: Rohan Shravan

Depthwise Separable Convolution

image

Source: Rohan Shravan

Objectives

  • A GPU based code with Model architecture of C1C2C3C40 (No MaxPooling, but 3 3x3 layers with stride of 2 instead. It would be a bonus if we can figure out how to use Dilated kernels instead of MP or strided convolution)
  • Total Receptive Field of more than 52
  • Two of the layers must use Depthwise Separable Convolution
  • One of the layers must use Dilated Convolution
  • use GAP (compulsory mapped to # of classes):- CANNOT add FC after GAP to target # of classes
  • Use albumentation library and apply:
    • Horizontal flip
    • shiftScaleRotate
    • coarseDropout (max_holes = 1, max_height=16px, max_width=16px, min_holes = 1, min_height=16px, min_width=16px, fill_value=(mean of your dataset), mask_fill_value = None)
    • grayscale
  • Minimum 87% Test Accuracy
  • Total Parameters below 100K
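As a rough illustration of what the coarse dropout augmentation in the list above does, the following pure-Python sketch cuts one rectangular hole out of an image and fills it with a constant. It is not the albumentations implementation and not the code in this repository: the function `coarse_dropout`, its parameters, the single-channel nested-list image and the fill value of 0.5 are all assumptions made for the example (in practice the fill value would be the dataset mean).

```python
import random


def coarse_dropout(image, hole_h=16, hole_w=16, fill_value=0.0, rng=None):
    """Return a copy of a 2D image (list of rows) with one filled rectangular hole."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    hole_h, hole_w = min(hole_h, height), min(hole_w, width)
    # Pick the top-left corner so that the hole lies fully inside the image.
    top = rng.randint(0, height - hole_h)
    left = rng.randint(0, width - hole_w)
    out = [row[:] for row in image]   # leave the input untouched
    for y in range(top, top + hole_h):
        for x in range(left, left + hole_w):
            out[y][x] = fill_value
    return out


image = [[1.0] * 32 for _ in range(32)]     # stand-in for one 32x32 channel
augmented = coarse_dropout(image, fill_value=0.5, rng=random.Random(0))
dropped = sum(v == 0.5 for row in augmented for v in row)
print(dropped)   # 256 pixels, i.e. one 16x16 hole
```

A single 16x16 hole hides a quarter of a 32x32 image, which forces the network to rely on more than one region of the object.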

Code Structure

Code is split into different modules (as it should be!). If you are looking for the final notebook, you can find it here.

  • dataset contains the code for data downloading, prepping and preprocessing. You can find code related to transformations and augmentations here.

    • dataset.py: Data loading and processing code is here.
  • models will take you to our modelling directory which contains code for our network structure and the training and testing modules.

  • utils has code for our visualization needs.

    • plots.py: Visualization for Train, Test logs and sample images.
  • CIFAR10_Image_Recognition.ipynb is the one notebook to rule them all! Open it to see the final results of our experiments.

Logs

Model Summary

image

Training and Validation Loss

image

Training and Validation Accuracy

image

Conclusions and notes

Objectives Achieved

  • A GPU based code with Model architecture of C1C2C3C40 (No MaxPooling, but 3 3x3 layers with stride of 2 instead. It would be a bonus if we can figure out how to use Dilated kernels instead of MP or strided convolution)
    • Dilated Convolution in place of Max Pooling Achieved!
  • Total Receptive Field of more than 52: Receptive Field of 107 achieved
  • Two of the layers must use Depthwise Separable Convolution
  • One of the layers must use Dilated Convolution
  • use GAP (compulsory mapped to # of classes):- CANNOT add FC after GAP to target # of classes
  • Use albumentation library and apply:
    • Horizontal flip
    • shiftScaleRotate
    • coarseDropout (max_holes = 1, max_height=16px, max_width=16px, min_holes = 1, min_height=16px, min_width=16px, fill_value=(mean of your dataset), mask_fill_value = None)
    • grayscale
  • Minimum 87% Test Accuracy: Achieved max of 89.35%
  • Total Parameters below 100K: 96,436 Parameters
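For readers who want to check a receptive field figure like the one above, here is a small sketch of the usual layer-by-layer calculation. The helper `receptive_field` and the layer lists are hypothetical examples, not the architecture in this repository, so they do not reproduce the 107 reported here. Each layer is described as a `(kernel_size, stride, dilation)` tuple.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers.

    layers: iterable of (kernel_size, stride, dilation) tuples, input to output.
    """
    rf, jump = 1, 1   # jump = distance in input pixels between adjacent outputs
    for kernel_size, stride, dilation in layers:
        k_eff = kernel_size + (kernel_size - 1) * (dilation - 1)
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions: the receptive field grows by 2 per layer.
print(receptive_field([(3, 1, 1)] * 3))                                   # 7

# A stride-2 layer doubles the growth contributed by every later layer.
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 1, 1)]))                 # 9

# Doubling the dilation each layer: 3, 7, 15, 31 for the same 3x3 weights.
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4), (3, 1, 8)]))      # 31
```

The last example shows why dilation helps reach a large receptive field on a small parameter budget: four 3x3 layers reach 31 pixels, against 9 without dilation.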

Notes:

  • In place of Max Pooling, we have employed a "Depthwise Convolution" with a kernel size of 3 and a stride of 2, which halves the spatial size (height and width) of each channel.
  • The usage of Depthwise Convolution greatly reduced the number of parameters required, as there is only one depthwise filter for each input channel.
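The parameter saving mentioned in the notes can be checked with simple arithmetic. The sketch below compares a standard convolution with a depthwise separable one (a depthwise k x k convolution followed by a 1x1 pointwise convolution). The function names and the 32-to-64 channel example are assumptions chosen for illustration, not values taken from the model in this repository.

```python
def standard_conv_params(c_in, c_out, k=3, bias=False):
    """Weights in a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k=3, bias=False):
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1x1 conv mixes the channels
    return depthwise + pointwise


c_in, c_out = 32, 64   # hypothetical layer widths
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)
print(std)   # 18432
print(sep)   # 2336
print(round(std / sep, 2))
```

For this example layer the separable version needs roughly an eighth of the weights, which is the kind of saving that makes a sub-100K parameter budget workable.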

Collaborators

Abhiram Gurijala
Arijit Ganguly
Rohin Sequeira
