goodboyanush / iiit-bangalore-march-april-2019

Course notes, some coding, assignments, etc.

License: Apache License 2.0

Jupyter Notebook 100.00%

iiit-bangalore-march-april-2019's Introduction

Visual Recognition Course (March April 2019)

Basic Info

Where: IIIT-Bangalore
When: March/April 2019
Who: Anush Sankaran, IBM Research AI (co-instructed with Prof. Dinesh Babu Jayagopi)

Course Overview

Date	Topic	Content	Slides	Notes
22nd March, 2019 (Friday)	Introduction to Image Classification, Neural Networks, and Optimization	- What is visual recognition? - Logistic regression - Stochastic Gradient Descent - Multilayer perceptron - Backpropagation - DL + ML Pipeline	slides
30th March, 2019 (Saturday)	Unsupervised Feature Learning, Autoencoders, Convolutional Neural Networks	- Popular applications of DL - Stacked autoencoders - Convolution & Pooling layers - Convolutional autoencoder	slides	Notebook
6th April, 2019 (Saturday)	Hyper-parameter optimization, Training Process	- Convolutional neural network - One-time model setup - Hyper-parameter optimization	slides	Notebook
10th April, 2019 (Wednesday)	Different CNN Architectures	- Data Augmentation - Transfer Learning - Comparison of Different CNN Architectures - Watson Studio Hands-on	slides	Watson Studio: How To
20th April, 2019 (Saturday)	Generative Modelling	- Unsupervised learning - Distribution fitting - PixelRNN/CNN - Variational Autoencoder (VAE) - Generative Adversarial Network (GAN) - Open source GAN toolkit	slides	Open source GAN Toolkit
27th April, 2019 (Saturday)	CNN Visualization and Face Recognition	- Neuron Visualization - Guided BackProp - Grad-CAM - Face Classification - Face Generation - DeepFake - Model Trust	slides

Acknowledgement

The references below have better slides than mine! With great respect for their slides, hard work, and effort, I acknowledge them; it only makes sense to reuse parts of their slides.

Book on “Deep Learning” (https://www.deeplearningbook.org/)
CS231n: Convolutional Neural Networks for Visual Recognition (http://vision.stanford.edu/teaching/cs231n/index.html)
CS 6501-004: Deep Learning for Visual Recognition (http://vicenteordonez.com/deeplearning/)
ECE 6504 Deep Learning for Perception (https://computing.ece.vt.edu/~f15ece6504/)

iiit-bangalore-march-april-2019's Issues

Assignment #2 updates: New model and some code

import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model

def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x


def resnet_v1(input_shape, depth, num_classes=10):
    """ResNet Version 1 Model builder [a]

    Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
    Last ReLU is after the shortcut connection.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filters is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Feature map sizes:
    stage 0: 32x32, 16
    stage 1: 16x16, 32
    stage 2:  8x8,  64
    The Number of parameters is approx the same as Table 6 of [a]:
    ResNet20 0.27M
    ResNet32 0.46M
    ResNet44 0.66M
    ResNet56 0.85M
    ResNet110 1.7M

    # Arguments
        input_shape (tuple): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model


model = resnet_v1(input_shape=(32,32,3), depth=20)
model.summary()
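
The docstring above quotes roughly 0.27M parameters for ResNet20. As a rough cross-check that does not need Keras, the count can be reproduced in plain Python. This is only a sketch under stated assumptions: a 32x32x3 input (so that the 8x8 average pooling leaves one value per channel), and the Keras convention of reporting four values per BatchNormalization channel (gamma, beta, moving mean, moving variance). The helper names `conv_params`, `bn_params`, and `resnet_v1_param_count` are hypothetical and not part of the assignment code.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a square Conv2D layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def bn_params(channels):
    """gamma, beta, moving mean, moving variance: four values per channel."""
    return 4 * channels


def resnet_v1_param_count(depth, num_classes=10, in_channels=3):
    """Count parameters by mirroring the loop structure of resnet_v1 above."""
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44)')
    num_res_blocks = (depth - 2) // 6
    num_filters = 16

    # First conv-BN-ReLU layer applied to the input image.
    total = conv_params(3, in_channels, num_filters) + bn_params(num_filters)
    channels = num_filters  # channels of the tensor entering the next block

    for stack in range(3):
        for res_block in range(num_res_blocks):
            # Two 3x3 conv + BN layers on the residual branch.
            total += conv_params(3, channels, num_filters) + bn_params(num_filters)
            total += conv_params(3, num_filters, num_filters) + bn_params(num_filters)
            if stack > 0 and res_block == 0:
                # 1x1 projection shortcut, built without batch normalization.
                total += conv_params(1, channels, num_filters)
            channels = num_filters
        num_filters *= 2

    # Dense softmax classifier on the pooled, flattened features.
    total += channels * num_classes + num_classes
    return total


print(resnet_v1_param_count(20))
```

The printed number can be compared against the "Total params" line of `model.summary()`; if the two disagree, the assumptions listed above are the first thing to check.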

Note: InceptionV3 can be used with CIFAR-10 by resizing the CIFAR-10 images to (139,139,3). Instead of doing that, please use the above model for Assignment 2. This model does not have pre-trained weights from ImageNet, so you need to train it from scratch. Training from scratch will help you to

Please try exploring the following hyperparameters and training strategies:

Layer activation function
Data preprocessing, normalisation
Dropout (where to apply, what is the strength)
BatchNorm (where to apply)
Loss function
Learning rate and learning strategy

Bonus marks for also trying:

Data augmentation
Transfer learning
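
For the "learning rate and learning strategy" item in the list above, one common option is a step decay schedule. The sketch below is only an illustration: the base rate, the drop factor, the number of epochs between drops, and the helper name `make_step_decay` are all assumptions, not values prescribed by the assignment.

```python
def make_step_decay(base_lr=1e-3, drop=0.1, epochs_per_drop=30):
    """Return a schedule that multiplies base_lr by `drop` every `epochs_per_drop` epochs."""

    def schedule(epoch, current_lr=None):
        # current_lr is accepted but ignored, so the function works whether
        # the caller passes only the epoch or the epoch and the current rate.
        return base_lr * (drop ** (epoch // epochs_per_drop))

    return schedule


schedule = make_step_decay(base_lr=1e-3, drop=0.1, epochs_per_drop=30)
for epoch in (0, 29, 30, 60):
    print(epoch, schedule(epoch))
```

A function like `schedule` could then be wrapped in the `LearningRateScheduler` callback that the model code already imports and passed to `model.fit` through its `callbacks` argument.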

Report the final set of hyperparameters that gave you the best accuracy.

You will be evaluated on which values you explored for each parameter, and in what order. Please explain how you babysat the learning process.
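
One possible way to keep track of which values were explored, and in what order, is to log every trial of a small random search. This is a sketch only: the search ranges, the three hyperparameters chosen, and the names `sample_config`, `run_search`, and `train_and_evaluate` are hypothetical. `train_and_evaluate` stands in for whatever function trains the model with a given configuration and returns its validation accuracy.

```python
import random


def sample_config(rng):
    """Draw one configuration; the learning rate is sampled on a log scale."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "dropout": rng.uniform(0.0, 0.5),
        "activation": rng.choice(["relu", "elu", "tanh"]),
    }


def run_search(train_and_evaluate, num_trials=5, seed=0):
    """Run the trials in order and keep a log entry for each one."""
    rng = random.Random(seed)  # fixed seed so the exploration order is reproducible
    log = []
    for trial in range(num_trials):
        config = sample_config(rng)
        accuracy = train_and_evaluate(config)
        log.append({"trial": trial, "config": config, "accuracy": accuracy})
    best = max(log, key=lambda entry: entry["accuracy"])
    return log, best


if __name__ == "__main__":
    # Placeholder objective for demonstration only; replace with real training.
    fake_accuracy = lambda config: 1.0 - config["dropout"]
    log, best = run_search(fake_accuracy, num_trials=5)
    for entry in log:
        print(entry)
    print("best:", best)
```

The resulting log doubles as the record of the exploration order that the report asks for.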

Please note that there is no single correct answer to this question.

Assignment 1 Part B

Hi Anush,

To convert a random noisy image into an MNIST image, is it necessary to start with a (4, 4, 8) image, or can we choose some other starting dimension?

Also, a simple three-layer decoder using a (4, 4, 8) input is not able to reproduce anything even after 100 epochs of training. Can we use the concepts learnt in the last lecture to modify our decoder into something more powerful?

Request for feedback

As this is my first formal teaching experience, your feedback on the pluses and minuses of my teaching would really help me a lot. In particular, it would help in teaching your batch itself in the following semester.
Please be critical in your review; no feedback will be taken negatively.

Assignment 3 Question 1

For question 1, I do not understand the binary classification part.
What I understand is that we do face recognition using a CNN and then, at test time, pick two images and classify the pair into one of two classes, either by checking the labels or by applying a distance metric with a threshold to the extracted features. Is my understanding right?
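
If the distance-and-threshold reading is the intended one, the test-time step described above could look roughly like the sketch below. It is not a statement of what the assignment requires. It assumes the CNN features of the two images are already available as plain lists of floats; the cosine distance, the threshold value of 0.5, and the names `cosine_distance` and `same_person` are illustrative choices only.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def same_person(features_a, features_b, threshold=0.5):
    """Binary decision: True if the two feature vectors are close enough."""
    return cosine_distance(features_a, features_b) < threshold


# Toy vectors standing in for CNN features of two face images.
print(same_person([1.0, 0.0], [1.0, 0.0]))
print(same_person([1.0, 0.0], [0.0, 1.0]))
```

In practice the threshold would have to be tuned on held-out pairs rather than fixed in advance.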

Assignment 1 part A

Hi Anush,
I went through the Keras implementation of the upsampling layer. It simply says that it repeats the row and column values.
Here is the link from the Keras team:
https://github.com/keras-team/keras/blob/8ed57c168f171de7420e9a96f9e305b8236757df/keras/layers/convolutional.py#L1552
Although I implemented the same thing, it still does not give me the right output.
Any suggestions?
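
For reference, repeating rows and columns as described above can be sketched in plain Python. This only illustrates nearest-neighbour upsampling on a single 2D channel stored as nested lists. The function name `upsample_nearest` and its `size` parameter are hypothetical, and the sketch does not handle the batch and channel axes that the Keras layer works with.

```python
def upsample_nearest(image, size=(2, 2)):
    """Repeat each row size[0] times and each column size[1] times."""
    row_factor, col_factor = size
    upsampled = []
    for row in image:
        # Repeat every value along the column axis.
        wide_row = [value for value in row for _ in range(col_factor)]
        # Repeat the widened row along the row axis (copies, not aliases).
        upsampled.extend(list(wide_row) for _ in range(row_factor))
    return upsampled


for row in upsample_nearest([[1, 2], [3, 4]]):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

When a hand-written version disagrees with the layer, comparing both on a tiny input like this one usually shows whether the repetition is being applied along the wrong axis.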

Assignment2

I couldn't find an option for GPU-powered notebooks in the Environment tab.
Attaching the screenshot.

Assignment 2 - Pre trained weights

Hi Anush,

Just a small clarification needed.

Are we allowed to use the pre-trained weights for the Inception V3 network (ImageNet weights are available in Keras) and do transfer learning, or should we just use the architecture and train the model from scratch?

Query regarding Assignment 3

I got a question about Assignment 3 by email. I am copying the question and answer here so that they can help the broader group.

Question:
I had a query about generating faces using a GAN. Do we have to train the GAN from scratch using the IIITB face dataset, or should we use a model pretrained on the CelebA dataset and apply transfer learning on top of it?

Looking for a 3 month intern student for this summer

Interested students please post your resume here. Please do it at the earliest.