
Traffic Sign Recognition

 

For the complete implementation and notebook:

https://github.com/aurangzaib/CarND-Traffic-Sign-Classifier-Project

 

Build a Traffic Sign Recognition Project:

The goals / steps of this project are the following:

  • Load the pre-labelled German Traffic Signs dataset

  • Explore, summarize and visualize the data set

  • Design, train and test a model architecture with high accuracy

  • Use the model to make predictions on new images from the internet

  • Analyze the softmax probabilities of the new images

 

Data Set Summary & Exploration:

First, we explore the traffic sign dataset: the shape of an image, and how many training, validation and testing examples are available.

As we will see, the samples are not uniformly distributed across the classes.

def get_data_summary(feature, label):
    import numpy as np
    # What's the shape of a traffic sign image?
    image_shape = feature[0].shape
    # How many unique classes/labels are there in the dataset?
    unique_classes, n_samples = np.unique(label, return_counts=True)
    n_classes = len(unique_classes)
    n_samples = n_samples.tolist()
    print("Image data shape =", image_shape)
    return image_shape[0], image_shape[2], n_classes, n_samples


def train_test_examples(x_train, x_validation, x_test):
    # Number of training examples
    n_train = len(x_train)
    # Number of validation examples
    n_validation = len(x_validation)
    # Number of testing examples.
    n_test = len(x_test)
    print("Number of training examples =", n_train)
    print("Number of validation examples =", n_validation)
    print("Number of testing examples =", n_test)

| Property           | Summary |
| ------------------ | ------- |
| Image shape        | 32x32x3 |
| Training samples   | 34799   |
| Validation samples | 12630   |
| Testing samples    | 4410    |
| Unique classes     | 43      |

 

Exploratory Visualization of the Dataset:

def get_classes_samples(index, labels):
    return [i for i, _x_ in enumerate(labels) if _x_ == index]


def loopover_data(index, x, y, high_range, steps):
    import matplotlib.pyplot as plt
    %matplotlib inline
    images = get_classes_samples(index, y)
    selected_images = images[:high_range:steps] if len(images) > 100 else images
    images_in_row = int(high_range/steps)
    fig, axes = plt.subplots(1, images_in_row, figsize=(15, 15))
    for _index, image_index in enumerate(selected_images):
        image = x[image_index].squeeze()
        axes[_index].imshow(image)
    plt.show()


def visualize_data(x, y, n_classes, n_samples, high_range=160, steps=20, show_desc=True, single_class=False):
    from pandas.io.parsers import read_csv
    label_signs = read_csv('signnames.csv').values[:, 1]  # fetch only sign names
    if single_class:
        loopover_data(n_classes, x, y, high_range, steps)
    else:
        for index in range(n_classes):
            if show_desc:
                print("Class {} -- {} -- {} samples".format(index + 1, label_signs[index], n_samples[index]))
            loopover_data(index, x, y, high_range, steps)

Now we visualize the dataset: which features are available and how the labels are distributed:

Class 1 -- Speed limit (20km/h) -- 180 samples

 

Class 2 -- Speed limit (30km/h) -- 1980 samples

 

Class 3 -- Speed limit (50km/h) -- 2010 samples

 

Class 4 -- Speed limit (60km/h) -- 1260 samples

 

Class 5 -- Speed limit (70km/h) -- 1770 samples

 

Class 6 -- Speed limit (80km/h) -- 1650 samples

 

Class 7 -- End of speed limit (80km/h) -- 360 samples

 

Class 8 -- Speed limit (100km/h) -- 1290 samples

 

Class 9 -- Speed limit (120km/h) -- 1260 samples

 

Class 10 -- No passing -- 1320 samples

 

def histogram_data(x, n_samples, n_classes):
    import matplotlib.pyplot as plt
    width = 1 / 1.2
    fig = plt.figure(figsize=(15, 6))
    ax = fig.add_subplot(111)
    ax.set_title('Samples Distribution')
    ax.set_xlabel('Classes')
    ax.set_ylabel('Number of Samples')
    plt.bar(range(n_classes), n_samples, width, color="blue")
    plt.show()

 

Label distribution in the Training Dataset:

 

Label distribution in the Augmented Dataset:

As discussed, the dataset contains very few samples for some of the classes, and we need to fix that.

We will see later how to fix this issue by augmenting the given dataset, but for now here is the histogram after data augmentation.

 

Label distribution in the Test Dataset:

 

Pre-process the Data Set:

Preprocessing is an important step before training a neural network. It consists of:

  • Grayscale the images.

  • Normalize the dataset using Feature Scaling.

 

Yann LeCun's paper describes that the color channels don't contribute useful information for classification, so we grayscale the images, leaving uniform values in all 3 channels. The images are converted to 3-channel grayscale using OpenCV.

The train, validation and test datasets are normalized using feature scaling (min-max rescaling).

def grayscale(x):
    import cv2 as cv
    import numpy as np
    for index, image in enumerate(x):
        gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)
        im2 = np.zeros_like(image)
        im2[:, :, 0], im2[:, :, 1], im2[:, :, 2] = gray, gray, gray
        x[index] = im2
    return x


def normalizer(x):
    import numpy as np
    x_min = float(np.min(x))
    x_max = float(np.max(x))
    x = (x - x_min) / (x_max - x_min)
    return x


def pre_process(features, labels, is_train=False):
    from sklearn.utils import shuffle
    assert (len(features) == len(labels))
    features = grayscale(features)
    features = normalizer(features)
    if is_train:
        features, labels = shuffle(features, labels)
    return features, labels
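
As a quick sanity check, the rescaling can be verified on a dummy image. Note that the normalizer above maps pixel values to [0, 1]; the [-1, +1] range reported in the table below would come from a shifted variant of the same min-max rescaling. A minimal sketch (the function name is mine, not from the original code):

import numpy as np

def normalizer_signed(x):
    # shifted min-max rescaling to [-1, +1] (assumed variant matching the table below)
    x_min, x_max = float(np.min(x)), float(np.max(x))
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

dummy = np.random.randint(0, 256, size=(32, 32, 3)).astype(np.float32)
scaled = normalizer_signed(dummy)
print(scaled.min(), scaled.max(), scaled.mean())  # -1.0, 1.0, mean near 0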

| Property                           | Value    |
| ---------------------------------- | -------- |
| Pixel value (before normalization) | 0 to 255 |
| Pixel value (after normalization)  | -1 to +1 |
| Mean (after normalization)         | ~0       |

 


 

Image Transformations and Rotations:

 

The idea of label-preserving data augmentation comes from the AlexNet paper on ImageNet classification.

As we saw earlier, the samples are not uniformly distributed across the classes. We can fix this by generating new images through transformations such as translation, rotation and brightness changes (a brightness sketch follows the code below). This is called Data Augmentation.

Augmentation also makes the training set larger and more varied than before, which helps reduce overfitting during training.

I primarily used OpenCV for image transformations.

def visualize_augmented_features(features, labels, index, images_in_row=1):
    import matplotlib.pyplot as plt
    from random import choice
    %matplotlib inline
    indices = get_classes_samples(index, labels)
    fig, axes = plt.subplots(1, images_in_row, figsize=(15, 15))
    for index in range(images_in_row):
        random_index = choice(indices)
        image = features[random_index].squeeze()
        axes[index].imshow(image)
    plt.show()
    
    
def perform_rotation(image, cols, rows):
    from random import randint
    import cv2
    # rotate around the image center by a random angle in [-12, 12] degrees
    center = (int(cols / 2), int(rows / 2))
    angle = randint(-12, 12)
    transformer = cv2.getRotationMatrix2D(center, angle, 1)
    image = cv2.warpAffine(image, transformer, (cols, rows))
    return image


def perform_translation(image, cols, rows, value):
    import cv2
    import numpy as np
    transformer = np.float32([[1, 0, value], [0, 1, value]])
    image = cv2.warpAffine(image, transformer, (cols, rows))
    return image

    
def perform_transformation(feature, label):
    from random import randint
    transform_level = 10
    rows, cols, channels = feature.shape
    # random translation offset, proportional to the image size
    translation_value = randint(-int(rows / transform_level), int(rows / transform_level))
    image = perform_rotation(feature, cols, rows)
    image = perform_translation(image, cols, rows, translation_value)
    return image, label


def augment_dataset(features, labels, n_classes):
    from random import randint
    from sklearn.utils import shuffle
    import numpy as np
    transforms_per_image = 20
    iterations = 100
    augmented_features, augmented_labels = [], []
    for _i_ in range(iterations):
        for i in range(transforms_per_image):
            # get a random class from 0 to n_classes - 1 (here, 0 to 42)
            random_class = randint(0, n_classes - 1)
            # select up to 10 features and labels of that class
            selected_index = get_classes_samples(random_class, labels)[:10]
            selected_labels = labels[selected_index]
            # perform transformation on each of the features
            for index, transform_y in zip(selected_index, selected_labels):
                # get rows and cols of the image
                transform_x = features[index]
                rows, cols, channels = transform_x.shape
                # create several transforms from a single image
                for _value in range(-int(rows), int(rows), 4):
                    # perform random rotation and translation on the image
                    aug_x, aug_y = perform_transformation(transform_x, transform_y)
                    augmented_features.append(aug_x)
                    augmented_labels.append(aug_y)
    # shuffle the augmented dataset
    augmented_features, augmented_labels = shuffle(augmented_features, augmented_labels)
    augmented_features = np.array(augmented_features)
    # assertion
    assert (len(augmented_features) == len(augmented_labels))
    return augmented_features, augmented_labels
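
The text above also mentions brightness changes among the augmentation options, but the code implements only rotation and translation. A minimal sketch of a brightness jitter using OpenCV follows; the function name and the jitter range are my assumptions, not part of the original code:

def perform_brightness(image):
    import cv2
    import numpy as np
    from random import randint
    # convert to HSV, scale the V (brightness) channel by a random factor, convert back
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    factor = 1.0 + randint(-30, 30) / 100.0  # assumed jitter of up to +/-30%
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * factor, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)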

Now we visualize how the transformations are performed:

 


 

Design and Test a Model Architecture

 

I implemented a multi-layer Convolutional Neural Network architecture.

 

The starting point was the LeNet architecture, which consists of Convolution Layers followed by Fully Connected layers:

 

Then I adjusted the architecture following the Sermanet & LeCun publication on traffic sign recognition: http://yann.lecun.org/exdb/publis/psgz/sermanet-ijcnn-11.ps.gz

 

 

Then I tweaked the architecture a little further by:

  • Using 3 Convolution layers.

  • Using Dropout after the Fully Connected layers. Dropout was proposed by Geoffrey Hinton et al.; it reduces overfitting by randomly dropping units so that the network can never rely on any single activation, forcing it to learn redundant representations and ensuring the information is retained even when some units are dropped. See the sketch after the note below.

 

Note: when Dropout is applied, the dropped neurons contribute to neither the forward nor the backward pass.
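
A minimal sketch of dropout in TensorFlow 1.x, the API family used throughout this notebook. The placeholder and variable names here are illustrative, not the project's own; the project feeds per-layer keep probabilities through a dropouts tensor instead:

import tensorflow as tf

flat = tf.placeholder(tf.float32, (None, 400))
keep_prob = tf.placeholder(tf.float32)  # 1.0 at test time, e.g. 0.6 while training
w_fc1 = tf.Variable(tf.truncated_normal((400, 120), mean=0, stddev=0.1))
b_fc1 = tf.Variable(tf.zeros(120))
fc1 = tf.nn.relu(tf.matmul(flat, w_fc1) + b_fc1)
# randomly zeroes units and rescales the survivors by 1/keep_prob
fc1 = tf.nn.dropout(fc1, keep_prob)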

def le_net(_x_, mu, stddev, dropouts, input_channels=1, output_channels=10):
    from tensorflow.contrib.layers import flatten
    train_dropouts = {
        'c1': dropouts[0],
        'c2': dropouts[1],
        'c3': dropouts[2],
        'fc1': dropouts[3],
        'fc2': dropouts[4],
    }
    w, b = get_weights_biases(mu, stddev, input_channels, output_channels)
    padding = 'VALID'
    k = 2
    st, pool_st, pool_k = [1, 1, 1, 1], [1, k, k, 1], [1, k, k, 1]
    # Layer 1 -- convolution layer:
    conv1 = convolution_layer(_x_, w['c1'], b['c1'], st, padding, pool_k, pool_st, train_dropouts['c1'])
    # Layer 2 -- convolution layer:
    conv2 = convolution_layer(conv1, w['c2'], b['c2'], st, padding, pool_k, pool_st, train_dropouts['c2'])
    # Layer 3 -- convolution layer
    conv3 = convolution_layer(conv2, w['c3'], b['c3'], st, padding, pool_k, pool_st, train_dropouts['c3'], apply_pooling=False)
    # Flatten
    fc1 = flatten(conv3)
    print("Faltten: {}\n".format(fc1.get_shape()[1:]))
    # Layer 4 -- fully connected layer:
    fc1 = full_connected_layer(fc1, w['fc1'], b['fc1'], train_dropouts['fc1'])
    # Layer 5 -- fully connected layer:
    fc2 = full_connected_layer(fc1, w['fc2'], b['fc2'], train_dropouts['fc2'])
    # Layer 6 -- fully connected output layer:
    out = output_layer(fc2, w['out'], b['out'])
    # parameters in each layer
    n_parameters(conv1, conv2, conv3, fc1, fc2, out)
    return out

My architecture is slightly modified from the above-mentioned reference architecture and is as follows:

| Layer   | Description     | Filter Weight | Filter Bias | Stride | Padding | Dropout | Dimension | Parameters |
| ------- | --------------- | ------------- | ----------- | ------ | ------- | ------- | --------- | ---------- |
| Layer 1 | Convolutional   | 5x5x6   | 6   | 2x2 | Valid | 1.0 | Input: 32x32x3 <br/> ReLU: 28x28x6 <br/> Max Pooling: 14x14x6 | 456    |
| Layer 2 | Convolutional   | 5x5x16  | 16  | 2x2 | Valid | 1.0 | Input: 14x14x6 <br/> ReLU: 10x10x16 <br/> Max Pooling: 5x5x16 | 2416   |
| Layer 3 | Convolutional   | 5x5x400 | 400 | 2x2 | Valid | 1.0 | Input: 5x5x16 <br/> ReLU: 1x1x400 <br/> Flatten: 400          | 160400 |
| Layer 4 | Fully Connected | 400x120 | 120 |     |       | 0.6 | Input: 400 <br/> ReLU: 120                                    | 48120  |
| Layer 5 | Fully Connected | 120x84  | 84  |     |       | 0.5 | Input: 120 <br/> ReLU: 84                                     | 10164  |
| Layer 6 | Output          | 84x43   | 43  |     |       |     | Input: 84 <br/> Logits: 43                                    | 3655   |
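
The parameter counts follow from (filter height × filter width × input channels + 1 bias per filter) × number of filters for the convolution layers, and (inputs + 1) × outputs for the fully connected layers. For example, Layer 1 has (5 × 5 × 3 + 1) × 6 = 456 parameters, and Layer 4 has (400 + 1) × 120 = 48120.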

 

retrain_model = False
no_improvement_count = 0
prev_accuracy = 0
current_accuracy = 0
if retrain_model:
    with tf.Session() as sess:
        sess.run(init)
        print("Training....")
        for e in range(hyper_params['epoch']):
            # training the network
            prev_accuracy = current_accuracy
            x_train_p, y_train_p = shuffle(x_train_p, y_train_p)
            batches = get_batches(hyper_params['batch_size'], x_train_p, y_train_p)
            for batch_x, batch_y in batches:
                batch_x, batch_y = shuffle(batch_x, batch_y)
                sess.run(optimizer, feed_dict={
                    x: batch_x, y: batch_y,
                    dropouts: hyper_params['dropouts']
                })
            # validate the network
            validation_accuracy = sess.run(accuracy, feed_dict={
                x: x_validation_p, y: y_validation_p,
                dropouts: hyper_params['test_dropouts']
            })
            # early termination
            current_accuracy = validation_accuracy
            no_improvement_count = no_improvement_count + 1 if current_accuracy < prev_accuracy else 0
            if no_improvement_count > 3:
                break
            print("{}th epoch - before: {:2.3f}%".format(e + 1, validation_accuracy * 100))
        saver.save(sess, save_file)
    print("Model saved")

Note that the training loop above also implements simple early stopping: if the validation accuracy declines for more than three consecutive epochs, training stops before all epochs complete.

The hyperparameters are as follows:

| Parameter                     | Value |
| ----------------------------- | ----- |
| Mean                          | 0     |
| Standard Deviation            | 0.1   |
| Epochs                        | 25    |
| Batch Size                    | 128   |
| Learn Rate                    | 0.001 |
| Dropouts (keep probabilities) | Layer 1: 1.0 <br/> Layer 2: 1.0 <br/> Layer 3: 0.6 <br/> Layer 4: 0.5 <br/> Layer 5: 0.5 |
| Test Dropouts                 | 1.0   |

Now we will train the classifier.

 

I used the Adam optimizer to update the weights and biases via backpropagation, instead of plain Stochastic Gradient Descent. The LeNet implementation was shown above; for the implementation details of each layer, please have a look at the GitHub repo.
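
A minimal sketch of how the loss and training operations are typically wired in TensorFlow 1.x. The placeholder shapes and the arguments to le_net are my assumptions based on the architecture above, not code copied from the repo:

import tensorflow as tf

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None,))
dropouts = tf.placeholder(tf.float32, (5,))  # per-layer keep probabilities
one_hot_y = tf.one_hot(y, 43)
# le_net as defined above; channel arguments are assumed
logits = le_net(x, mu=0, stddev=0.1, dropouts=dropouts,
                input_channels=3, output_channels=43)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)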

 

When training a network on not-so-powerful computers, it is important to apply mini-batching so that the network can be trained on small chunks of the training data at a time without overloading the machine's memory. I wrote the following code for mini-batching:

def get_batches(_batch_size_, _features_, _labels_):
    import math
    total_size, index, batch = len(_features_), 0, []
    n_batches = int(math.ceil(total_size / _batch_size_)) if _batch_size_ > 0 else 0
    for _i_ in range(n_batches - 1):
        batch.append([_features_[index:index + _batch_size_],
                      _labels_[index:index + _batch_size_]])
        index += _batch_size_
    batch.append([_features_[index:], _labels_[index:]])
    return batch
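
A quick usage example with dummy arrays to show how the batches split (the sizes here are illustrative):

import numpy as np

features = np.arange(300).reshape(300, 1)
labels = np.arange(300)
batches = get_batches(128, features, labels)
print([len(b[0]) for b in batches])  # [128, 128, 44] -- the last batch holds the remainder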

 

Now, for the actual training of the network, we create a TensorFlow session and optimize the parameters. My results on the validation set are:

 

| Epoch | Accuracy (%) | Epoch | Accuracy (%) |
| ----- | ------------ | ----- | ------------ |
| 1st   | 71.020 | 2nd  | 80.839 |
| 3rd   | 86.576 | 4th  | 86.469 |
| 5th   | 89.138 | 6th  | 90.680 |
| 7th   | 91.270 | 8th  | 92.449 |
| 9th   | 92.426 | 10th | 93.424 |
| 11th  | 93.832 | 12th | 93.379 |
| 13th  | 93.469 | 14th | 94.490 |
| 15th  | 93.628 | 16th | 94.376 |
| 17th  | 93.129 | 18th | 94.535 |
| 19th  | 94.376 | 20th | 95.215 |
| 21st  | 94.921 | 22nd | 94.671 |
| 23rd  | 94.649 | 24th | 94.581 |
| 25th  | 94.172 |      |        |

 

Now comes the part where we test the accuracy of the network on the hidden test data.

test_data = True
if test_data:
    with tf.Session() as sess:
        saver.restore(sess, save_file)
        print("Model restored")
        test_accuracy = sess.run(accuracy, feed_dict={
            x: x_test_p,
            y: y_test_p,
            dropouts: hyper_params['test_dropouts']
        })
        print("test accuracy: {:2.3f}%".format(test_accuracy * 100))

The network is able to achieve 95.306% accuracy on the test data.

Test Accuracy 95.306%

 

Test a Model on New Images:

To give myself more insight into how my model is working, I downloaded several traffic sign images from the internet and tested the accuracy of the pre-trained network on them.

 

Discussion on New Test Data:

 

Image 1: It is a Left Turn sign and the network classifies it correctly.

 

Image 2: It is a Bicycle Crossing sign, but it is slightly modified from the sign the network was trained on. The network confuses it with the Right-of-way sign.

 

Image 3: It is an Ahead Only sign and the network classifies it correctly.

 

Image 4: It is a Traffic Signal sign. The network confuses it with the Pedestrian sign.

 

Image 5: It is a Slippery Road sign. The network confuses it with the Stop sign. The reason might be that in the training data the Slippery Road sign shows an inclined car, while the test image shows a horizontal one.

 

Image 6, 7, 8: These are Priority Road, Turn Right Ahead and Yield signs respectively. The network classifies them correctly.

test_new_data = True
if test_new_data:
    with tf.Session() as sess:
        x_test_new, y_test_new, file_names = get_new_test_data('/test-data/')
        x_test_new_p, y_test_new_p = pre_process(x_test_new, y_test_new)
        saver.restore(sess, save_file)
        new_data_accuracy = sess.run(accuracy, feed_dict={
            x: x_test_new_p,
            y: y_test_new_p,
            dropouts: hyper_params['test_dropouts']
        })
        prediction_probabilities = sess.run(soft_max_prb, feed_dict={
            x: x_test_new_p,
            dropouts: hyper_params['test_dropouts']
        })
        print("predicted logits: {}\n".format(predicted_logits))
        top_p, top_i = sess.run(tf.nn.top_k(tf.constant(prediction_probabilities), k=5))
        for index in range(len(top_p)):
            visualize_test_images(x_test_new[index], is_single=True)
            print(file_names[index])
            for p, i in zip(top_p[index], top_i[index]):
                print("{}:  {:2.3f}%".format(traffic_sign_name(i), p * 100))
            print("\n")

Predictions on New Test Data:

| Prediction | Confidence (%) |
| ---------- | -------------- |
| Dangerous curve to the left | 100.000 |
| Slippery road | 0.000 |
| Right-of-way at the next intersection | 0.000 |
| Double curve | 0.000 |
| Road work | 0.000 |

Ground truth: Dangerous curve to the left

| Prediction | Confidence (%) |
| ---------- | -------------- |
| Road narrows on the right | 42.401 |
| Pedestrians | 37.532 |
| General caution | 19.522 |
| Traffic signals | 0.370 |
| Right-of-way at the next intersection | 0.150 |

Ground truth: Bicycles crossing

| Prediction | Confidence (%) |
| ---------- | -------------- |
| Ahead only | 100.000 |
| No passing | 0.000 |
| Bicycles crossing | 0.000 |
| Turn left ahead | 0.000 |
| Speed limit (60km/h) | 0.000 |

Ground truth: Ahead only

| Prediction | Confidence (%) |
| ---------- | -------------- |
| Road narrows on the right | 98.333 |
| Pedestrians | 0.889 |
| Road work | 0.563 |
| Children crossing | 0.071 |
| General caution | 0.052 |

Ground truth: Traffic signals

| Prediction | Confidence (%) |
| ---------- | -------------- |
| No entry | 99.996 |
| Stop | 0.004 |
| Turn right ahead | 0.000 |
| Roundabout mandatory | 0.000 |
| Speed limit (70km/h) | 0.000 |

Ground truth: Slippery road

| Prediction | Confidence (%) |
| ---------- | -------------- |
| Priority road | 100.000 |
| Roundabout mandatory | 0.000 |
| Right-of-way at the next intersection | 0.000 |
| Yield | 0.000 |
| No passing | 0.000 |

Ground truth: Priority road

| Prediction | Confidence (%) |
| ---------- | -------------- |
| Turn right ahead | 99.983 |
| Stop | 0.017 |
| Keep left | 0.000 |
| Speed limit (30km/h) | 0.000 |
| No entry | 0.000 |

Ground truth: Go straight or right

| Prediction | Confidence (%) |
| ---------- | -------------- |
| Roundabout mandatory | 42.161 |
| End of no passing by vehicles over 3.5 metric tons | 38.446 |
| Right-of-way at the next intersection | 13.663 |
| Priority road | 2.242 |
| Keep right | 1.334 |

Ground truth: Roundabout mandatory

 

