
Traffic Sign Recognition

Files Submitted

Dataset Exploration

Dataset Summary

The downloaded dataset consisted of traffic sign images, each 32 pixels wide, 32 pixels tall, in 24-bit color. In total we had:

  • 34,799 training examples;
  • 12,630 testing examples;
  • 43 distinct classes.

Original dataset histogram

Exploratory Visualization

Here is a random sample of 250 traffic signs from the training dataset:

Dataset sample

It is noteworthy that the pictures were taken under varying lighting conditions and some are quite hard to read.

Augmented Dataset

The class distribution of the dataset was very uneven, and that could negatively impact the training and performance of our model.

This could be mitigated by augmenting the original dataset, generating variations of the signs from the lower-frequency classes.

I decided to rotate those signs by ±30 degrees around each of the 3 axes, giving an illusion of perspective:

Augmented sample

By doing this, I was able to increase the training dataset size from 34,799 to 86,430 samples with uniform distribution:

Augmented dataset histogram
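
The transformation can be sketched roughly as below, assuming OpenCV and NumPy; the function name, the pinhole projection and the random angle sampling are illustrative rather than the exact code used in the notebook:

    import numpy as np
    import cv2

    def rotate_3d(image, max_angle_deg=30):
        # Sample a rotation of up to +/- max_angle_deg around each of the 3 axes.
        h, w = image.shape[:2]
        ax, ay, az = np.radians(np.random.uniform(-max_angle_deg, max_angle_deg, 3))

        # Rotation matrices around the x, y and z axes.
        rx = np.array([[1, 0, 0],
                       [0, np.cos(ax), -np.sin(ax)],
                       [0, np.sin(ax),  np.cos(ax)]])
        ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                       [0, 1, 0],
                       [-np.sin(ay), 0, np.cos(ay)]])
        rz = np.array([[np.cos(az), -np.sin(az), 0],
                       [np.sin(az),  np.cos(az), 0],
                       [0, 0, 1]])
        r = rz @ ry @ rx

        # Rotate the image corners in 3D and project them back to the image
        # plane with a simple pinhole model (focal length ~ image width).
        f = float(w)
        corners = np.array([[-w / 2, -h / 2, 0],
                            [ w / 2, -h / 2, 0],
                            [ w / 2,  h / 2, 0],
                            [-w / 2,  h / 2, 0]], dtype=np.float64)
        rotated = corners @ r.T
        projected = rotated[:, :2] * f / (f + rotated[:, 2:3]) + [w / 2, h / 2]

        src = (corners[:, :2] + [w / 2, h / 2]).astype(np.float32)
        dst = projected.astype(np.float32)
        m = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, m, (w, h), borderMode=cv2.BORDER_REPLICATE)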

Design and Test a Model Architecture

Preprocessing

The images were pre-processed with only 2 steps:

  1. Conversion from the 24-bit color space to 8-bit grayscale using cv2.cvtColor. Although color can be an important cue for identifying traffic signs, the signs remain distinguishable after conversion to grayscale. The reduced color space can simplify the architecture and reduce the demand for computational resources;

  2. Normalization via cv2.equalizeHist. This increases contrast and highlights the important features.

Dataset sample
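
As a minimal sketch, those two steps amount to something like the following, assuming the images are loaded as 32x32 BGR arrays (the helper name is illustrative):

    import numpy as np
    import cv2

    def preprocess(image):
        # Step 1: 24-bit color -> 8-bit grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Step 2: histogram equalization to increase contrast
        equalized = cv2.equalizeHist(gray)
        # Keep a channel axis so the CNN input shape is 32x32x1
        return equalized[..., np.newaxis]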

Model Architecture

The model employed was based on the LeNet architecture implemented in a previous lab, with minimal changes so that it produces logits for 43 classes instead of 10:

LeNet
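
For reference, a LeNet-style forward pass adapted to 32x32x1 inputs and 43 output classes looks roughly like the sketch below, written in the TensorFlow 1.x style used at the time; variable names and the initialization parameters are illustrative:

    import tensorflow as tf

    def lenet(x, n_classes=43, mu=0.0, sigma=0.1):
        # Layer 1: 5x5 conv, 6 filters -> 28x28x6, then 2x2 max pool -> 14x14x6
        c1_w = tf.Variable(tf.truncated_normal([5, 5, 1, 6], mean=mu, stddev=sigma))
        c1 = tf.nn.relu(tf.nn.conv2d(x, c1_w, [1, 1, 1, 1], 'VALID') + tf.Variable(tf.zeros(6)))
        p1 = tf.nn.max_pool(c1, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

        # Layer 2: 5x5 conv, 16 filters -> 10x10x16, then 2x2 max pool -> 5x5x16
        c2_w = tf.Variable(tf.truncated_normal([5, 5, 6, 16], mean=mu, stddev=sigma))
        c2 = tf.nn.relu(tf.nn.conv2d(p1, c2_w, [1, 1, 1, 1], 'VALID') + tf.Variable(tf.zeros(16)))
        p2 = tf.nn.max_pool(c2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

        # Flatten and fully connected layers: 400 -> 120 -> 84 -> 43 logits
        flat = tf.reshape(p2, [-1, 400])
        fc1_w = tf.Variable(tf.truncated_normal([400, 120], mean=mu, stddev=sigma))
        fc1 = tf.nn.relu(tf.matmul(flat, fc1_w) + tf.Variable(tf.zeros(120)))
        fc2_w = tf.Variable(tf.truncated_normal([120, 84], mean=mu, stddev=sigma))
        fc2 = tf.nn.relu(tf.matmul(fc1, fc2_w) + tf.Variable(tf.zeros(84)))
        out_w = tf.Variable(tf.truncated_normal([84, n_classes], mean=mu, stddev=sigma))
        logits = tf.matmul(fc2, out_w) + tf.Variable(tf.zeros(n_classes))
        return logits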

Model Training

Since each input image should contain a single traffic sign, the tf.nn.softmax_cross_entropy_with_logits function was chosen as the loss function:

Computes softmax cross entropy between logits and labels. Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class).

The tf.nn.sparse_softmax_cross_entropy_with_logits function would be another suitable choice, but it wasn't evaluated at this time.
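
A minimal sketch of the resulting training operation, assuming the lenet function from the previous snippet; the Adam optimizer shown here is only one reasonable choice and is not prescribed by the text above:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, (None, 32, 32, 1))
    y = tf.placeholder(tf.int32, (None,))
    one_hot_y = tf.one_hot(y, 43)

    logits = lenet(x)  # model sketched in the previous snippet
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
    loss = tf.reduce_mean(cross_entropy)
    # The Adam optimizer is an assumption here, not specified in the writeup.
    train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)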

Solution Approach

After over a dozen iterations, tweaking each hyperparameter individually, the best results were achieved with the following values:

  • Number of epochs: 120;
  • Batch size: 128;
  • Learning rate: 0.001.
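
The mini-batch training loop implied by these hyperparameters looks roughly like the sketch below, assuming the placeholders and train_op from the previous snippet and NumPy arrays X_train / y_train holding the augmented training set:

    import tensorflow as tf
    from sklearn.utils import shuffle

    EPOCHS = 120
    BATCH_SIZE = 128

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(EPOCHS):
            # X_train / y_train are assumed to hold the augmented training data
            X_train, y_train = shuffle(X_train, y_train)
            for offset in range(0, len(X_train), BATCH_SIZE):
                batch_x = X_train[offset:offset + BATCH_SIZE]
                batch_y = y_train[offset:offset + BATCH_SIZE]
                sess.run(train_op, feed_dict={x: batch_x, y: batch_y})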

That led to the following accuracy:

  • Validation set: 93.1%;
  • Test set: 89.6%.

I also wrote a script to extract a segment from an image given its URL, and used it to create an additional test set with traffic sign images found on the World Wide Web, like the one below:

Web image
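
A hypothetical sketch of that script, which downloads the image from its URL, crops a rectangular region around a sign and resizes it to the 32x32 input size (the function name and coordinate parameters are illustrative):

    import urllib.request
    import numpy as np
    import cv2

    def extract_sign(url, x0, y0, x1, y1, size=(32, 32)):
        # Download and decode the image, then crop the region of interest.
        data = urllib.request.urlopen(url).read()
        image = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
        crop = image[y0:y1, x0:x1]
        return cv2.resize(crop, size)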

From that original image, I chose 7 signs that belong to the 43 classes we used to train our model:

Web test set

This picture seems to have been taken in a studio, so brightness and contrast are excellent. However, many of the chosen signs overlap with other signs, which might lead the model to misclassify them.

After running those images against the trained model, the result was:

Sign                                               | Prediction           | Hit
Slippery road                                      | Slippery road        | ✔️
End of no passing by vehicles over 3.5 metric tons | Ahead only           |
No entry                                           | No entry             | ✔️
Go straight or right                               | Go straight or right | ✔️
Road work                                          | Road work            | ✔️
Traffic signals                                    | Traffic signals      | ✔️
Children crossing                                  | Children crossing    | ✔️

That is 6 out of 7, or 85.7% accuracy, which is down from 93.1% on the validation set and 89.6% on the test set and might indicate overfitting.

Digging down into the top five predictions for each sign:

It was a surprise that every top prediction had close to 100% confidence. I wonder if this is another indicator of overfitting.
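
For reference, the top five predictions can be queried with tf.nn.top_k, roughly as sketched below, assuming the logits and placeholder x from the earlier snippets, a session sess holding the trained weights, and web_images containing the seven preprocessed web images:

    # Top-5 softmax probabilities and class indices for each web image.
    softmax = tf.nn.softmax(logits)
    top5 = tf.nn.top_k(softmax, k=5)
    # `sess` and `web_images` are assumed to exist as described above.
    values, indices = sess.run(top5, feed_dict={x: web_images})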

Final Thoughts

The results are far from stellar and definitely not close to a solution that could be shipped and used in the real world.

Some areas where I could have invested more time:

  • Image pre-processing: remove the background, leaving only the region that encompasses the sign. That should be straightforward, since all signs fall into a limited set of geometric shapes (circle, triangle, rectangle, etc.). Another normalization step could then be applied after the background removal;
  • Model architecture: LeNet achieves incredible accuracy on a much simpler problem, and it was surprising to see it perform well at classifying traffic signs, but I could have spent some time researching papers on other architectures that might perform better. Also, there are indications of overfitting, so we could improve the pooling steps or even add dropout at some layers.

All things considered, I am extremely satisfied to have learned so much about Python (and so many libraries) while experimenting with a real-world application for CNNs.
