allensmile / semantic_segmentation_for_autonomous_driving

This project forked from themoskowitz/semantic_segmentation_for_autonomous_driving

Various networks that could be the basis of a vision system for self-driving cars

Jupyter Notebook 100.00%

Semantic_Segmentation_for_Autonomous_Driving

Video_Demo1

Video_Demo2

demo image

In this project I wanted to explore the state of the art in semantic segmentation for self-driving cars. Here are the different models I tried out in order from least effective to most effective.

Generic diagram of a Unet Model

Unet_KITTI: This is a version of a Unet model that I trained from scratch on the KITTI dataset to distinguish road from non-road. A key point to note: in the skip connections (where a convolutional layer on the way down is joined with a deconvolutional layer of the same size on the way back up), the two tensors are concatenated. I ultimately concluded that this was a less successful approach than adding them. In this version I also followed the model laid out in the Unet paper, where VALID padding forces you to crop the convolutional tensors in order to combine them with the smaller deconvolutional tensors. This means you lose some information around the edges over the course of the network. It's not a dealbreaker, but it's annoying. Alternatively, I could have started out by cropping the input image to a size that worked perfectly with the model so that no more cropping would be needed, but then I'd still be losing that border information.
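To make the cropping cost concrete, here is a minimal sketch of the size arithmetic. It assumes the layout of the original U-Net paper (four levels, two 3x3 VALID convolutions per level, 2x2 pooling and 2x2 up-convolutions). This README does not list the exact layer sizes of Unet_KITTI, so the function name `unet_valid_sizes` and its defaults are illustrative, not the notebook's code.

```python
def unet_valid_sizes(input_size, depth=4, convs_per_level=2, kernel=3):
    """Trace one spatial dimension through a U-Net that uses VALID padding.

    Returns (output_size, crops). `crops` lists, from the deepest skip
    connection to the shallowest, tuples of
    (encoder_size, decoder_size, pixels_cropped_per_side).
    All defaults are assumptions taken from the original U-Net layout.
    """
    # Each VALID convolution removes (kernel - 1) pixels from the dimension.
    shrink = convs_per_level * (kernel - 1)

    size = input_size
    skips = []
    for _ in range(depth):
        size -= shrink              # convolutions on the way down
        skips.append(size)          # this tensor feeds the skip connection
        if size <= 0 or size % 2:
            raise ValueError("size %d cannot be pooled 2x2" % size)
        size //= 2                  # 2x2 max pooling

    size -= shrink                  # convolutions at the bottom of the "U"

    crops = []
    for encoder_size in reversed(skips):
        size *= 2                   # 2x2 up-convolution doubles the size
        extra = encoder_size - size
        if extra < 0 or extra % 2:
            raise ValueError("encoder tensor cannot be center-cropped")
        crops.append((encoder_size, size, extra // 2))
        size -= shrink              # convolutions on the way back up

    return size, crops


if __name__ == "__main__":
    output_size, crops = unet_valid_sizes(572)
    print("output size:", output_size)
    for encoder_size, decoder_size, per_side in crops:
        print(encoder_size, "->", decoder_size, "crop", per_side, "per side")
```

For the 572-pixel input used in the paper, this gives a 388-pixel output and crops of 4, 16, 40 and 88 pixels per side, from the deepest skip connection to the shallowest. That is the border information the paragraph above describes losing.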

Unet_VGG_Encoder: This is similar to the previous model, except that instead of training both the encoder and decoder from scratch, I used pretrained VGG weights for the encoder layers and trained only the decoder layers. I expected this to be much more successful, but on Unet it didn't make much of a difference (it did matter, however, when I switched to fully-convolutional networks). I also switched to SAME padding in this version and didn't notice a difference in the results. This was convenient, as it let me run the network on images of any size without cropping.

Generic diagram of a Fully-Convolutional Network

FCN_VGG: This is a version of a fully-convolutional network that I trained with a VGG encoder, still on the KITTI dataset and with the same task as the others. FCNs typically use upsampling in place of deconvolution and add the skip connections rather than concatenating them. I found the most effective approach to be deconvolution rather than upsampling, combined with adding rather than concatenating the skip connections, so this is something of a hybrid. This network performed noticeably better, reaching a per-pixel accuracy of between 92% and 93% on the test set.
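The difference between the two ways of merging a skip connection is easiest to see in the tensor shapes. The following is a small sketch that works on (height, width, channels) tuples only; `merge_skip` is a hypothetical helper written for this illustration and does not come from the notebooks.

```python
def merge_skip(encoder_shape, decoder_shape, mode):
    """Return the (height, width, channels) shape after merging a skip connection.

    mode="concat" stacks the two tensors along the channel axis (Unet style).
    mode="add" sums them element-wise (FCN style), so the shapes must match.
    """
    enc_h, enc_w, enc_c = encoder_shape
    dec_h, dec_w, dec_c = decoder_shape
    if (enc_h, enc_w) != (dec_h, dec_w):
        raise ValueError("spatial sizes differ; crop or pad before merging")
    if mode == "concat":
        return (enc_h, enc_w, enc_c + dec_c)
    if mode == "add":
        if enc_c != dec_c:
            raise ValueError("element-wise add needs equal channel counts")
        return (enc_h, enc_w, enc_c)
    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    encoder = (64, 64, 256)   # made-up shapes, for illustration only
    decoder = (64, 64, 256)
    print("concat:", merge_skip(encoder, decoder, "concat"))
    print("add:   ", merge_skip(encoder, decoder, "add"))
```

With concatenation the next layer sees the sum of both channel counts, while addition keeps the channel count unchanged but requires the encoder tensor to have the same number of channels as the decoder tensor. In FCN-style networks this is commonly arranged with a 1x1 convolution on the encoder side before the add; whether FCN_VGG does exactly that is an assumption here.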

FCN_CITY: This was far and away the best model. It had the accuracy boost of FCN_VGG, plus I trained it on the Cityscapes dataset, which had 10x as many images (~2,800). Because this made the network much stronger, I gave it a correspondingly more difficult task. This time I trained it to distinguish between five categories: road, sidewalk, vehicle (including cars, trucks, trains, buses, and bicycles), pedestrian, and background. The model achieved roughly 96% accuracy on the test set. I then ran it on a few additional unlabeled videos taken with the same camera but which the network had never seen before. The results are the demos above.
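As a rough sketch of the five-category setup, the snippet below groups Cityscapes class names into the categories listed above and computes a per-pixel accuracy. Only the classes named in the text are mapped. Treating "pedestrian" as the Cityscapes class `person`, sending every other class to background, the category order, and the helper names are all assumptions of this example, not details taken from the notebook.

```python
# Category order is an assumption; the index doubles as the label id.
CATEGORIES = ["background", "road", "sidewalk", "vehicle", "pedestrian"]

# Cityscapes class name -> coarse category. Anything not listed here
# falls through to "background" in this sketch.
NAME_TO_CATEGORY = {
    "road": "road",
    "sidewalk": "sidewalk",
    "car": "vehicle",
    "truck": "vehicle",
    "train": "vehicle",
    "bus": "vehicle",
    "bicycle": "vehicle",
    "person": "pedestrian",  # assumption: "pedestrian" means Cityscapes "person"
}


def to_category_id(class_name):
    """Map one Cityscapes class name to a coarse category id."""
    category = NAME_TO_CATEGORY.get(class_name, "background")
    return CATEGORIES.index(category)


def remap_mask(mask_of_names):
    """Convert a 2D mask of class names into a 2D mask of category ids."""
    return [[to_category_id(name) for name in row] for row in mask_of_names]


def pixel_accuracy(predicted, target):
    """Fraction of pixels whose predicted category id equals the target id."""
    correct = 0
    total = 0
    for predicted_row, target_row in zip(predicted, target):
        for p, t in zip(predicted_row, target_row):
            correct += int(p == t)
            total += 1
    if total == 0:
        raise ValueError("empty mask")
    return correct / total


if __name__ == "__main__":
    # Tiny made-up 2x2 label mask and prediction, for illustration only.
    target = remap_mask([["road", "car"], ["person", "building"]])
    predicted = [[1, 3], [0, 0]]  # the pedestrian pixel is missed
    print(target)
    print(pixel_accuracy(predicted, target))
```

Per-pixel accuracy counts every pixel equally, so large classes such as road and background dominate the score.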

Where to go from here

There are some very obvious next steps, namely using a bigger dataset and using a better encoder. Even Cityscapes is relatively tiny, with under 3,000 training images. A more substantial dataset, say 30,000-100,000 images from a variety of locations, would make a real difference. Replacing the VGG encoder with Inception or, ideally, with a network trained not on ImageNet but on a more relevant dataset (road imagery) and a more relevant problem would also likely boost the accuracy significantly.

This project is "Project #2" in my Deep Learning Projects video.
