
Object Detection in an Urban Environment

Data

For this project, we will be using data from the Waymo Open Dataset.

[OPTIONAL] - The files can be downloaded directly from the website as tar files or from the Google Cloud Bucket as individual tf records. We have already provided the data required to finish this project in the workspace, so you don't need to download it separately.

Structure

Data

The data you will use for training, validation and testing is organized as follows:

/home/workspace/data/waymo
    - training_and_validation - contains 97 files to train and validate your models
    - train: contains the train data (empty to start)
    - val: contains the val data (empty to start)
    - test - contains 3 files to test your model and create inference videos

The training_and_validation folder contains files that have been downsampled: we selected one frame out of every 10 from the 10 fps videos. The testing folder contains frames from the 10 fps videos without downsampling.

You will split this training_and_validation data into train and val sets by completing and executing the create_splits.py file.

Experiments

The experiments folder will be organized as follows:

experiments/
    - pretrained_model/
    - exporter_main_v2.py - to create an inference model
    - model_main_tf2.py - to launch training
    - reference/ - reference training with the unchanged config file
    - experiment0/ - create a new folder for each experiment you run
    - experiment1/ - create a new folder for each experiment you run
    - experiment2/ - create a new folder for each experiment you run
    - label_map.pbtxt
    ...

Prerequisites

Local Setup

For local setup, if you have your own Nvidia GPU, you can use the provided Dockerfile and requirements in the build directory.

Follow the README therein to create a docker container and install all prerequisites.

Download and process the data

Note: If you are using the classroom workspace, we have already completed the steps in this section for you. You can find the downloaded and processed files in the /home/workspace/data/preprocessed_data/ directory. Check them out, then proceed to the Exploratory Data Analysis part.

The first goal of this project is to download the data from Waymo's Google Cloud bucket to your local machine. For this project, we only need a subset of the data provided (for example, we do not need the Lidar data). Therefore, we download and immediately trim each file. In download_process.py, you can view the create_tf_example function, which performs this processing. This function takes the components of a Waymo TF record and saves them in the TF Object Detection API format. An example of such a function is described here. We already provide the label_map.pbtxt file.
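
For orientation, here is a minimal sketch of the general shape of such a conversion. It assumes a simplified list of annotation dicts (pixel-space xmin/xmax/ymin/ymax plus a class id), whereas the real create_tf_example works on the Waymo proto fields directly:

import tensorflow as tf
from object_detection.utils import dataset_util

def create_tf_example(filename, encoded_jpeg, annotations, height, width):
    # Box coordinates must be normalized to [0, 1] for the TF Object Detection API
    xmins = [a["xmin"] / width for a in annotations]
    xmaxs = [a["xmax"] / width for a in annotations]
    ymins = [a["ymin"] / height for a in annotations]
    ymaxs = [a["ymax"] / height for a in annotations]
    classes = [a["class_id"] for a in annotations]

    feature = {
        "image/height": dataset_util.int64_feature(height),
        "image/width": dataset_util.int64_feature(width),
        "image/filename": dataset_util.bytes_feature(filename.encode()),
        "image/source_id": dataset_util.bytes_feature(filename.encode()),
        "image/encoded": dataset_util.bytes_feature(encoded_jpeg),
        "image/format": dataset_util.bytes_feature(b"jpeg"),
        "image/object/bbox/xmin": dataset_util.float_list_feature(xmins),
        "image/object/bbox/xmax": dataset_util.float_list_feature(xmaxs),
        "image/object/bbox/ymin": dataset_util.float_list_feature(ymins),
        "image/object/bbox/ymax": dataset_util.float_list_feature(ymaxs),
        "image/object/class/label": dataset_util.int64_list_feature(classes),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))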

You can run the script using the following command:

python download_process.py --data_dir {processed_file_location} --size {number of files you want to download}

You are downloading 100 files (unless you changed the size parameter), so be patient! Once the script is done, look inside your data_dir folder to check that the files have been downloaded and processed correctly.

Classroom Workspace

In the classroom workspace, every library and package should already be installed in your environment. You will NOT need to make use of gcloud to download the images.

Instructions

Exploratory Data Analysis

You should use the data already present in the /home/workspace/data/waymo directory to explore the dataset! This is the most important task of any machine learning project. To do so, open the Exploratory Data Analysis notebook. In this notebook, your first task will be to implement a display_instances function to display images and annotations using matplotlib. This should be very similar to the function you created during the course. Once you are done, feel free to spend more time exploring the data, and report anything relevant about the dataset in the writeup.
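
A minimal sketch of what such a function might look like (the argument names and class-color mapping are assumptions, not the notebook's actual signature):

import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Hypothetical class-id-to-color mapping, matching the convention used in the writeup below
COLORS = {1: "red", 2: "blue", 4: "green"}  # vehicle, pedestrian, cyclist

def display_instances(image, bboxes, classes):
    """Draw normalized [ymin, xmin, ymax, xmax] boxes over an HxWx3 image."""
    _, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(image)
    height, width, _ = image.shape
    for (y1, x1, y2, x2), cls in zip(bboxes, classes):
        rect = patches.Rectangle(
            (x1 * width, y1 * height),               # top-left corner in pixels
            (x2 - x1) * width, (y2 - y1) * height,   # box width and height in pixels
            linewidth=2, edgecolor=COLORS.get(cls, "white"), facecolor="none")
        ax.add_patch(rect)
    ax.axis("off")
    plt.show()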

Keep in mind that you should refer to this analysis to create the different splits (training, testing and validation).

Create the training - validation splits

In the class, we talked about cross-validation and the importance of creating meaningful training and validation splits. For this project, you will have to create your own training and validation sets using the files located in /home/workspace/data/waymo. The split function in the create_splits.py file does the following (a sketch of one possible implementation follows the list):

  • create three subfolders: /home/workspace/data/train/, /home/workspace/data/val/, and /home/workspace/data/test/
  • split the tf records files between these three folders by symbolically linking the files from /home/workspace/data/waymo/ to /home/workspace/data/train/, /home/workspace/data/val/, and /home/workspace/data/test/
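
Here is a minimal sketch of one possible implementation, assuming an 80/10/10 ratio (the ratio actually used is discussed in the Cross validation section of the writeup):

import glob
import os
import random

def split(data_dir):
    # Gather the downsampled tfrecords
    files = glob.glob(os.path.join(data_dir, "waymo", "training_and_validation", "*.tfrecord"))
    random.shuffle(files)

    n = len(files)
    splits = {
        "train": files[: int(0.8 * n)],
        "val": files[int(0.8 * n): int(0.9 * n)],
        "test": files[int(0.9 * n):],
    }
    for name, records in splits.items():
        out_dir = os.path.join(data_dir, name)
        os.makedirs(out_dir, exist_ok=True)
        for record in records:
            # Symlink instead of copying to save workspace storage
            os.symlink(record, os.path.join(out_dir, os.path.basename(record)))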

Use the following command to run the script once your function is implemented:

python create_splits.py --data-dir /home/workspace/data

Edit the config file

Now you are ready for training. As we explained during the course, the TF Object Detection API relies on config files. The config we will use for this project is pipeline.config, which is the config for an SSD ResNet 50 640x640 model. You can learn more about the Single Shot Detector here.

First, let's download the pretrained model and move it to /home/workspace/experiments/pretrained_model/.

We need to edit the config file to change the location of the training and validation files, as well as the location of the label_map file and the pretrained weights. We also need to adjust the batch size. To do so, run the following:

python edit_config.py --train_dir /home/workspace/data/train/ --eval_dir /home/workspace/data/val/ --batch_size 2 --checkpoint /home/workspace/experiments/pretrained_model/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0 --label_map /home/workspace/experiments/label_map.pbtxt

A new config file has been created, pipeline_new.config.
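
Under the hood, edit_config.py relies on the TF Object Detection API's config utilities. A rough sketch of the kind of editing it performs (paths shortened for readability; this is an illustration, not the script's exact code):

from object_detection.utils import config_util

# Load the pipeline config as a dict of proto messages
configs = config_util.get_configs_from_pipeline_file("pipeline.config")

# Point the pipeline at our data, label map, and pretrained checkpoint
configs["train_config"].batch_size = 2
configs["train_config"].fine_tune_checkpoint = "experiments/pretrained_model/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0"
configs["train_input_config"].label_map_path = "experiments/label_map.pbtxt"
configs["train_input_config"].tf_record_input_reader.input_path[:] = ["data/train/*.tfrecord"]
configs["eval_input_config"].label_map_path = "experiments/label_map.pbtxt"
configs["eval_input_config"].tf_record_input_reader.input_path[:] = ["data/val/*.tfrecord"]

# Write the result back out as a pipeline.config in the target directory
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, ".")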

Training

You will now launch your very first experiment with the TensorFlow Object Detection API. Move the pipeline_new.config to the /home/workspace/experiments/reference folder. Now launch the training process:

  • a training process:
python experiments/model_main_tf2.py --model_dir=experiments/reference/ --pipeline_config_path=experiments/reference/pipeline_new.config

Once the training is finished, launch the evaluation process:

  • an evaluation process:
python experiments/model_main_tf2.py --model_dir=experiments/reference/ --pipeline_config_path=experiments/reference/pipeline_new.config --checkpoint_dir=experiments/reference/

Note: Both processes will display some Tensorflow warnings, which can be ignored. You may have to kill the evaluation script manually using CTRL+C.

To monitor the training, you can launch a tensorboard instance by running python -m tensorboard.main --logdir experiments/reference/. You will report your findings in the writeup.

Improve the performance

Most likely, this initial experiment did not yield optimal results. However, you can make multiple changes to the config file to improve this model. One obvious change is to improve the data augmentation strategy. The preprocessor.proto file contains the different data augmentation methods available in the TF Object Detection API. To help you visualize these augmentations, we are providing a notebook: Explore augmentations.ipynb. Using this notebook, try different data augmentation combinations and select the one you think is optimal for our dataset. Justify your choices in the writeup.

Keep in mind that the following are also available:

  • experiment with the optimizer: type of optimizer, learning rate, scheduler, etc.
  • experiment with the architecture. The Tf Object Detection API model zoo offers many architectures. Keep in mind that the pipeline.config file is unique for each architecture and you will have to edit it.

Important: If you are working in the workspace, your storage is limited. You may want to delete the checkpoint files after each experiment. You should, however, keep the tf.events files located in the train and eval folders of your experiments. You can also keep the saved_model folder to create your videos.

Creating an animation

Export the trained model

Modify the arguments of the following command to adjust it to your model:

python experiments/exporter_main_v2.py --input_type image_tensor --pipeline_config_path experiments/reference/pipeline_new.config --trained_checkpoint_dir experiments/reference/ --output_directory experiments/reference/exported/

This should create a new folder experiments/reference/exported/saved_model. You can read more about the Tensorflow SavedModel format here.
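
As a quick sanity check, the exported model can be loaded and called like any TensorFlow SavedModel (standard TF usage, not part of the starter code):

import tensorflow as tf

detect_fn = tf.saved_model.load("experiments/reference/exported/saved_model")
# The serving function expects a batched uint8 image tensor
detections = detect_fn(tf.zeros([1, 640, 640, 3], dtype=tf.uint8))
print(list(detections.keys()))  # e.g. detection_boxes, detection_scores, detection_classes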

Finally, you can create a video of your model's inferences for any tf record file. To do so, run the following command (modify it to your files):

python inference_video.py --labelmap_path label_map.pbtxt --model_path experiments/reference/exported/saved_model --tf_record_path /data/waymo/testing/segment-12200383401366682847_2552_140_2572_140_with_camera_labels.tfrecord --config_path experiments/reference/pipeline_new.config --output_path animation.gif

Project overview

This project uses machine learning methods to detect objects (vehicles, pedestrians, and cyclists) in camera images of urban driving scenes.

Set up

Please reference the README.md above to set up the local environment.

Running the evaluation process may raise this error:

TypeError: 'numpy.float64' object cannot be interpreted as an integer
/usr/local/lib/python3.8/dist-packages/numpy/core/function_base.py, line 120

Changing the code at function_base.py (line 120) from

num = operator.index(num)

to

num = operator.index(int(num))

fixes it. (This patches numpy's linspace; the error typically comes from the COCO evaluation code passing a float count.)

Open Docker

sudo docker run --gpus all -v /home/yourname/nd013-c1-vision-starter/:/app/project/ --network=host -ti project-dev bash

Open multiple terminals in docker

sudo docker ps # Show the container_id
sudo docker exec -it <container_id> bash

Open Jupyter notebook in docker

jupyter notebook --allow-root

Open nvidia-smi

watch -n 1 -d nvidia-smi # Query every 1 second

Open Tensorboard in docker

tensorboard --logdir /app/project/experiments/reference/ # Display single run in TensorBoard
tensorboard --logdir_spec=ref:/app/project/experiments/reference/,e0:/app/project/experiments/experiment0 # Display different runs in TensorBoard

Dataset

Dataset analysis

The dataset was recorded under different weather and lighting conditions such as sunny, night, rain, and fog.
(The red boxes represent vehicles, the blue boxes represent pedestrians, and the green boxes represent cyclists.)

  • Sunny:
    (example image)
  • Night:
    (example image)
  • Rain:
    (example image)
  • Foggy:
    (example image)

I randomly selected 1000 images from the dataset for statistics. They show that:

  • Vehicles were the most common class, at 77.0%.
  • Pedestrians were the second most common, at 22.4%.
  • Cyclists were the rarest, at 0.6%.
    (chart: class distribution)

I was also interested in the size of the objects. The statistics show that:

  • Small objects were the most common, at about 75.6% ~ 82.4%.
  • Medium objects were the second most common, at about 15.3% ~ 19.4%.
  • Large objects were the rarest, at about 2.3% ~ 5.0%.
    (charts: object size distribution for each class)
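
For reproducibility, here is a minimal sketch of how such class counts could be gathered. It assumes the processed files follow the TF Object Detection API schema (image/object/class/label) and the label ids 1 = vehicle, 2 = pedestrian, 4 = cyclist from label_map.pbtxt:

from collections import Counter
import tensorflow as tf

files = tf.io.gfile.glob("/home/workspace/data/waymo/training_and_validation/*.tfrecord")

counts = Counter()
dataset = tf.data.TFRecordDataset(files).shuffle(10000)
for record in dataset.take(1000):  # sample 1000 frames
    example = tf.train.Example()
    example.ParseFromString(record.numpy())
    labels = example.features.feature["image/object/class/label"].int64_list.value
    counts.update(labels)

print(counts)  # e.g. Counter({1: ..., 2: ..., 4: ...})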

Cross validation

I will divide 97 tfrecords from training_and_validation folder into 80(train):10(val):10(test) ratio.
Merge dataset in training_and_validation and test folders, and split them may appears out of proportion.
Because the data size are too different.

  • training_and_validation folder: each given tfrecord's size is about 3M Bytes,and has about 20 samples.
  • test folder: each given tfrecord's size is about 30M Bytes,and has about 200 samples.

I used the following code to count the number of samples in a TFRecord file:

import tensorflow.compat.v1 as tf
tf.enable_eager_execution()  # needed under TF1.x; eager execution is already the default in TF2
# Count the records (one per camera frame) in a single tfrecord file
sum(1 for _ in tf.data.TFRecordDataset("your/tfrecord/path/segment-xxx_with_camera_labels.tfrecord"))

Training

Reference experiment

The training and validation results are as follows:

Metrics Values
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] 0.001
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] 0.008
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] 0.005
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] 0.102

(TensorBoard training curves for the reference run)

  • classification loss: still not stable by the end of training.
  • localization loss: has converged.
  • regularization loss: very high; training may be stuck in a local minimum.
  • total loss: very high overall.
  • The training and validation results match closely, so there is no overfitting and the split ratio does not need adjusting.
  • AR and AP are very low, which means the model can hardly detect any objects.

Improve on the reference

Experiment 0

Data augmentation is a technique for expanding the effective size of a training dataset;
training a model on a larger, more varied dataset generally improves performance.
I added/adjusted the data augmentations as follows:

Data augmentation | Value | Reason
random_horizontal_flip | 0.5 | Vehicles, pedestrians, and cyclists are roughly symmetrical, so flipping effectively enlarges the dataset
random_adjust_brightness | 0.3 | Objects should still be identifiable under different brightness levels
random_rgb_to_gray | 0.2 | Helps the model identify objects regardless of their color
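
For reference, these options would appear in pipeline_new.config roughly as follows (a sketch assuming the probability/max_delta field names from preprocessor.proto):

data_augmentation_options {
  random_horizontal_flip {
    probability: 0.5
  }
}
data_augmentation_options {
  random_adjust_brightness {
    max_delta: 0.3
  }
}
data_augmentation_options {
  random_rgb_to_gray {
    probability: 0.2
  }
}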

Training and validation results are shown below; the overall performance improved slightly, but the model still struggles to detect objects.
The animation produced by Exp0 looks the same as the reference one, so I will not upload it, in order to save storage space.

Metrics Values(Ref) Values(Exp0)
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] 0.000 0.006
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] 0.001 0.019
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] 0.000 0.002
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] 0.000 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] 0.000 0.027
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] 0.003 0.039
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] 0.000 0.005
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] 0.003 0.024
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] 0.008 0.057
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] 0.000 0.013
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] 0.005 0.164
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] 0.102 0.167

(TensorBoard training curves for Experiment 0)

Experiment 1

From the Experiment 0 results, I think the loss function was stuck in a local minimum.
So I changed the optimizer as follows, to help training escape toward a better minimum:

optimizer {
    adam_optimizer {
        learning_rate {
            exponential_decay_learning_rate {
                initial_learning_rate: 0.001
                decay_steps: 500
            }
        }
    }
}

Training and validation results are shown below; the overall performance improved.

Metrics Values(Exp0) Values(Exp1)
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] 0.006 0.072
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] 0.019 0.154
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] 0.002 0.060
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] 0.001 0.024
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] 0.027 0.169
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] 0.039 0.273
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] 0.005 0.024
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] 0.024 0.090
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] 0.057 0.138
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] 0.013 0.067
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] 0.164 0.299
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] 0.167 0.400

(TensorBoard training curves for Experiment 1)
