
Detecting Cans and Bottles via Synthetic Data


Introduction

This repository detects bottles and cans in real images using a model trained exclusively on synthetic data. The work has two main components: synthetic data generation and object detection. Kubric renders synthetic scenes from the supplied asset geometries, Stable Diffusion 2 generates random textures, and the TensorFlow Object Detection API is used to train the object detection model.

Setup

Prerequisites:

Make sure the NVIDIA Container Toolkit is installed on your system.

# Install 'make' utility
sudo apt install build-essential

Building Docker Images:

Run the following command to build the Docker images:

# NOTE: might need sudo before the make command
make build-docker

The above command pulls kubricdockerhub/kubruntu for Kubric and then builds a Docker image tagged gpu_docker_tf2, which includes TensorFlow 2 for object detection as well as PyTorch.

Synthetic Data Generation

Generating Synthetic Data

Texture Generation

First, generate synthetic textures using Stable Diffusion 2 by specifying the PROMPT in the Makefile and running the following command:

make generate-textures

This will generate approximately 10,000 labeled texture images.

Or, download the pre-generated textures:

wget https://storage.googleapis.com/tx-hyu-task/generated_textures.zip
unzip generated_textures.zip
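
For reference, texture generation boils down to sampling images from Stable Diffusion 2. Below is a minimal sketch using the Hugging Face diffusers library; the prompt, loop count, and output paths are illustrative, and the repository's actual script may differ:

# Minimal texture-generation sketch (assumes the diffusers library is installed)
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

os.makedirs("generated_textures", exist_ok=True)
prompt = "colorful soda can label, seamless texture"  # hypothetical PROMPT value
for i in range(10):
    image = pipe(prompt).images[0]  # the pipeline returns PIL images
    image.save(f"generated_textures/texture_{i:05d}.png")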

Generating Scenes with Kubric

You can render random scenes that place cans and bottles in organized or random arrangements, using random HDRI domes, texture assignments drawn from the SD2 textures, random textures, random camera FOV, and randomized Blender rendering settings, lighting, and camera positions. Run the following command:

make run-synthetic

This will produce around 20,000 images along with bounding box information.

Or, download the pre-generated synthetic data, which contains around 26,000 images:

# Download data generated with Kubric
wget https://storage.googleapis.com/tx-hyu-task/kubric_synthetic_data_output.tar
tar -xvf kubric_synthetic_data_output.tar
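
At its core, each scene is assembled and rendered through the Kubric API. The following is a minimal sketch with a stand-in cube instead of the real can/bottle meshes; the actual script additionally randomizes HDRI domes, textures, FOV, and lighting:

# Minimal Kubric scene sketch; run inside the kubricdockerhub/kubruntu container
import kubric as kb
from kubric.renderer.blender import Blender as KubricRenderer

scene = kb.Scene(resolution=(512, 512))
scene += kb.PerspectiveCamera(name="camera", position=(3, -1, 4), look_at=(0, 0, 1))
scene += kb.DirectionalLight(name="sun", position=(-1, -0.5, 3),
                             look_at=(0, 0, 0), intensity=1.5)
# Stand-in object; the real pipeline loads the supplied asset geometries
scene += kb.Cube(name="can_proxy", scale=(0.3, 0.3, 0.6), position=(0, 0, 0.6))

renderer = KubricRenderer(scene)
frame = renderer.render_still()
kb.write_png(frame["rgba"], "scene_00000.png")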

Creating TFRecords

After generating the data, create TFRecords from it with the following command:

make create-tfrecords

Or, download the pre-generated TFRecords:

# Download tfrecords
wget https://storage.googleapis.com/tx-hyu-task/tfrecord_output.tar
tar -xvf tfrecord_output.tar

To reproduce the object detection results, only tfrecord_output.tar is required.
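
For reference, the TensorFlow Object Detection API consumes tf.train.Example records with a fixed set of feature keys. Below is a sketch of how one image and its boxes might be serialized; the helper is illustrative, and the repository's create-tfrecords script may differ in detail:

import tensorflow as tf

def make_example(jpeg_bytes, width, height, boxes, labels, names):
    """boxes: [(xmin, ymin, xmax, ymax)] in pixels; labels: 1-based class ids."""
    def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
    def _ints(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=v))
    def _bytes(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))
    return tf.train.Example(features=tf.train.Features(feature={
        "image/encoded": _bytes([jpeg_bytes]),
        "image/format": _bytes([b"jpeg"]),
        "image/width": _ints([width]),
        "image/height": _ints([height]),
        # The OD API expects box coordinates normalized to [0, 1]
        "image/object/bbox/xmin": _floats([b[0] / width for b in boxes]),
        "image/object/bbox/ymin": _floats([b[1] / height for b in boxes]),
        "image/object/bbox/xmax": _floats([b[2] / width for b in boxes]),
        "image/object/bbox/ymax": _floats([b[3] / height for b in boxes]),
        "image/object/class/label": _ints(labels),
        "image/object/class/text": _bytes([n.encode("utf-8") for n in names]),
    }))

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    # for each image: writer.write(make_example(...).SerializeToString())
    pass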

Object Detection

The object detector is an EfficientDet-D0 model and does not use any external pretrained weights. Different datasets are used for the two model versions; see the Discussion section.

  • Initial Model (v1): An initial model with an input size of 384x384 pixels was trained from scratch as a starting point for later models.

  • Improved Model (v2): A second model with an input size of 512x512 pixels was trained, with its weights initialized from the initial model.

Training

To reproduce the training process for either model, follow these steps:

  1. Depending on the model you want to train, modify the Makefile accordingly (a hypothetical config excerpt for the v2 initialization is shown after these steps):
# for model v1 - 384x384
PIPELINE_CONFIG_PATH := /workspace/tx-trainer/object_detector/effdet0.config
MODEL_DIR := /workspace/tx-trainer/models/effdet0

# After training model v1, train model v2

# for model v2 - 512x512
PIPELINE_CONFIG_PATH := /workspace/tx-trainer/object_detector/effdet0-v2.config
MODEL_DIR := /workspace/tx-trainer/models/effdet0-v2
  2. After making the necessary configuration changes, execute the following commands to perform training, evaluation, and model export:
make train
make evaluate-model
make export-model

This process will train the selected model, evaluate its performance, and export the trained model for future use.
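
In the TF Object Detection API, initializing v2 from the v1 checkpoint is expressed in the pipeline config. A hypothetical excerpt of what effdet0-v2.config might contain (the checkpoint path and number are illustrative):

train_config {
  fine_tune_checkpoint: "/workspace/tx-trainer/models/effdet0/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}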

Pretrained Weights

Download pretrained weights for EfficientDet-D0 at input sizes 384x384 and 512x512:

wget https://storage.googleapis.com/tx-hyu-task/models.tar
tar -xvf models.tar

To view training logs, run TensorBoard:

tensorboard --logdir=models

Find the final exported TensorFlow SavedModel at:

models/effdet0-v2/exported/saved_model
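
As a quick sanity check, the exported model can be loaded and run directly. A minimal sketch, assuming the standard OD API export signature (the image path is illustrative):

import tensorflow as tf

detect_fn = tf.saved_model.load("models/effdet0-v2/exported/saved_model")
image = tf.io.decode_jpeg(tf.io.read_file("example.jpg"), channels=3)
detections = detect_fn(tf.expand_dims(image, 0))  # uint8 tensor [1, H, W, 3]
# Standard OD API outputs: boxes are normalized [ymin, xmin, ymax, xmax],
# scores are in [0, 1], classes are 1-based label-map ids
boxes = detections["detection_boxes"][0]
scores = detections["detection_scores"][0]
classes = detections["detection_classes"][0]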

Inference

# Download original assignment zip to get target images
wget https://storage.googleapis.com/tx-hyu-task/drink_detection_assigment.zip 
unzip drink_detection_assigment.zip

# Run inference on final model
sudo docker run -v`pwd`:/od gpu_docker_tf2 python /od/object_detector/inference.py \
                                --model_dir /od/models/effdet0-v2/exported/saved_model \
                                --image_dir /od/drink_detection_assigment/target_images/ \
                                --output_dir /od/detection_output \
                                --output_json /od/detection_output.json \
                                --score_threshold 0.5                    
  • --model_dir: The directory path to the exported SavedModel; this is created by the make export-model command after training.
  • --image_dir: The directory containing the input images on which object detection will be performed.
  • --output_dir: The directory where the detection results will be saved.
  • --output_json: The path to the JSON file where detection results in JSON format will be stored.
  • --score_threshold: The minimum confidence score; detections below this threshold are filtered out.

JSON schema

The JSON output consists of a list of dictionaries, each representing the detection results for one image (a small reader sketch follows the schema). The schema is as follows:

  • Image Path (image_path): A string representing the file path of the image for which detections were made.
  • Detections (detections): A list containing dictionaries, each representing a detected object within the image. Each detection dictionary contains the following attributes:
    • Class (class): A string representing the class or category of the detected object (e.g., "bottle," "can").
    • Score (score): A floating-point number indicating the confidence score of the detection. Scores range from 0 to 1, with higher scores indicating greater confidence in the detection.
    • Bounding Box (bbox): A list of integers representing the bounding box coordinates around the detected object. The format is [x_min, y_min, x_max, y_max], where (x_min, y_min) is the top-left corner, and (x_max, y_max) is the bottom-right corner of the bounding box.
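
A minimal sketch of consuming the output file against this schema:

import json

with open("detection_output.json") as f:
    results = json.load(f)

for entry in results:
    print(entry["image_path"])
    for det in entry["detections"]:
        x_min, y_min, x_max, y_max = det["bbox"]
        print(f"  {det['class']}  score={det['score']:.2f}  "
              f"bbox=({x_min}, {y_min}, {x_max}, {y_max})")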

Results

Here are some detection outputs from the model on real-world images.

