Synthetic RGB-D Fusion (SF) Mask R-CNN

Synthetic RGB-D Fusion (SF) Mask R-CNN for unseen object instance segmentation

S. Back, J. Kim, R. Kang, S. Choi and K. Lee. Segmenting unseen industrial components in a heavy clutter using RGB-D fusion and synthetic data. 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020. [Paper] [Video]

Demo

SF Mask R-CNN


Unseen object instance segmentation performance on the WISDOM dataset

| Method | Input | Synthetic data | Backbone | Mask AP | Box AP | Reference |
|---|---|---|---|---|---|---|
| SD Mask R-CNN | Depth | Yes (WISDOM) | ResNet-35-FPN | 51.6 | - | Danielczuk et al. |
| Mask R-CNN | RGB | No | ResNet-35-FPN | 38.4 | - | Danielczuk et al. |
| Mask R-CNN | RGB | No | ResNet-50-FPN | 40.1 | 36.7 | Ito et al. |
| D-SOLO | RGB | No | ResNet-50-FPN | 42.0 | 39.1 | Ito et al. |
| PPIS | RGB | No | ResNet-50-FPN | 52.3 | 48.1 | Ito et al. |
| Mask R-CNN | RGB | Yes (Ours) | ResNet-50-FPN | 59.0 | 61.4 | Ours |
| Mask R-CNN | Depth | Yes (Ours) | ResNet-50-FPN | 59.6 | 60.4 | Ours |
| SF Mask R-CNN (early fusion) | RGB-Depth | Yes (Ours) | ResNet-50-FPN | 55.5 | 57.2 | Ours |
| SF Mask R-CNN (late fusion) | RGB-Depth | Yes (Ours) | ResNet-50-FPN | 58.7 | 59.0 | Ours |
| SF Mask R-CNN (confidence fusion) | RGB-Depth | Yes (Ours) | ResNet-50-FPN | 60.5 | 61.0 | Ours |

SF Mask R-CNN is an upgraded version of the RGB-D fusion Mask R-CNN with a confidence map estimator [1]. The main differences from [1] are:

  • SF Mask R-CNN generates a self-attention map from RGB and inpainted depth, whereas [1] used a validity mask and raw depth.
  • This self-attention map serves as a confidence map: RGB and depth feature maps are fused with spatial self-attention at four different scales.
  • It was fine-tuned on WISDOM-Real-Train (100 images) and evaluated on WISDOM, a public unseen object instance segmentation dataset; [1] was evaluated only on a custom industrial dataset.
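The confidence-fusion idea in the list above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation, and several details are assumptions. The attention branch is taken to output one full-resolution logit map. A sigmoid turns that map into a per-pixel weight c in (0, 1), read here as confidence in the depth features. Fusion is taken to be the convex blend c * F_depth + (1 - c) * F_rgb, applied at four FPN-style scales (strides 4, 8, 16, 32). The helper names `confidence_fuse` and `block_average` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def block_average(a, factor):
    """Downsample a 2-D map by averaging non-overlapping factor x factor blocks."""
    h, w = a.shape
    a = a[: h - h % factor, : w - w % factor]
    return a.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def confidence_fuse(rgb_feats, depth_feats, attention_logits):
    """Hypothetical spatial confidence fusion of RGB and depth features.

    rgb_feats, depth_feats: lists of (C, H_k, W_k) arrays, one per scale.
    attention_logits: (H, W) full-resolution logits (assumed input).
    """
    conf = sigmoid(attention_logits)  # per-pixel depth confidence in (0, 1)
    fused = []
    for f_rgb, f_depth in zip(rgb_feats, depth_feats):
        factor = conf.shape[0] // f_rgb.shape[1]
        c = block_average(conf, factor)[None]  # (1, H_k, W_k), broadcasts over channels
        # Low confidence (e.g. missing depth) falls back to the RGB features.
        fused.append(c * f_depth + (1.0 - c) * f_rgb)
    return fused


# Usage with random features at four assumed strides.
rng = np.random.default_rng(0)
H = W = 64
strides = [4, 8, 16, 32]
rgb = [rng.standard_normal((8, H // s, W // s)) for s in strides]
depth = [rng.standard_normal((8, H // s, W // s)) for s in strides]
fused = confidence_fuse(rgb, depth, rng.standard_normal((H, W)))
print([f.shape for f in fused])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4), (8, 2, 2)]
```

A convex blend keeps every fused activation between its RGB and depth counterparts. As a result, the attention map can only shift weight between the two modalities; it cannot amplify either one.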

Updates

  • SF Mask R-CNN has been released (2020/02/18)
  • An extended version of [1], with a more detailed description, the synthetic data, and a robot demo, will be made publicly available soon. Stay tuned!

Getting Started


Environment Setup

  1. Set up an Anaconda environment
$ conda create -n sfmaskrcnn python=3.7
$ conda activate sfmaskrcnn
$ pip install torch torchvision
$ pip install imgviz tqdm tensorboardX pandas opencv-python imutils pyfastnoisesimd scikit-image pycocotools
$ pip install pyrealsense2 # only needed for the RealSense demo
  2. Download the provided SF Mask R-CNN weights pre-trained on our custom dataset.
  3. Download the WISDOM-Real dataset [Link]

  4. Set the paths to the dataset and the pretrained weights (you can add these lines to your bash profile):

$ export WISDOM_PATH={/path/to/the/wisdom-real/high-res/dataset}
$ export WEIGHT_PATH={/path/to/the/pretrained/weights}
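Before you train or evaluate, it can help to confirm two things: that the packages installed above can be imported, and that `WISDOM_PATH` and `WEIGHT_PATH` point at real locations. The script below is a small sanity check written for this README, not part of the repository. The helper names are hypothetical. The default weight file name is taken from the commands below. It maps pip package names to their import names where they differ (opencv-python imports as `cv2`, scikit-image as `skimage`).

```python
import importlib.util
import os
from pathlib import Path

# Import names of the pip packages above (opencv-python -> cv2, scikit-image -> skimage).
REQUIRED_MODULES = [
    "torch", "torchvision", "imgviz", "tqdm", "tensorboardX", "pandas",
    "cv2", "imutils", "pyfastnoisesimd", "skimage", "pycocotools",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def path_problems(weight_file="SFMaskRCNN_ConfidenceFusion.tar"):
    """Check WISDOM_PATH and WEIGHT_PATH; return a list of human-readable problems."""
    problems = []
    wisdom = os.environ.get("WISDOM_PATH")
    weights = os.environ.get("WEIGHT_PATH")
    if not wisdom:
        problems.append("WISDOM_PATH is not set")
    elif not Path(wisdom).is_dir():
        problems.append(f"WISDOM_PATH is not a directory: {wisdom}")
    if not weights:
        problems.append("WEIGHT_PATH is not set")
    elif not (Path(weights) / weight_file).is_file():
        problems.append(f"{weight_file} not found in {weights}")
    return problems


if __name__ == "__main__":
    for item in missing_modules():
        print("missing module:", item)
    for item in path_problems():
        print("path problem:", item)
```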

Train

To train SF Mask R-CNN (confidence fusion, RGB and noisy depth as input) on a synthetic dataset:

$ python train.py --gpu 0 --cfg rgb_noisydepth_confidencefusion

To fine-tune SF Mask R-CNN on the WISDOM dataset:

$ python train.py --gpu 0 --cfg rgb_noisydepth_confidencefusion_FT --resume
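If you prefer to script the two training stages instead of typing them, a minimal Python wrapper could look like this. It uses only the `--gpu`, `--cfg`, and `--resume` flags shown above, and the helper names are hypothetical. Which checkpoint `--resume` picks up is decided by `train.py` and is not described here. The wrapper defaults to a dry run that only prints the commands. It is meant to be run from the repository root.

```python
import shlex
import subprocess


def train_command(cfg, gpu=0, resume=False):
    """Build a train.py invocation using only the flags shown in this README."""
    cmd = ["python", "train.py", "--gpu", str(gpu), "--cfg", cfg]
    if resume:
        cmd.append("--resume")
    return cmd


# Stage 1: train on synthetic data; stage 2: fine-tune on WISDOM.
STAGES = [
    train_command("rgb_noisydepth_confidencefusion"),
    train_command("rgb_noisydepth_confidencefusion_FT", resume=True),
]


def run_stages(stages, dry_run=True):
    """Print each command; execute it only when dry_run is False (stops on failure)."""
    for cmd in stages:
        print(" ".join(shlex.quote(part) for part in cmd))  # shlex.join needs Python 3.8+
        if not dry_run:
            subprocess.run(cmd, check=True)


run_stages(STAGES)
```

`check=True` makes a failed first stage raise `CalledProcessError`, so fine-tuning never starts from an incomplete run.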

Evaluation

To evaluate SF Mask R-CNN (confidence fusion, RGB and noisy depth as input) on the WISDOM dataset:

$ python eval.py --gpu 0 --cfg rgb_noisydepth_confidencefusion \
    --eval_data wisdom \
    --dataset_path $WISDOM_PATH \
    --weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar 
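The results table reports mask AP and box AP. The sketch below shows the core of a mask AP computation for one image at one IoU threshold. It computes mask IoU, greedily matches score-sorted predictions to still-unmatched ground-truth masks, and returns non-interpolated AP. It is a simplified illustration and not the evaluation code in `eval.py`. COCO-style AP (as computed by pycocotools, one of the dependencies) additionally averages over IoU thresholds 0.50:0.05:0.95 and uses interpolated precision.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """Non-interpolated AP for one image at one IoU threshold.

    preds: list of (score, mask) pairs; gts: list of ground-truth masks.
    """
    matched, hits, ap = set(), 0, 0.0
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, mask) in enumerate(ranked, start=1):
        # Greedily match to the best still-unmatched ground-truth mask.
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(gts):
            if i not in matched:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_idx = iou, i
        if best_idx is not None and best_iou >= iou_thr:
            matched.add(best_idx)
            hits += 1
            ap += hits / rank  # precision at each new true positive
    return ap / len(gts) if gts else 0.0


def square(y, x, size=10, shape=(64, 64)):
    m = np.zeros(shape, dtype=bool)
    m[y:y + size, x:x + size] = True
    return m


gts = [square(0, 0), square(30, 30)]
preds = [(0.9, square(0, 0)), (0.8, square(50, 0)), (0.7, square(31, 31))]
print(round(average_precision(preds, gts), 4))  # 0.8333
```

In the usage example, the second-ranked prediction is a false positive. Precision at the two true positives is therefore 1/1 and 2/3, and AP = (1 + 2/3) / 2.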

Visualization

To visualize the inference results of SF Mask R-CNN on the WISDOM dataset:

$ python inference.py --gpu 0 --cfg rgb_noisydepth_confidencefusion \
    --eval_data wisdom --vis_depth \
    --dataset_path $WISDOM_PATH \
    --weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar 
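One common way to draw predicted instance masks is to alpha-blend a distinct color per instance onto the RGB image. The NumPy sketch below only illustrates that idea. The repository's own rendering (imgviz is among its dependencies) may look different, and `overlay_masks` is a hypothetical helper.

```python
import numpy as np


def overlay_masks(rgb, masks, alpha=0.5, seed=0):
    """Alpha-blend one random color per instance mask onto an (H, W, 3) uint8 image."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible colors
    out = rgb.astype(np.float64)
    for mask in masks:
        color = rng.integers(0, 256, size=3)
        m = mask.astype(bool)
        out[m] = (1.0 - alpha) * out[m] + alpha * color
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


rgb = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
vis = overlay_masks(rgb, [mask])
print(vis[0, 0], vis[3, 3])  # blended pixel vs. untouched pixel
```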

To visualize the inference results on our custom synthetic dataset:

$ python inference.py --gpu 0 --cfg rgb_noisydepth_confidencefusion \
    --eval_data synthetic --vis_depth \
    --dataset_path examples \
    --weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar 
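Both visualization commands pass `--vis_depth`, which suggests that the depth input is displayed alongside the predictions. Below is a minimal sketch of turning a metric depth map into an 8-bit image. It assumes that missing depth is stored as 0. Valid pixels are min-max scaled to 1..255 so that 0 stays reserved for missing values. `depth_to_uint8` is hypothetical and not part of the repository.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale valid depth to 1..255 for display; 0 marks missing depth (assumed)."""
    depth = np.asarray(depth, dtype=np.float64)
    valid = depth > 0
    out = np.zeros(depth.shape, dtype=np.uint8)
    if not valid.any():
        return out
    lo, hi = depth[valid].min(), depth[valid].max()
    scaled = (depth[valid] - lo) / max(hi - lo, 1e-12)  # 0 at nearest, 1 at farthest
    out[valid] = np.round(1.0 + 254.0 * scaled).astype(np.uint8)
    return out


print(depth_to_uint8([[0.0, 0.5], [1.0, 0.75]]))
```

Reserving 0 for missing pixels keeps sensor holes visible as pure black instead of merging them with the nearest valid depth.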

Demo with RealSense

To run the real-time demo with an Intel RealSense D435 camera:

# SF Mask R-CNN (confidence fusion)
$ python demo.py --cfg rgb_noisydepth_confidencefusion \
    --weight_path $WEIGHT_PATH/SFMaskRCNN_ConfidenceFusion.tar 

# SF Mask R-CNN (early fusion)
$ python demo.py --cfg rgb_noisydepth_earlyfusion \
    --weight_path $WEIGHT_PATH/SFMaskRCNN_EarlyFusion.tar 


# SF Mask R-CNN (late fusion)
$ python demo.py --cfg rgb_noisydepth_latefusion \
    --weight_path $WEIGHT_PATH/SFMaskRCNN_LateFusion.tar 
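For reference, the sketch below shows how aligned RGB-D frames can be read from a D435 with pyrealsense2. It follows standard pyrealsense2 usage and is not the code in `demo.py`. The 640x480 at 30 FPS stream settings are assumptions. `pyrealsense2` is imported inside the capture function so that the unit-conversion helper works without the package or a camera.

```python
import numpy as np


def raw_depth_to_meters(raw_depth, depth_scale):
    """Convert a z16 depth image (device units) to meters using the sensor's depth scale."""
    return raw_depth.astype(np.float32) * depth_scale


def capture_aligned_frame(width=640, height=480, fps=30):
    """Grab one color frame and one depth frame aligned to it from a RealSense camera."""
    import pyrealsense2 as rs  # imported lazily: requires the package and a camera

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, width, height, rs.format.z16, fps)
    config.enable_stream(rs.stream.color, width, height, rs.format.bgr8, fps)
    profile = pipeline.start(config)
    try:
        depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
        align = rs.align(rs.stream.color)  # reproject depth into the color camera's view
        frames = align.process(pipeline.wait_for_frames())
        color = np.asanyarray(frames.get_color_frame().get_data()).copy()
        raw = np.asanyarray(frames.get_depth_frame().get_data())
        return color, raw_depth_to_meters(raw, depth_scale)
    finally:
        pipeline.stop()
```

Querying `get_depth_scale()` avoids hard-coding the device's depth unit.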

Authors

Citation

If you use our work in a research project, please cite:

[1] @inproceedings{back2020segmenting,
  title={Segmenting unseen industrial components in a heavy clutter using rgb-d fusion and synthetic data},
  author={Back, Seunghyeok and Kim, Jongwon and Kang, Raeyoung and Choi, Seungjun and Lee, Kyoobin},
  booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
  pages={828--832},
  year={2020},
  organization={IEEE}
}
