This project forked from jonasschult/mask3d

Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.

License: MIT License

Shell 1.64% C++ 6.66% Python 82.24% Cuda 9.46%

Mask3D: Mask Transformer for 3D Instance Segmentation

Jonas Schult¹, Francis Engelmann²,³, Alexander Hermans¹, Or Litany⁴, Siyu Tang³, Bastian Leibe¹

¹RWTH Aachen University, ²ETH AI Center, ³ETH Zurich, ⁴NVIDIA

PyTorch Lightning Config: Hydra

[Project Webpage] [Paper] [Demo]

News

  • 17. January 2023: Mask3D is accepted at ICRA 2023. 🔥
  • 14. October 2022: STPLS3D support added.
  • 10. October 2022: Mask3D ranks 2nd on the STPLS3D Challenge hosted by the Urban3D Workshop at ECCV 2022.
  • 6. October 2022: Mask3D preprint released on arXiv.
  • 25. September 2022: Code released.

Code structure

We adapt the codebase of Mix3D, which provides a highly modularized framework for 3D semantic segmentation based on the MinkowskiEngine.

├── mix3d
│   ├── main_instance_segmentation.py <- the main file
│   ├── conf                          <- hydra configuration files
│   ├── datasets
│   │   ├── preprocessing             <- folder with preprocessing scripts
│   │   ├── semseg.py                 <- indoor dataset
│   │   └── utils.py
│   ├── models                        <- Mask3D modules
│   ├── trainer
│   │   ├── __init__.py
│   │   └── trainer.py                <- train loop
│   └── utils
├── data
│   ├── processed                     <- folder for preprocessed datasets
│   └── raw                           <- folder for raw datasets
├── scripts                           <- train scripts
├── docs
├── README.md
└── saved                             <- folder that stores models and logs

Dependencies 📝

The main dependencies of the project are the following:

python: 3.10.9
cuda: 11.3

You can set up a conda environment as follows:

# Some users experienced issues on Ubuntu with an AMD CPU
# Install libopenblas-dev (issue #115, thanks WindWing)
# sudo apt-get install libopenblas-dev

export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"

conda env create -f environment.yml

conda activate mask3d_cuda113

pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
pip3 install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps

mkdir third_party
cd third_party

git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --force_cuda --blas=openblas

cd ..
git clone https://github.com/ScanNet/ScanNet.git
cd ScanNet/Segmentator
git checkout 3e5726500896748521a6ceb81271b0f5b2c0e7d2
make

cd ../../pointnet2
python setup.py install

cd ../../
pip3 install pytorch-lightning==1.7.2
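
After the steps above have run, one quick way to confirm that the pinned versions were actually installed is to query the package metadata. The sketch below is not part of the repository: the helper names `installed_version` and `check_pins` are hypothetical, and it assumes the distribution names match the pip package names used in the commands above.

```python
from importlib import metadata

# Versions pinned by the install commands above. Local build tags such as
# "+cu113" are stripped before comparing, since metadata may or may not keep them.
PINNED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "pytorch-lightning": "1.7.2",
}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINNED):
    """Map each distribution name to a tuple (expected, found, ok)."""
    report = {}
    for name, expected in pins.items():
        found = installed_version(name)
        ok = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, ok)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:20s} expected {expected:8s} found {found}  [{status}]")
```

Run it inside the activated `mask3d_cuda113` environment. A `MISMATCH` line usually means a later `pip3 install` replaced one of the pinned packages.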

Data preprocessing 🔨

After installing the dependencies, we preprocess the datasets.

ScanNet / ScanNet200

First, we apply Felzenszwalb and Huttenlocher's Graph Based Image Segmentation algorithm to the test scenes using the default parameters. Please refer to the original repository for details. Put the resulting segmentations in ./data/raw/scannet_test_segments.

python -m datasets.preprocessing.scannet_preprocessing preprocess \
--data_dir="PATH_TO_RAW_SCANNET_DATASET" \
--save_dir="data/processed/scannet" \
--git_repo="PATH_TO_SCANNET_GIT_REPO" \
--scannet200=false  # use true for ScanNet200

S3DIS

The S3DIS dataset contains some small bugs, which we initially fixed manually. We will soon release a preprocessing script that works directly on the original dataset. For the time being, please follow the instructions here to fix the dataset manually. Afterwards, call the preprocessing script as follows:

python -m datasets.preprocessing.s3dis_preprocessing preprocess \
--data_dir="PATH_TO_Stanford3dDataset_v1.2" \
--save_dir="data/processed/s3dis"

STPLS3D

python -m datasets.preprocessing.stpls3d_preprocessing preprocess \
--data_dir="PATH_TO_STPLS3D" \
--save_dir="data/processed/stpls3d"
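
The three preprocessing calls above share the same shape, so they can be driven from a small script. The following is a minimal sketch under that assumption: `preprocessing_command` and `run_all` are hypothetical helpers (not part of the repository), the paths are the same placeholders as above, and the script only prints the commands unless `dry_run` is switched off.

```python
import subprocess
import sys


def preprocessing_command(module, data_dir, save_dir, **extra):
    """Build the argv list for one `python -m datasets.preprocessing.<module> preprocess` call."""
    cmd = [
        sys.executable, "-m", f"datasets.preprocessing.{module}", "preprocess",
        f"--data_dir={data_dir}",
        f"--save_dir={save_dir}",
    ]
    # Extra flags (e.g. git_repo, scannet200) are appended as --key=value.
    for key, value in extra.items():
        if isinstance(value, bool):
            value = str(value).lower()  # the commands above use lowercase true/false
        cmd.append(f"--{key}={value}")
    return cmd


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths, as in the commands above; replace them before running.
    commands = [
        preprocessing_command("scannet_preprocessing", "PATH_TO_RAW_SCANNET_DATASET",
                              "data/processed/scannet",
                              git_repo="PATH_TO_SCANNET_GIT_REPO", scannet200=False),
        preprocessing_command("s3dis_preprocessing", "PATH_TO_Stanford3dDataset_v1.2",
                              "data/processed/s3dis"),
        preprocessing_command("stpls3d_preprocessing", "PATH_TO_STPLS3D",
                              "data/processed/stpls3d"),
    ]
    run_all(commands, dry_run=True)
```

Keeping `dry_run=True` as the default makes it easy to inspect the generated commands before committing to a long preprocessing run.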

Training and testing 🚆

Train Mask3D on the ScanNet dataset:

python main_instance_segmentation.py

Please refer to the config scripts (for example here) for detailed instructions on how to reproduce our results. In the simplest case, the inference command looks as follows:

python main_instance_segmentation.py \
general.checkpoint='PATH_TO_CHECKPOINT.ckpt' \
general.train_mode=false
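
The `general.checkpoint=...` and `general.train_mode=...` arguments are Hydra command-line overrides: the part before the last dot names a config group, the part after it names a key in that group. The sketch below illustrates that mapping with a plain dictionary. It is a simplified stand-in, not Hydra's implementation; `parse_overrides` is a hypothetical helper, and unlike Hydra it keeps every value as a string.

```python
def parse_overrides(overrides):
    """Turn ["a.b=1", "a.c=x"] into {"a": {"b": "1", "c": "x"}}.

    Values are kept as strings with surrounding quotes stripped; Hydra itself
    also infers types, which this sketch does not attempt.
    """
    config = {}
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value.strip("'\"")
    return config


if __name__ == "__main__":
    overrides = [
        "general.checkpoint='PATH_TO_CHECKPOINT.ckpt'",
        "general.train_mode=false",
    ]
    print(parse_overrides(overrides))
```

Both overrides land in the same `general` section, which is why the inference command only needs to touch that one group.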

Trained checkpoints ๐Ÿ’พ

We provide detailed scores and network configurations with trained checkpoints.

S3DIS (pretrained on ScanNet train+val)

Following PointGroup, HAIS and SoftGroup, we finetune a model pretrained on ScanNet (config and checkpoint).

| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint 💾 | Scores 📈 | Visualizations 🔭 |
|---|---|---|---|---|---|---|---|
| Area 1 | 69.3 | 81.9 | 87.7 | config | checkpoint | scores | visualizations |
| Area 2 | 44.0 | 59.5 | 66.5 | config | checkpoint | scores | visualizations |
| Area 3 | 73.4 | 83.2 | 88.2 | config | checkpoint | scores | visualizations |
| Area 4 | 58.0 | 69.5 | 74.9 | config | checkpoint | scores | visualizations |
| Area 5 | 57.8 | 71.9 | 77.2 | config | checkpoint | scores | visualizations |
| Area 6 | 68.4 | 79.9 | 85.2 | config | checkpoint | scores | visualizations |

S3DIS (from scratch)

| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint 💾 | Scores 📈 | Visualizations 🔭 |
|---|---|---|---|---|---|---|---|
| Area 1 | 74.1 | 85.1 | 89.6 | config | checkpoint | scores | visualizations |
| Area 2 | 44.9 | 57.1 | 67.9 | config | checkpoint | scores | visualizations |
| Area 3 | 74.4 | 84.4 | 88.1 | config | checkpoint | scores | visualizations |
| Area 4 | 63.8 | 74.7 | 81.1 | config | checkpoint | scores | visualizations |
| Area 5 | 56.6 | 68.4 | 75.2 | config | checkpoint | scores | visualizations |
| Area 6 | 73.3 | 83.4 | 87.8 | config | checkpoint | scores | visualizations |
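
To compare the two S3DIS settings at a glance, the per-area AP values can be averaged. The snippet below is a small worked example using only the numbers from the two S3DIS tables above. It takes an unweighted mean over the six areas, which is an assumption made here for illustration and is not necessarily how an official 6-fold score is computed.

```python
from statistics import mean

# Per-area AP values copied from the two S3DIS tables (Area 1 to Area 6).
AP_PRETRAINED = [69.3, 44.0, 73.4, 58.0, 57.8, 68.4]
AP_FROM_SCRATCH = [74.1, 44.9, 74.4, 63.8, 56.6, 73.3]


def summarize(name, scores):
    """Print the unweighted mean AP and the weakest area for one setting."""
    hardest = scores.index(min(scores)) + 1  # areas are numbered from 1
    print(f"{name}: mean AP = {mean(scores):.1f}, lowest AP on Area {hardest}")


if __name__ == "__main__":
    summarize("pretrained on ScanNet", AP_PRETRAINED)
    summarize("from scratch", AP_FROM_SCRATCH)
```

Under this simple averaging, the from-scratch models come out at roughly 64.5 AP versus roughly 61.8 AP for the ScanNet-pretrained ones, and Area 2 is the weakest area in both settings.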

ScanNet

| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint 💾 | Scores 📈 | Visualizations 🔭 |
|---|---|---|---|---|---|---|---|
| ScanNet val | 55.2 | 73.7 | 83.5 | config | checkpoint | scores | visualizations |
| ScanNet test | 56.6 | 78.0 | 87.0 | config | checkpoint | scores | visualizations |

ScanNet200

| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint 💾 | Scores 📈 | Visualizations 🔭 |
|---|---|---|---|---|---|---|---|
| ScanNet200 val | 27.4 | 37.0 | 42.3 | config | checkpoint | scores | visualizations |
| ScanNet200 test | 27.8 | 38.8 | 44.5 | config | checkpoint | scores | visualizations |

STPLS3D

| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint 💾 | Scores 📈 | Visualizations 🔭 |
|---|---|---|---|---|---|---|---|
| STPLS3D val | 57.3 | 74.3 | 81.6 | config | checkpoint | scores | visualizations |
| STPLS3D test | 63.4 | 79.2 | 85.6 | config | checkpoint | scores | visualizations |

BibTeX 🙏

@inproceedings{Schult23ICRA,
  title     = {{Mask3D: Mask Transformer for 3D Semantic Instance Segmentation}},
  author    = {Schult, Jonas and Engelmann, Francis and Hermans, Alexander and Litany, Or and Tang, Siyu and Leibe, Bastian},
  booktitle = {{International Conference on Robotics and Automation (ICRA)}},
  year      = {2023}
}
