Giter Site home page Giter Site logo

deepphysicvision / occformer Goto Github PK

View Code? Open in Web Editor NEW

This project forked from zhangyp15/occformer

0.0 0.0 0.0 60.81 MB

[ICCV 2023] OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Home Page: https://arxiv.org/abs/2304.05316

License: Apache License 2.0

Shell 0.67% C++ 5.00% Python 90.73% CSS 0.01% Cuda 3.47% Makefile 0.03% Batchfile 0.04% Jupyter Notebook 0.05% Dockerfile 0.02%

occformer's Introduction

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

News

  • [2023/04/20] Update more pretrained weights.
  • [2023/04/12] Paper is on Arxiv.
  • [2023/04/11] Code and demo release.

Introduction

The vision-based perception for autonomous driving has undergone a transformation from the bird-eye-view (BEV) representations to the 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on SemanticKITTI dataset and for LiDAR semantic segmentation on nuScenes dataset.

framework

Demo

nuScenes:

demo legend

SemanticKITTI:

demo

Benchmark Results

LiDAR Segmentation on nuScenes test set: nusc_test Semantic Scene Completion on SemanticKITTI test set: kitti_test

Getting Started

[1] Check installation for installation. Our code is mainly based on mmdetection3d.

[2] Check data_preparation for preparing SemanticKITTI and nuScenes datasets.

[3] Check train_and_eval for training and evaluation.

[4] Check predict_and_visualize for prediction and visualization.

[5] Check test_submission for preparing the test submission to SemanticKITTI SSC and nuScenes LiDAR Segmentation.

Model Zoo

We provide the pretrained weights on SemanticKITTI and nuScenes datasets, reproduced with the released codebase.

Dataset Backbone SC IoU SSC mIoU LiDARSeg mIoU Model Weights Training Logs
SemanticKITTI EfficientNetB7 36.42(val), 34.46(test) 13.50(val), 12.37(test) - Link Link
nuScenes R50 - - 68.1 Link Link
nuScenes R101-DCN - - 70.0 Link Link

For SemanticKITTI dataset, the validation performance may fluctuate around 13.2 ~ 13.6 (SSC mIoU) considering the limited training samples.

Related Projects

TPVFormer: Tri-perspective view (TPV) representation for 3D semantic occupancy.

OpenOccupancy: A large scale benchmark extending nuScenes for surrounding semantic occupancy perception.

Acknowledgement

This project is developed based on the following open-sourced projects: MonoScene, BEVDet, BEVFormer, Mask2Former. Thanks for their excellent work.

Citation

If you find this project helpful, please consider giving this repo a star or citing the following paper:

@article{zhang2023occformer,
  title={OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction},
  author={Zhang, Yunpeng and Zhu, Zheng and Du, Dalong},
  journal={arXiv preprint arXiv:2304.05316},
  year={2023}
}

occformer's People

Contributors

zhangyp15 avatar yunpengzhangphigent avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.