This project forked from mit-han-lab/bevfusion

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Home Page: https://bevfusion.mit.edu

License: Apache License 2.0

BEVFusion

website | paper | video

News

If you are interested in getting updates, please sign up here to get notified!

  • (2022/6/3) BEVFusion ranks first on nuScenes among all solutions.
  • (2022/6/3) We released the first version of BEVFusion (with pre-trained checkpoints and evaluation).
  • (2022/5/26) BEVFusion is released on arXiv.
  • (2022/5/2) BEVFusion ranks first on nuScenes among all solutions that use neither test-time augmentation nor model ensembling.

Abstract

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply-rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40x. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on the nuScenes benchmark, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower computation cost.
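To make the idea of a shared BEV space concrete, here is a minimal pure-Python sketch of the two steps the abstract describes: pooling per-point features into BEV grid cells, then fusing camera and LiDAR features cell by cell. It is an illustration only, not the optimized BEV pooling operator from this repository. The function names (`bev_pool`, `fuse_by_concat`), the sum aggregation, and the concatenation-based fusion are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def bev_pool(
    points_xy: Sequence[Tuple[float, float]],
    features: Sequence[Sequence[float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> Dict[Cell, List[float]]:
    """Sum the features of all points that fall into the same BEV grid cell.

    Hypothetical helper: a sparse, dictionary-based stand-in for a dense
    BEV feature map.
    """
    grid: Dict[Cell, List[float]] = {}
    for (x, y), feat in zip(points_xy, features):
        # Skip points that lie outside the BEV range.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Quantize metric coordinates into integer cell indices.
        cell = (
            int((x - x_range[0]) // cell_size),
            int((y - y_range[0]) // cell_size),
        )
        acc = grid.setdefault(cell, [0.0] * len(feat))
        for channel, value in enumerate(feat):
            acc[channel] += value
    return grid


def fuse_by_concat(
    camera_bev: Dict[Cell, List[float]],
    lidar_bev: Dict[Cell, List[float]],
    camera_channels: int,
    lidar_channels: int,
) -> Dict[Cell, List[float]]:
    """Concatenate camera and LiDAR features per cell (assumed fusion rule).

    Cells that only one modality covers are zero-filled for the other.
    """
    fused: Dict[Cell, List[float]] = {}
    for cell in set(camera_bev) | set(lidar_bev):
        cam = camera_bev.get(cell, [0.0] * camera_channels)
        lid = lidar_bev.get(cell, [0.0] * lidar_channels)
        fused[cell] = cam + lid
    return fused
```

Because both modalities end up indexed by the same grid cells, the fused map can feed any task head that consumes BEV features, which is what lets one framework cover several tasks.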

Results

3D Object Detection (on nuScenes test)

Model        Modality  mAP    NDS
BEVFusion-e  C+L       74.99  76.09
BEVFusion    C+L       70.23  72.88

3D Object Detection (on nuScenes validation)

Model                 Modality  mAP    NDS    Checkpoint
BEVFusion             C+L       68.39  71.32  Link
Camera-Only Baseline  C         33.25  40.15  Link
LiDAR-Only Baseline   L         64.68  69.28  Link

Note: The camera-only object detection baseline is a variant of BEVDet-Tiny with a much heavier view transformer and other differences in hyperparameters. Thanks to our efficient BEV pooling operator, this model runs fast and has higher mAP than BEVDet-Tiny at the same input resolution. Please refer to the BEVDet repo for the original BEVDet-Tiny implementation. The LiDAR-only baseline is TransFusion-L.

BEV Map Segmentation (on nuScenes validation)

Model                 Modality  mIoU   Checkpoint
BEVFusion             C+L       62.69  Link
Camera-Only Baseline  C         56.56  Link
LiDAR-Only Baseline   L         48.56  Link

Usage

Prerequisites

The code is built with the following libraries:

After installing these dependencies, please run this command to install the codebase:

python setup.py develop

Data Preparation

nuScenes

Please follow the instructions from here to download and preprocess the nuScenes dataset. Remember to download both the detection dataset and the map extension (for BEV map segmentation). After data preparation, you should see the following directory structure (as indicated in mmdetection3d):

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_database
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_infos_test.pkl
│   │   ├── nuscenes_dbinfos_train.pkl
│   │   ├── nuscenes_infos_train_mono3d.coco.json
│   │   ├── nuscenes_infos_val_mono3d.coco.json
│   │   ├── nuscenes_infos_test_mono3d.coco.json
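
As a quick sanity check after data preparation, a small script along these lines can report which of the entries listed above are missing. This is a sketch, not part of the codebase: the helper name `missing_nuscenes_entries` is hypothetical, and the data root (for example `data/nuscenes`) is assumed from the directory tree above.

```python
from pathlib import Path
from typing import List, Union

# Entries expected under the nuScenes data root, copied from the tree above.
EXPECTED_ENTRIES = [
    "maps",
    "samples",
    "sweeps",
    "v1.0-test",
    "v1.0-trainval",
    "nuscenes_database",
    "nuscenes_infos_train.pkl",
    "nuscenes_infos_val.pkl",
    "nuscenes_infos_test.pkl",
    "nuscenes_dbinfos_train.pkl",
    "nuscenes_infos_train_mono3d.coco.json",
    "nuscenes_infos_val_mono3d.coco.json",
    "nuscenes_infos_test_mono3d.coco.json",
]


def missing_nuscenes_entries(data_root: Union[str, Path]) -> List[str]:
    """Return the expected files and directories that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Assumed location, relative to the repository root.
    missing = missing_nuscenes_entries("data/nuscenes")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected nuScenes entries are present.")
```

The check only tests for existence. It does not validate the contents of the info files.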

Evaluation

We also provide instructions for evaluating our pretrained models. Please download the checkpoints using the following script:

./tools/download_pretrained.sh

Then, you will be able to run:

torchpack dist-run -np 8 python tools/test.py [config file path] pretrained/[checkpoint name].pth --eval [evaluation type]

For example, if you want to evaluate the detection variant of BEVFusion, you can try:

torchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox

For the segmentation variant of BEVFusion, use:

torchpack dist-run -np 8 python tools/test.py configs/nuscenes/seg/fusion-bev256d2-lss.yaml pretrained/bevfusion-seg.pth --eval map
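
If you evaluate several checkpoints, it can help to assemble the command programmatically. The sketch below only rebuilds the command template shown above as an argument list. The helper name `build_eval_command` and its parameters are hypothetical, and `num_procs` mirrors the `-np 8` value used in the examples.

```python
import shlex
from typing import List


def build_eval_command(
    config: str,
    checkpoint: str,
    eval_type: str,
    num_procs: int = 8,
) -> List[str]:
    """Build the evaluation command from the README template as an argument list."""
    return [
        "torchpack", "dist-run", "-np", str(num_procs),
        "python", "tools/test.py",
        config,
        checkpoint,
        "--eval", eval_type,
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        "configs/nuscenes/seg/fusion-bev256d2-lss.yaml",
        "pretrained/bevfusion-seg.pth",
        "map",
    )
    # Print the shell form; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems when a path contains spaces.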

FAQs

Q: Can we directly use the info files prepared by mmdetection3d?

A: We recommend re-generating the info files using this codebase since we forked mmdetection3d before their coordinate system refactoring.

Acknowledgements

BEVFusion is based on mmdetection3d. It is also greatly inspired by the following outstanding contributions to the open-source community: LSS, BEVDet, TransFusion, CenterPoint, MVP, FUTR3D, CVT and DETR3D.

Please also check out related papers in the camera-only 3D perception community such as BEVDet4D, BEVerse, BEVFormer, M2BEV, PETR and PETRv2, which might be interesting future extensions to BEVFusion.

Citation

If BEVFusion is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

@article{liu2022bevfusion,
  title={BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation},
  author={Liu, Zhijian and Tang, Haotian and Amini, Alexander and Yang, Xingyu and Mao, Huizi and Rus, Daniela and Han, Song},
  journal={arXiv},
  year={2022}
}

Contributors

kentang-mit, yuanxianh, zhijian-liu, zhiqi-li
