firestonelib / bevperception-survey-recipe

This project forked from opendrivelab/birds-eye-view-perception

Awesome BEV perception papers and cookbook for achieving SOTA results

License: Apache License 2.0

BEVPerception-Survey-Recipe

Awesome BEV perception papers and toolbox for achieving SOTA results. Fundamental Vision

Introduction

This repo is associated with the survey paper "Delving into the Devils of Bird’s-eye-view Perception: A Review, Evaluation and Recipe", which provides an up-to-date literature survey of BEV perception and an open-source BEV toolbox based on PyTorch. The literature survey covers different modalities (camera, LiDAR and fusion) and tasks (detection and segmentation). The toolbox provides useful recipes for camera-based BEV 3D object detection, including solid data augmentation strategies, efficient BEV encoder design, a loss function family, useful test-time augmentation, an ensemble policy, and so on. We hope this repo can be a good starting point for beginners and also help current researchers in the BEV perception community.

Currently, the BEV perception community is very active and growing fast. There are also other good BEV perception repos, e.g.:

  • BEVFormer. A cutting-edge baseline for camera-based detection via spatiotemporal transformers.
  • BEVDet. Official code for the BEVDet series of camera-based detection methods, including BEVDet, BEVDet4D and BEVPoolv2.
  • PETR. Implicit BEV representation for camera-based detection and segmentation, including PETR and PETRv2.
  • BEVDepth. Official code for BEVDepth and BEVStereo, which use LiDAR or temporal stereo to enhance depth estimation.
  • Lift-splat-shoot. Implicitly unprojects camera image features to 3D for the segmentation task.
  • BEVFusion (MIT). Unifies camera and LiDAR features in a shared bird's-eye view (BEV) representation space for the detection and map segmentation tasks.
  • BEVFusion (ADLab). A simple and robust LiDAR-camera fusion framework for the detection task.

Major Features

  • Up-to-date Literature Survey for BEV Perception
    We summarize important recent BEV perception methods across different modalities and tasks.
  • Convenient BEVPerception Toolbox
    We integrate into the BEV toolbox a bag of tricks that helped us achieve 1st place in the camera-based detection track of the Waymo Open Challenge 2022. The tricks fall into four groups: data augmentation, BEV encoder design, loss family and post-processing policy. The toolbox can be used independently or as a plug-in for mmdet3d (detectron2 support is planned).
Bag of Tricks
Multiple View Data Augmentation | BEV encoder | Loss & Heads family | Post-Process
TBA TBA
  • Test-time Augmentation
  • Weighted Box Fusion
  • Two-stage Ensemble
  • Support Waymo Open Dataset (WOD) for camera-only detection
    We provide a suitable playground for newcomers to this area, including a hands-on tutorial and a small-scale dataset (1/5 of WOD in KITTI format) for validating ideas.
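
As a rough illustration of the Weighted Box Fusion idea listed among the post-process tricks, here is a minimal sketch. It is not the toolbox's implementation: it assumes axis-aligned 2D boxes given as (x1, y1, x2, y2, score) with positive scores, whereas the toolbox targets 3D detection, and the names `iou`, `fuse` and `weighted_box_fusion` are hypothetical. Overlapping boxes from several models are merged into one box whose coordinates are a score-weighted average, and the fused score is reduced when only some of the models agree.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster: List[Box], num_models: int) -> Box:
    """Score-weighted average of coordinates; score damped if few models agree."""
    total = sum(b[4] for b in cluster)
    coords = [sum(b[i] * b[4] for b in cluster) / total for i in range(4)]
    score = total / len(cluster) * min(len(cluster), num_models) / num_models
    return (coords[0], coords[1], coords[2], coords[3], score)


def weighted_box_fusion(boxes: List[Box], num_models: int,
                        iou_thr: float = 0.55) -> List[Box]:
    """Greedily cluster boxes (highest score first) and fuse each cluster."""
    clusters: List[List[Box]] = []
    fused: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        best, best_iou = -1, iou_thr
        for k, f in enumerate(fused):
            overlap = iou(box, f)
            if overlap > best_iou:
                best, best_iou = k, overlap
        if best < 0:
            # No fused box overlaps enough: start a new cluster
            clusters.append([box])
            fused.append(fuse([box], num_models))
        else:
            clusters[best].append(box)
            fused[best] = fuse(clusters[best], num_models)
    return fused


# Two models agree on the first object; only one model sees the second.
predictions = [
    (0.0, 0.0, 10.0, 10.0, 0.9),
    (1.0, 1.0, 11.0, 11.0, 0.6),
    (50.0, 50.0, 60.0, 60.0, 0.8),
]
result = weighted_box_fusion(predictions, num_models=2)
```

Unlike non-maximum suppression, which keeps one box per cluster and discards the rest, this scheme lets every overlapping prediction contribute to the final coordinates.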

What's New

v0.1 was released on 10/13/2022.

  • Integrate some practical data augmentation methods for BEV camera-based 3D detection in the toolbox.
  • Offer a pipeline to process the Waymo dataset (camera-based 3D detection).
  • Release a baseline (with config) for the Waymo dataset, along with a 1/5 subset of the Waymo dataset in KITTI format.

Please refer to changelog.md for details and release history.

Literature Survey

The general picture of BEV perception at a glance consists of three sub-parts based on the input modality. BEV perception is a general task built on top of a series of fundamental tasks. To give a more complete view of the perception algorithms used in autonomous driving, we list other topics as well. More details can be found in the survey paper.

We have summarized important datasets and methods in recent years about BEV perception in academia and also different roadmaps used in industry.

We have also summarized some conventional methods for different tasks.

BEV Toolbox

Get Started

Installation

pip install numpy opencv-python
pip install bev-toolbox

A simple example

We provide an example that uses a sample from the Waymo dataset to introduce the usage of this toolbox.

import cv2
import numpy as np
from bev_toolbox.data_aug import RandomScaleImageMultiViewImage

# Declare an augmentation pipeline
transform = RandomScaleImageMultiViewImage(scales=[0.9, 1.0, 1.1])

# multiple-view images
imgs = [cv2.imread(f'example/cam{i}_img.jpg') for i in range(5)]
# intrinsic parameters of cameras
cam_intr = [np.load(f'example/cam{i}_intrinsic.npy') for i in range(5)]
# extrinsic parameters of cameras
cam_extr = [np.load(f'example/cam{i}_extrinsic.npy') for i in range(5)]
# transformations from lidar to image
lidar2img = [np.load(f'example/cam{i}_lidar2img.npy') for i in range(5)]

# Augment an image
imgs_new, cam_intr_new, lidar2img_new = transform(imgs, cam_intr, cam_extr, lidar2img)

For more details, such as the coordinate systems or visualization, please refer to example.md.
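
To illustrate why the augmentation returns updated camera parameters along with the images: resizing an image by a factor s multiplies every pixel coordinate by s, so the intrinsic and lidar-to-image matrices have to be scaled to match. The following is a minimal NumPy sketch, not the toolbox's implementation. It assumes 3x3 intrinsics and 4x4 lidar2img matrices (the shapes of the example .npy files are not stated here), `scale_camera` and `project` are hypothetical helper names, the camera values are made up, and half-pixel offsets are ignored.

```python
import numpy as np


def scale_camera(intrinsic: np.ndarray, lidar2img: np.ndarray, scale: float):
    """Return camera matrices that match an image resized by `scale`."""
    # Resizing multiplies pixel coordinates (u, v) by `scale`;
    # the homogeneous component stays untouched.
    s3 = np.diag([scale, scale, 1.0])
    s4 = np.diag([scale, scale, 1.0, 1.0])
    return s3 @ intrinsic, s4 @ lidar2img


def project(lidar2img: np.ndarray, point_xyz) -> np.ndarray:
    """Project a 3D point to pixel coordinates with a 4x4 projection matrix."""
    p = lidar2img @ np.append(np.asarray(point_xyz, dtype=float), 1.0)
    return p[:2] / p[2]


# Hypothetical camera: focal length 1000 px, principal point (960, 640),
# identity extrinsics so lidar2img is just the padded intrinsic matrix.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 640.0],
              [0.0, 0.0, 1.0]])
lidar2img = np.eye(4)
lidar2img[:3, :3] = K

K_half, lidar2img_half = scale_camera(K, lidar2img, scale=0.5)

point = [1.0, 2.0, 10.0]
uv_full = project(lidar2img, point)       # pixel location in the original image
uv_half = project(lidar2img_half, point)  # pixel location in the resized image
```

With these values the point lands at half the original pixel coordinates in the resized image, which is the consistency a multi-view scale augmentation needs to preserve.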

Use BEV toolbox with mmdet3d

We provide a wrapper of this BEV toolbox for mmdet3d. Support for detectron2 is planned.

  1. Add the following code to train_video.py or test_video.py.
from bev_toolbox.init_toolbox import init_toolbox_mmdet3d
init_toolbox_mmdet3d()
  2. Use functions in the toolbox just like those in mmdet3d. For example, you can simply add RandomScaleImageMultiViewImage to the config file.
train_pipeline = [
    ...
    dict(type='RandomScaleImageMultiViewImage', scales=[0.9, 1.0, 1.1]),
    ...
]
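
For readers unfamiliar with config-driven pipelines, the toy sketch below shows the general registry pattern behind entries such as `dict(type=..., ...)`: a class is registered under its name, and a builder looks up the `type` key and passes the remaining keys to the constructor. This is a self-contained illustration only, not mmdet3d's or the toolbox's actual code. `PIPELINES`, `register`, `build` and the `RandomScale` class are all hypothetical stand-ins.

```python
import random

PIPELINES = {}  # hypothetical stand-in for a framework-level registry


def register(cls):
    """Class decorator that records the class under its own name."""
    PIPELINES[cls.__name__] = cls
    return cls


@register
class RandomScale:
    """Toy transform: picks one scale and stores it in the sample dict."""

    def __init__(self, scales):
        self.scales = scales

    def __call__(self, sample):
        sample['scale'] = random.choice(self.scales)
        return sample


def build(cfg):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    cfg = dict(cfg)  # copy so the original config is not mutated
    cls = PIPELINES[cfg.pop('type')]
    return cls(**cfg)


train_pipeline = [
    dict(type='RandomScale', scales=[0.9, 1.0, 1.1]),
]

transforms = [build(cfg) for cfg in train_pipeline]
sample = {}
for transform in transforms:
    sample = transform(sample)
```

Under this pattern, the initialization call in step 1 would be the point at which the toolbox's transforms become visible to the config system, which is why it has to run before the config is parsed.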

Use BEV-toolbox with detectron2

We plan to make this toolbox compatible with detectron2 in the future.

Playground on Waymo

We provide a suitable playground on the Waymo dataset, including a hands-on tutorial and a small-scale dataset (1/5 of WOD in KITTI format) for validating ideas.

Setup

Please refer to waymo_setup.md for how to run experiments on Waymo.

Config with Performance

We report the improvement of each trick over the baseline on the Waymo validation set. All models are trained with 1/5 of the Waymo v1.3 training data, referred to here as Waymo mini. Note that these results were obtained on data in PNG format. We are revalidating them on data in JPG format, so the actual performance may differ.

✓: DONE, ☐: TODO.

| Backbone | Head | Train data | Trick and corresponding config | LET-mAPL | LET-mAPH | L1/mAPH (Car) | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet101 | DETR | Waymo mini | Baseline | 34.9 | 46.3 | 25.5 | ✓ |
| ResNet101 | DETR | Waymo mini | Multi-scale resize, Flip | 35.6 | 46.9 | 26.8 | ✓ |
| ResNet101 | DETR | Waymo mini | Conv offset in TSA | 35.9 | 48.1 | 25.6 | ☐ |
| ResNet101 | DETR | Waymo mini | Deformable view encoder | 36.1 | 48.1 | 25.9 | ☐ |
| ResNet101 | DETR | Waymo mini | Corner pooling | 35.6 | 46.9 | 26.0 | ☐ |
| ResNet101 | DETR | Waymo mini | 2x BEV scale | - | - | 25.5 | ☐ |
| ResNet101 | DETR | Waymo mini | Sync BN | - | - | 25.5 | ☐ |
| ResNet101 | DETR | Waymo mini | EMA | - | - | 25.6 | ☐ |
| ResNet101 | DETR | Waymo mini | 2d auxiliary loss | 35.3 | 47.4 | 24.6 | ☐ |
| ResNet101 | DETR | Waymo mini | 2d auxiliary loss, Learnable loss weight | 36.2 | 48.1 | 25.4 | ☐ |
| ResNet101 | DETR | Waymo mini | Smooth L1 loss | - | - | 26.2 | ☐ |
| ResNet101 | DETR | Waymo mini | Label smoothing | 36.0 | 46.7 | - | ☐ |
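
To read the table above as improvements over the baseline, one can subtract the baseline row from each trick's row. The short sketch below does this for three of the rows, with the numbers copied from the table. The helper name `deltas` is hypothetical, and missing entries ("-") are represented as `None`.

```python
baseline = {'LET-mAPL': 34.9, 'LET-mAPH': 46.3, 'L1/mAPH (Car)': 25.5}

# A few rows copied from the table; None marks an entry reported as "-".
tricks = {
    'Multi-scale resize, Flip': {'LET-mAPL': 35.6, 'LET-mAPH': 46.9, 'L1/mAPH (Car)': 26.8},
    'Deformable view encoder': {'LET-mAPL': 36.1, 'LET-mAPH': 48.1, 'L1/mAPH (Car)': 25.9},
    'Label smoothing': {'LET-mAPL': 36.0, 'LET-mAPH': 46.7, 'L1/mAPH (Car)': None},
}


def deltas(row):
    """Difference to the baseline per metric, rounded to one decimal place."""
    return {
        metric: None if value is None else round(value - baseline[metric], 1)
        for metric, value in row.items()
    }


improvements = {name: deltas(row) for name, row in tricks.items()}

for name, diff in improvements.items():
    print(name, diff)
```

Keep in mind the caveat stated above: these differences come from the PNG-format runs and may change after revalidation.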

Ongoing Features

Literature Survey

  • Add new papers.

BEV toolbox

  • Data augmentation methods for BEV perception
    • Random horizontal flip
    • Random scale
    • Grid mask
    • New data augmentation
  • Integrate more tricks
    • Post-process
      • Test-time Augmentation
      • Weighted Box Fusion
      • Two-stage Ensemble
    • BEV Encoder
    • Loss Family
  • Add Visualization in BEV
  • Improve the current implementations.
  • Add documentation to introduce the APIs of the toolbox
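
As an illustration of one of the augmentations listed above, here is a simplified grid-mask sketch in NumPy. It is an assumption-laden toy rather than the toolbox's implementation: it zeroes out one square per grid cell with a fixed offset and no rotation, and the function name `grid_mask` and its parameters `cell`, `hole` and `offset` are hypothetical.

```python
import numpy as np


def grid_mask(img: np.ndarray, cell: int = 32, hole: int = 16,
              offset=(0, 0)) -> np.ndarray:
    """Zero out a `hole` x `hole` square in every `cell` x `cell` tile."""
    h, w = img.shape[:2]
    rows = (np.arange(h) + offset[0]) % cell < hole
    cols = (np.arange(w) + offset[1]) % cell < hole
    drop = rows[:, None] & cols[None, :]  # True where pixels are masked
    out = img.copy()                      # leave the input image untouched
    out[drop] = 0
    return out


# A 64x64 image of ones holds four 32x32 tiles, each losing a 16x16 square.
image = np.ones((64, 64, 3), dtype=np.uint8)
masked = grid_mask(image, cell=32, hole=16)
masked_fraction = float((masked[..., 0] == 0).mean())
```

For multi-view input, the same function would be applied to each camera image. Because it changes only pixel values and not image geometry, the camera parameters need no update.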

License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful in your research, please consider citing:

@article{li2022bevsurvey,
  title={Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe},
  author={Li, Hongyang and Sima, Chonghao and Dai, Jifeng and Wang, Wenhai and Lu, Lewei and Wang, Huijie and Xie, Enze and Li, Zhiqi and Deng, Hanming and Tian, Hao and Zhu, Xizhou and Chen, Li and Gao, Yulu and Geng, Xiangwei and Zeng, Jia and Li, Yang and Yang, Jiazhi and Jia, Xiaosong and Yu, Bohan and Qiao, Yu and Lin, Dahua and Liu, Si and Yan, Junchi and Shi, Jianping and Luo, Ping},
  journal={arXiv preprint arXiv:2209.05324},
  year={2022}
}
@misc{bevtoolbox2022,
  title={{BEVPerception-Survey-Recipe} toolbox for general BEV perception},
  author={BEV-Toolbox Contributors},
  howpublished={\url{https://github.com/OpenPerceptionX/BEVPerception-Survey-Recipe}},
  year={2022}
}

Acknowledgement

Many thanks to these excellent open source projects and also the stargazers and forkers.