
[ICCV 2023] SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos

Home Page: https://arxiv.org/abs/2308.09244

License: MIT License

Topics: 3d-object-detection, autonomous-driving, bev-perception, transformer

sparsebev's Introduction

SparseBEV


This is the official PyTorch implementation for our ICCV 2023 paper:

SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
Haisong Liu, Yao Teng, Tao Lu, Haiguang Wang, Limin Wang
Nanjing University, Shanghai AI Lab

Chinese-language explainer: https://zhuanlan.zhihu.com/p/654821380

News

Model Zoo

Setting                              Pretrain  Training Cost   NDS (val)  NDS (test)  FPS   Weights
r50_nuimg_704x256                    nuImg     21h (8x2080Ti)  55.6       -           15.8  gdrive
r50_nuimg_704x256_400q_36ep          nuImg     28h (8x2080Ti)  55.8       -           23.5  gdrive
r101_nuimg_1408x512                  nuImg     2d8h (8xV100)   59.2       -           6.5   gdrive
vov99_dd3d_1600x640_trainval_future  DD3D      4d1h (8xA100)   84.9       67.5        -     gdrive
vit_eva02_1600x640_trainval_future   EVA02     11d (8xA100)    85.3       70.2        -     gdrive
  • We use r50_nuimg_704x256 for ablation studies and r50_nuimg_704x256_400q_36ep for comparison with others.
  • We recommend using r50_nuimg_704x256 to validate new ideas since it trains faster and the result is more stable.
  • FPS is measured with AMD 5800X CPU and RTX 3090 GPU (without fp16).
  • Run-to-run noise is around ±0.3 NDS.

Environment

Install PyTorch 2.0 + CUDA 11.8:

conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia

or PyTorch 1.10.2 + CUDA 10.2 for older GPUs:

conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch

Install other dependencies:

pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install setuptools==59.5.0
pip install numpy==1.23.5
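A quick way to check that the pinned versions resolved correctly (a sanity check, not part of the official steps):

import mmcv, mmdet, mmseg, mmdet3d, numpy
print(mmcv.__version__)     # expect 1.6.0
print(mmdet.__version__)    # expect 2.28.2
print(mmseg.__version__)    # expect 0.30.0
print(mmdet3d.__version__)  # expect 1.0.0rc6
print(numpy.__version__)    # expect 1.23.5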

Install turbojpeg and pillow-simd to speed up data loading (optional but important):

sudo apt-get update
sudo apt-get install -y libturbojpeg
pip install pyturbojpeg
pip uninstall pillow
pip install pillow-simd==9.0.0.post1

Compile CUDA extensions:

cd models/csrc
python setup.py build_ext --inplace
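If the compilation above fails, a quick sanity check is to confirm that PyTorch sees a GPU and reports the CUDA version it was built with:

import torch
print(torch.__version__, torch.version.cuda)  # e.g. 2.0.0 11.8
print(torch.cuda.is_available())              # should print True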

Prepare Dataset

  1. Download nuScenes from https://www.nuscenes.org/nuscenes and put it in data/nuscenes.
  2. Download the generated info file from Google Drive and unzip it.
  3. Folder structure:
data/nuscenes
├── maps
├── nuscenes_infos_test_sweep.pkl
├── nuscenes_infos_train_sweep.pkl
├── nuscenes_infos_train_mini_sweep.pkl
├── nuscenes_infos_val_sweep.pkl
├── nuscenes_infos_val_mini_sweep.pkl
├── samples
├── sweeps
├── v1.0-test
└── v1.0-trainval

These *.pkl files can also be generated with our script: gen_sweep_info.py.
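To peek inside one of the info files, a minimal sketch is below; the internal layout (a dict with an 'infos' list) follows the usual mmdet3d convention and is an assumption, not something documented here:

import pickle

with open('data/nuscenes/nuscenes_infos_val_sweep.pkl', 'rb') as f:
    data = pickle.load(f)

# If the file follows the common mmdet3d layout, inspect one sample record.
if isinstance(data, dict) and 'infos' in data:
    print(len(data['infos']))        # number of samples
    print(sorted(data['infos'][0]))  # keys of one sample record
else:
    print(type(data))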

Training

Download the pretrained weights and put them in the pretrain/ directory:

pretrain
├── cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth
├── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth

Train SparseBEV with 8 GPUs:

torchrun --nproc_per_node 8 train.py --config configs/r50_nuimg_704x256.py

Train SparseBEV with 4 GPUs (i.e., the last four GPUs):

export CUDA_VISIBLE_DEVICES=4,5,6,7
torchrun --nproc_per_node 4 train.py --config configs/r50_nuimg_704x256.py

The per-GPU batch size is scaled automatically, so there is no need to modify batch_size in the config files.
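The sketch below illustrates the kind of scaling described above; it is not the repo's actual code, and total_batch_size is a hypothetical config value:

import torch.distributed as dist

# Keep the config's total batch size fixed and divide it across ranks
# (assumes the default process group was initialized, e.g. by torchrun).
total_batch_size = 8                # hypothetical value from a config file
world_size = dist.get_world_size()  # 8 with --nproc_per_node 8
samples_per_gpu = total_batch_size // world_size  # 1 on 8 GPUs, 2 on 4 GPUs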

Evaluation

Single-GPU evaluation:

export CUDA_VISIBLE_DEVICES=0
python val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Multi-GPU evaluation:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node 8 val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Timing

FPS is measured with a single GPU:

export CUDA_VISIBLE_DEVICES=0
python timing.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Visualization

Visualize the predicted bbox:

python viz_bbox_predictions.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Visualize the sampling points (like Fig. 6 in the paper):

python viz_sample_points.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Acknowledgements

Many thanks to these excellent open-source projects:

sparsebev's People

Contributors

afterthat97


sparsebev's Issues

Where in the code do you handle ego motion?

Dear author,
Wonderful work! Where in the code do you handle ego motion? I can't find it.
It seems that after moving the sampling points with velocity, the code directly uses the lidar2img matrix to project the points.

Question about merging SparseBEV into the MMDetection3D framework

I am attempting to merge SparseBEV into the MMDetection3D framework and am training the model with the MMDetection3D command:
bash tools/dist_train.sh configs/sparsebev/r50_nuimg_704x256.py 8
However, I've encountered an issue where the r50_nuimg_704x256.py configuration consumes over 48 GB of GPU memory. Additionally, after I shorten the temporal fusion length, there is significant variation in memory usage across the GPUs, and after training for 24 epochs the NDS remains very low. What is the expected GPU memory usage for training this configuration on eight GPUs, and might there be an error in my approach to merging and training the model?

Deployment questions

Hi, awesome work! I am exploring inference with TensorRT and was wondering whether the layers are supported, or whether you have any plugins if you have already tested it.

Thanks.

Joint BEV Segmentation and 3D detection with SparseBEV

Hi SparseBEV authors,
Thank you for your excellent work and for releasing all (including test) models.

I want to do joint detection and BEV segmentation in the SparseBEV framework, similar to BEVerse and BEVFormer, which take dense BEV features and attach heads such as BEV segmentation heads. SparseBEV is a query-based pipeline, so its outputs are also query-based.

Could you explain how I should extend SparseBEV for the BEV segmentation task? Please note that I do not need a working implementation from you. Suggestions/directions are fine. I can implement the stuff on my end :)

PS: My initial thought is to use the latent query features from the SparseBEVTransformerDecoderLayer available at this line. However, the dimensions of query_feat are B x Q x C, not something with a BEV width and height.

Result differences

Why is it that the NDS of the model I trained using the r50_nuimg_704x256 configuration file is only 53.5, while the one you report on GitHub is 55.6? Is this caused by unstable training?

CUDA out of memory

I run inference on a single A100 40G. At the beginning it works fine, but CUDA out of memory eventually occurs. Could you please give me some advice on keeping GPU memory usage from growing?

Online Inference

Thanks for sharing this great work!
I noticed that simple_test_online depends heavily on the filename key in the input data dict, and uses it to check whether an image is already in memory, if I understood that part of the code correctly. I wonder how to implement this for online streaming (i.e., a video stream from multiple cameras) without saving the frames to files; a sketch of one workaround appears below. Is it costly to get rid of filenames in this function?

Thanks
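One possible workaround for the question above, sketched under heavy assumptions: generate a synthetic, monotonically increasing token per streamed frame and use it wherever the cache expects a filename. All names below are hypothetical, not the repo's API.

import itertools

_frame_counter = itertools.count()

def make_stream_metas(num_cams):
    # Unique pseudo-filenames per camera per frame; a memory cache can key
    # on these exactly as it would on real file paths.
    frame_id = next(_frame_counter)
    return {'filename': [f'stream://cam{c}/{frame_id}' for c in range(num_cams)]}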

About the three modules

How are the three modules (SASA, spatio-temporal sampling, and adaptive mixing) implemented?

visualize

Could someone tell me how to visualize the results?

Pretrain weights download

Hi, I see the Training section says:

Download pretrained weights and put it in directory pretrain/:

pretrain
├── cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth
├── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth

Where should I download these weights? Is there a correspondence between these two weights and the weights in the Model Zoo?

config problems

Which experimental configuration in the paper corresponds to the configuration file r50_nuimg_704x256.py?

Result differences

Why is it that the NDS of the model I trained using the r50_nuimg_704x256 configuration file is only 44.83, while the mAP is 44.47?

SparseOcc questions

Hi @afterthat97, SparseOcc is wonderful work and I hope to extend the method. Do you have a timetable for releasing the code?
Looking forward to your reply.
Best wishes!

Change backbone

Hello, I use Swin as the model's backbone and find that even Swin-T fails to run. I use 4 A100 GPUs with batch size 4.

Question about absolute velo

Thanks for open-source great work!
I have a question about the absolute velocity computation:

# calculate absolute velocity according to time difference
time_diff = img_metas[0]['time_diff']  # [B, F]
if time_diff.shape[1] > 1:
    time_diff = time_diff.clone()
    time_diff[time_diff < 1e-5] = 1.0
    bbox_pred[..., 8:] = bbox_pred[..., 8:] / time_diff[:, 1:2, None]

Why do I have to divide the velocity by the time difference here? Thanks a lot!
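A small numeric illustration of the division being asked about, under the assumption (suggested by the comment in the snippet) that bbox_pred[..., 8:] holds the displacement accumulated over the frame gap, so dividing by the time difference converts it into an absolute velocity in m/s:

import torch

displacement = torch.tensor([1.0, 0.5])  # metres moved between frames (made-up values)
time_diff = torch.tensor(0.5)            # seconds between frames (made-up value)
velocity = displacement / time_diff      # -> tensor([2.0, 1.0]), i.e. m/s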

CUDA out of memory during inference

Hello, I trained the model, but when I run inference on nuScenes v1.0-test I get CUDA out of memory, even though memory is sufficient for training. Could you please help me?

Question about the usage of temporal information in the paper

Hi! I am very interested in SparseBEV. Please point out my mistake, if I misunderstand the paper.

There are three components: SASA, Spatio-temporal Sampling and Adaptive Mixing. The first two components focus on spatial information and only Adaptive Mixing deals with temporal information.

This is because in spatio-temporal sampling, the features keep a separate frame dimension without any interaction across frames.

Looking forward to your reply! Thank you!

loss problems

I have trained for more than 10 epochs and the loss value is around 10. Is this normal?

CUDA out of memory

Hello, I use your VoV pretrained model and try to evaluate it with 4 A100s, but it reports CUDA out of memory. Could you please help me?

pretrained weight

Thanks for your great work!

I want to train the model following your Training steps, but there is no download link for the pretrained weights. Can you provide one?

Evaluation Error

Hi, I'm trying to run the evaluation on the code but getting the below error:-

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 28130/28130, 2.0 task/s, elapsed: 14405s, ETA: 0s
Formating bboxes of pts_bbox
Start to convert detection format...
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 28130/28130, 36.3 task/s, elapsed: 775s, ETA: 0ss
Results writes to submission/pts_bbox/results_nusc.json
Evaluating bboxes of pts_bbox
Traceback (most recent call last):
File "val.py", line 141, in
main()
File "val.py", line 137, in main
evaluate(val_dataset, results, -1)
File "val.py", line 20, in evaluate
metrics = dataset.evaluate(results, jsonfile_prefix='submission')
File "/netscratch/user/miniforge3/envs/sparsebev_torch_1_9/lib/python3.8/site-packages/mmdet3d/datasets/nuscenes_dataset.py", line 509, in evaluate
ret_dict = self._evaluate_single(result_files[name])
File "/netscratch/user/miniforge3/envs/sparsebev_torch_1_9/lib/python3.8/site-packages/mmdet3d/datasets/nuscenes_dataset.py", line 399, in _evaluate_single
nusc_eval = NuScenesEval(
File "/netscratch/user/miniforge3/envs/sparsebev_torch_1_9/lib/python3.8/site-packages/nuscenes/eval/detection/evaluate.py", line 84, in init
assert set(self.pred_boxes.sample_tokens) == set(self.gt_boxes.sample_tokens),
AssertionError: Samples in split doesn't match samples in predictions.
(sparsebev_torch_1_9) user@xxxx:~/SparseBEV$

I'm running validation with these options:-
--config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
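A hedged diagnostic sketch for the assertion above: it compares the sample tokens in the generated results file (path taken from the log) against the official val split. The nuScenes devkit calls are standard, but whether this matches your setup is an assumption.

import json
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.splits import val

with open('submission/pts_bbox/results_nusc.json') as f:
    results = json.load(f)
pred_tokens = set(results['results'].keys())

nusc = NuScenes(version='v1.0-trainval', dataroot='data/nuscenes')
val_scenes = set(val)
gt_tokens = {s['token'] for s in nusc.sample
             if nusc.get('scene', s['scene_token'])['name'] in val_scenes}

# Non-empty differences explain the "samples in split doesn't match" error.
print(len(pred_tokens - gt_tokens), len(gt_tokens - pred_tokens))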

Why is there a re-implementation of gradient checkpointing?

Hi @afterthat97, thanks for your awesome work. I noticed there is a checkpoint.py file that seems to re-implement torch.utils.checkpoint. What is the motivation for that?

Furthermore, could you please say which PyTorch version your implementation is based on, so that I can get a quick diff? (Since PyTorch is rapidly developing, the specific implementation may vary from version to version.) Thanks.

scale weights shape

Thank you for your nice work. Why does the shape of scale_weights differ from that of sample_points_cam?
scale_weights: [B*G*T, Q, P, 4]
sample_points_cam: [B*T*G, Q, P, 3]

    # reorganize the tensor to stack T and G to the batch dim for better parallelism
    sample_points_cam = sample_points_cam.reshape(B, T, Q, G, P, 1, 3)
    sample_points_cam = sample_points_cam.permute(0, 1, 3, 2, 4, 5, 6)  # [B, T, G, Q, P, 1, 3]
    sample_points_cam = sample_points_cam.reshape(B*T*G, Q, P, 3)
    
    # reorganize the tensor to stack T and G to the batch dim for better parallelism
    scale_weights = scale_weights.reshape(B, Q, G, T, P, -1)
    scale_weights = scale_weights.permute(0, 2, 3, 1, 4, 5)
    scale_weights = scale_weights.reshape(B*G*T, Q, P, -1)
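A tiny illustration of the ordering difference the question points at: flattening a [B, T, G] layout gives a T-major sequence, while permuting to [B, G, T] first gives a G-major one, so the two flattened batch dimensions enumerate (t, g) pairs in different orders.

import torch

B, T, G = 1, 2, 3
x = torch.arange(B * T * G).reshape(B, T, G)
print(x.reshape(B * T * G))                   # tensor([0, 1, 2, 3, 4, 5])
print(x.permute(0, 2, 1).reshape(B * G * T))  # tensor([0, 3, 1, 4, 2, 5])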

run error

When I try to start training, it returns the data-loading error shown in the attached screenshot.

Speeding up training

Two questions:
1. How much difference does the step [Install turbojpeg and pillow-simd to speed up data loading (optional but important)] actually make?

2. The time per iteration varies a lot: some iterations take only 0.1 s, while others take 60 s. Is this normal?

Thanks a lot! 🙏

training problem

Can I resume training from the point where the model training was interrupted?

Question about more frames

Hi! Sorry to bother you again~

I tried to input more frames (16 instead of 8) with the ImageNet-pretrained R50 model, but it gave worse results (40.5 mAP and 51.6 NDS). I just set num_frames = 16 in the config. Is something wrong?

Looking forward to your reply! Thank you!

One more question

I see that in AdaMixer the six decoder layers do not share parameters, while yours do. How much does not sharing parameters improve performance? You must have run the non-shared variant.

python setup.py build_ext --inplace fails

Thanks for the excellent work!
I set up the environment following the Environment section on GitHub, choosing the PyTorch 1.10.2 + CUDA 10.2 (for older GPUs) option, but the Compile CUDA extensions step fails. It seems to be caused by an incompatibility with PyTorch 1.10.2?
Traceback (most recent call last):
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
subprocess.run(
File "/opt/conda/envs/sparsebev/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "setup.py", line 19, in
setup(
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/setuptools/init.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/envs/sparsebev/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/conda/envs/sparsebev/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/opt/conda/envs/sparsebev/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/envs/sparsebev/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/envs/sparsebev/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/opt/conda/envs/sparsebev/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/envs/sparsebev/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/opt/conda/envs/sparsebev/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Does the nuScenes leaderboard submission setting train on the val split in addition to the original train split?

For example, in vov99_dd3d_1600x640_trainval_future.py. I would like to know how the results look without the validation set in training. Is there a large performance drop?

data = dict(
    train=dict(
        ann_file=['data/nuscenes/nuscenes_infos_train_sweep.pkl',
                  'data/nuscenes/nuscenes_infos_val_sweep.pkl'],
        pipeline=train_pipeline),
