
megvii-research / PETR


[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

License: Other

Languages: Python 99.80%, Shell 0.20%
Topics: multi-camera, multi-task-learning, object-detection, segmentation, 3d-position-embedding

petr's Introduction

[ECCV2022] Position Embedding Transformation for Multi-View 3D Object Detection

[ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images


This repository is an official implementation of PETR and PETRv2. The flash attention version can be found in the "flash" branch.


PETR develops position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can perceive the 3D position-aware features and perform end-to-end object detection. It can serve as a simple yet strong baseline for future research.
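As a rough illustration of this idea (a sketch under assumed shapes and layer choices, not the repository's actual implementation): each pixel of each camera's feature map is lifted to D points in the camera frustum, projected into the shared 3D world frame, and encoded by a small MLP into a 3D position embedding that is added to the 2D image features.

import torch
import torch.nn as nn

def petr_style_3d_pe(feats, frustum_points, img2world, embed_dim=256, depth_bins=64):
    """Sketch of 3D position-aware features (assumed shapes, not the official code).

    feats:          [N_cams, embed_dim, H, W]      2D image features
    frustum_points: [N_cams, H, W, depth_bins, 4]  homogeneous frustum coordinates per pixel
    img2world:      [N_cams, 4, 4]                 inverse projection (image -> shared 3D frame)
    """
    n, _, h, w = feats.shape
    # Project every frustum point of every camera into the shared 3D coordinate frame.
    pts_3d = torch.einsum('nij,nhwdj->nhwdi', img2world, frustum_points)[..., :3]
    # Flatten the depth axis so each pixel carries depth_bins * 3 coordinates.
    pts_3d = pts_3d.reshape(n, h, w, depth_bins * 3).permute(0, 3, 1, 2)
    # A small MLP (1x1 convs) maps the coordinates to a 3D position embedding.
    pe_mlp = nn.Sequential(
        nn.Conv2d(depth_bins * 3, embed_dim, 1), nn.ReLU(inplace=True),
        nn.Conv2d(embed_dim, embed_dim, 1),
    )
    # 3D position-aware features = 2D features + 3D position embedding.
    return feats + pe_mlp(pts_3d)

Object queries then interact with these 3D position-aware features in a DETR-style decoder to predict boxes end to end.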


PETRv2 is a unified framework for 3D perception from multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal information of previous frames to boost 3D object detection. The 3D PE achieves temporal alignment of object positions across different frames. A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE. To support high-quality BEV segmentation, PETRv2 provides a simple yet effective solution by adding a set of segmentation queries. Each segmentation query is responsible for segmenting one specific patch of the BEV map. PETRv2 achieves state-of-the-art performance on 3D object detection and BEV segmentation.
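A minimal sketch of the patch-wise segmentation-query idea (sizes, patch layout, and the decoder interface are assumptions, not the repository's code): the BEV map is tiled into patches, one learnable query is assigned to each patch, and each updated query is decoded into its own patch mask before the patches are stitched back together.

import torch
import torch.nn as nn

class PatchSegHeadSketch(nn.Module):
    """Sketch: each segmentation query predicts one fixed patch of the BEV map."""

    def __init__(self, embed_dim=256, bev_size=200, patch_size=25):
        super().__init__()
        self.bev_size, self.patch_size = bev_size, patch_size
        self.patches_per_side = bev_size // patch_size
        n_queries = self.patches_per_side ** 2
        # Learnable segmentation queries, one per BEV patch (fed to the transformer decoder).
        self.seg_queries = nn.Embedding(n_queries, embed_dim)
        # Decodes one updated query into one patch_size x patch_size mask.
        self.patch_decoder = nn.Linear(embed_dim, patch_size * patch_size)

    def forward(self, updated_queries):
        # updated_queries: [n_queries, embed_dim], the decoder output for the segmentation queries.
        patches = self.patch_decoder(updated_queries).sigmoid()
        p, s = self.patches_per_side, self.patch_size
        patches = patches.view(p, p, s, s)
        # Stitch the patches back into the full BEV mask: [bev_size, bev_size].
        return patches.permute(0, 2, 1, 3).reshape(self.bev_size, self.bev_size)

head = PatchSegHeadSketch()
print(head(torch.randn(64, 256)).shape)  # torch.Size([200, 200])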

News

2023.10.11 The 3D lane detection of PETRv2 has been released on TopoMLP. It supports OpenLane-V2 and won 1st place in the CVPR2023 workshop!
2023.01.25 Our multi-view 3D detection framework StreamPETR achieves 63.6% NDS and 55.0% mAP without TTA and future frames.
2023.01.04 Our multi-modal detection framework CMT is released on arXiv.
2022.11.04 The code of the multi-scale improvement in PETRv2 is released.
2022.09.21 The code of the query denoise improvement in PETRv2 is released.
2022.09.04 PETRv2 with VoVNet backbone and multi-scale achieves 59.1% NDS and 50.8% mAP.
2022.08.11 PETRv2 with GLOM-like backbone and query denoise achieves 59.2% NDS and 51.2% mAP without extra data.
2022.07.04 PETR has been accepted by ECCV 2022.
2022.06.28 The code of BEV segmentation in PETRv2 is released.
2022.06.16 The code of 3D object detection in PETRv2 is released.
2022.06.10 The code of PETR is released.
2022.06.06 PETRv2 is released on arXiv.
2022.06.01 PETRv2 achieves another SOTA performance on the nuScenes dataset (58.2% NDS and 49.0% mAP) with temporal modeling and also supports BEV segmentation.
2022.03.10 PETR is released on arXiv.
2022.03.08 PETR achieves SOTA performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset.

Preparation

This implementation is built upon detr3d; the environment can be set up following install.md.

  • Environments
    Linux, Python == 3.6.8, CUDA == 11.2, PyTorch == 1.9.0, mmdet3d == 0.17.1

  • Detection Data
    Follow mmdet3d to process the nuScenes dataset (https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/data_preparation.md).

  • Segmentation Data
    Download Map expansion from nuScenes dataset (https://www.nuscenes.org/nuscenes#download). Extract the contents (folders basemap, expansion and prediction) to your nuScenes maps folder.
    Then build the segmentation dataset:

    cd tools
    python build-dataset.py
    

    If you want to train the segmentation task immediately, we provide the processed data (HDmaps-final.tar) at gdrive. The processed info files for segmentation can also be found at gdrive.

  • Pretrained weights
    To verify the performance on the val set, we provide the pretrained V2-99 weights. The V2-99 is pretrained on DDAD15M (weights) and further trained on the nuScenes train set with FCOS3D. For the results on the test set in the paper, we use the DD3D pretrained weights. The ImageNet pretrained weights of the other backbones can be found here. Please put the pretrained weights into ./ckpts/.

  • After preparation, you will be able to see the following directory structure:

    PETR
    ├── mmdetection3d
    ├── projects
    │   ├── configs
    │   ├── mmdet3d_plugin
    ├── tools
    ├── data
    │   ├── nuscenes
    │     ├── HDmaps-nocover
    │     ├── ...
    ├── ckpts
    ├── README.md
    

Train & inference

cd PETR

You can train the model as follows:

tools/dist_train.sh projects/configs/petr/petr_r50dcn_gridmask_p4.py 8 --work-dir work_dirs/petr_r50dcn_gridmask_p4/

You can evaluate the model as follows:

tools/dist_test.sh projects/configs/petr/petr_r50dcn_gridmask_p4.py work_dirs/petr_r50dcn_gridmask_p4/latest.pth 8 --eval bbox

Visualize

You can generate the result JSON as follows:

./tools/dist_test.sh projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py work_dirs/petr_vovnet_gridmask_p4_800x320/latest.pth 8 --out work_dirs/pp-nus/results_eval.pkl --format-only --eval-options 'jsonfile_prefix=work_dirs/pp-nus/results_eval'

You can visualize the 3D object detection results as follows:

python3 tools/visualize.py

Main Results

PETR: We provide some results on the nuScenes val set with pretrained models. These models are trained on 8x 2080Ti without CBGS. Note that the models and logs are also available at Baidu Netdisk with code petr.

config | mAP | NDS | training time | config | download
PETR-r50-c5-1408x512 | 30.5% | 35.0% | 18 hours | config | log / gdrive
PETR-r50-p4-1408x512 | 31.7% | 36.7% | 21 hours | config | log / gdrive
PETR-vov-p4-800x320 | 37.8% | 42.6% | 17 hours | config | log / gdrive
PETR-vov-p4-1600x640 | 40.4% | 45.5% | 36 hours | config | log / gdrive

PETRv2: We provide a 3D object detection baseline and a BEV segmentation baseline with two frames. The models are trained on 8x 2080Ti without CBGS. The processed info files contain 30 previous frames, whose transformation matrices are aligned with the current frame. The info files, models and logs are also available at Baidu Netdisk with code petr.

config | mAP | NDS | training time | config | download
PETRv2-vov-p4-800x320 | 41.0% | 50.3% | 30 hours | config | log / gdrive

config | Drive | Lane | Vehicle | backbone | config | download
PETRv2_BEVseg | 85.6% | 49.0% | 46.3% | V2-99 | config | log / gdrive

config | F-score | X-near | X-far | Z-near | Z-far | backbone | config | download
PETRv2_3DLane | 61.2% | 0.400 | 0.573 | 0.265 | 0.413 | V2-99 | |

StreamPETR: StreamPETR achieves significant performance improvements without introducing extra computation cost, compared to the single-frame baseline.

config | mAP | NDS | FPS (PyTorch) | config | download
StreamPETR-r50-704x256 | 45.0% | 55.0% | 31.7 | |

Acknowledgement

Many thanks to the authors of mmdetection3d and detr3d.

Citation

If you find this project useful for your research, please consider citing:

@article{liu2022petr,
  title={Petr: Position embedding transformation for multi-view 3d object detection},
  author={Liu, Yingfei and Wang, Tiancai and Zhang, Xiangyu and Sun, Jian},
  journal={arXiv preprint arXiv:2203.05625},
  year={2022}
}
@article{liu2022petrv2,
  title={PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images},
  author={Liu, Yingfei and Yan, Junjie and Jia, Fan and Li, Shuailin and Gao, Qi and Wang, Tiancai and Zhang, Xiangyu and Sun, Jian},
  journal={arXiv preprint arXiv:2206.01256},
  year={2022}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected], [email protected] or [email protected].

petr's People

Contributors

eltociear, ifzhang, junjie18, kvjia, megvii-model, yingfei1016


petr's Issues

Reproduce PETR result

Thanks for sharing such wonderful & interesting work!!!
I'm trying to reproduce the result of "petr_r50dcn_gridmask_p4.py". At the end of this config file, the result is listed as follows:

mAP: 0.3174
mATE: 0.8397
mASE: 0.2796
mAOE: 0.6158
mAVE: 0.9543
mAAE: 0.2326
NDS: 0.3665

I train with this config file. Because I have only 2 V100 cards, I set the batch size to "samples_per_gpu=2, workers_per_gpu=2", and I tried both with and without "--autoscale-lr". But my result is roughly:
mAP: 0.2103
mATE: 1.0048
mASE: 0.3099
mAOE: 0.8165
mAVE: 1.1984
mAAE: 0.4087
NDS: 0.2516

I also checked the training log you provided (20220606_223059.log): at the end of 24 epochs, your loss is 5.6355, but my loss is about 7.xx. I tested the model you provided, and the result is the same as that in "petr_r50dcn_gridmask_p4.py".

Should I adjust the lr, batch size, or other parameters? Any advice? Thanks!
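Not an official answer, but the usual rule of thumb in this situation is linear scaling of the learning rate with the total batch size (note that, as far as I know, --autoscale-lr in mmdet only accounts for the GPU count, not a changed samples_per_gpu). A sketch with assumed numbers:

# Linear LR scaling sketch; all concrete numbers below are assumptions, read the real
# base lr and samples_per_gpu from the config you are actually training with.
base_lr = 2.0e-4          # hypothetical base learning rate from the reference config
reference_batch = 8 * 2   # assumed reference setup: 8 GPUs x samples_per_gpu
my_batch = 2 * 2          # e.g. 2 V100s x samples_per_gpu=2

scaled_lr = base_lr * my_batch / reference_batch
print(scaled_lr)          # 5e-05 under these assumed numbers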

randomness during inference?

Hello,

I have a problem during inference. When I put the same data into the model, I get different outputs for the predicted 3D bboxes. Could you explain that? Thanks.

Grad norm became Inf and Loss failed to decrease while training

Hi,
Thanks for sharing such wonderful work! I'm trying to reproduce your work now. But according to the log file, the grad norm became Inf and the training loss failed to converge after the third epoch (see the attached grad error screenshots). The only code change I made was the data root path.

I used 4 GPUs of GeForce RTX 3090 for training. My environment is below:
CUDA: 11.2
python: 3.6.9
pytorch: 1.9.0
mmcv: 1.4.0
mmdet: 2.24.1
mmsegmentation: 0.20.2
mmdet3d: 0.17.1
tools/dist_train.sh projects/configs/petr/petr_r50dcn_gridmask_p4.py 4 --work-dir work_dirs/petr_r50dcn_gridmask_p4/

The first Inf happened at 4200 iters of the first epoch, and after 2 epochs the grad norm was almost always Inf or NaN.

Why IoU calculation is (I + I)/(I +U)

The IoU function in mmdet3d_plugin/models/detectors/petr3d_seg.py seems to compute (I + I)/(I + U) rather than the commonly used I/U.

This may lead to higher IoU values than those calculated by the normal IoU metric. So why not use the normal IoU metric?

The source code of the IoU function is shown below:

def IOU (intputs,targets):
    numerator = 2 * (intputs * targets).sum(dim=1)
    denominator = intputs.sum(dim=1) + targets.sum(dim=1)
    loss = (numerator + 0.01) / (denominator + 0.01)
    return loss
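For comparison, here is a sketch (assuming binary mask tensors of shape [N, H*W]) of the plain I/U metric next to the quoted formula; the quoted one is actually the Dice coefficient, 2I/(|A|+|B|) = (I+I)/(I+U), which is always at least as large as IoU.

import torch

def iou_plain(inputs, targets, eps=0.01):
    # Standard intersection-over-union: I / U, with U = |A| + |B| - I.
    intersection = (inputs * targets).sum(dim=1)
    union = inputs.sum(dim=1) + targets.sum(dim=1) - intersection
    return (intersection + eps) / (union + eps)

def dice_like(inputs, targets, eps=0.01):
    # The quoted formula: 2I / (|A| + |B|), i.e. the Dice coefficient.
    intersection = (inputs * targets).sum(dim=1)
    return (2 * intersection + eps) / (inputs.sum(dim=1) + targets.sum(dim=1) + eps)

# Example: |A| = |B| = 2 with 1 overlapping cell -> IoU = 1/3, Dice = 2/4 = 0.5.
a = torch.tensor([[1., 1., 0., 0.]])
b = torch.tensor([[0., 1., 1., 0.]])
print(iou_plain(a, b), dice_like(a, b))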

The data preparation of petrv2

Hi, the train dataset used in the PETRv2 experiments is "mmdet3d_nuscenes_30f_infos_train.pkl"; is there any code to create this data? I followed the readme instructions and used the code offered by official mmdet3d, but it can only generate "nuscenes_infos_train.pkl", not "mmdet3d_nuscenes_30f_infos_train.pkl".

regression of PETRv2 and its label

Hi,
Thanks for sharing your wonderful work! I have two questions I am not certain about.

  1. Does PETRv2 regress the relative offset between two frames rather than velocity?
  2. Is the corresponding label the relative velocity, i.e., the object velocity with the ego vehicle's own current velocity subtracted?

Visualization errors with test.py and visualize_results.py

Following the README, after training the model I ran the command python3 tools/test.py $config $ckpt --show --show-dir $showdir and got the error below. What is the cause?
Traceback (most recent call last):
File "tools/test.py", line 258, in
main()
File "tools/test.py", line 225, in main
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
File "/disk5/shc/mmdetection3d/mmdet3d/apis/test.py", line 47, in single_gpu_test
model.module.show_results(data, result, out_dir=out_dir)
File "/disk5/shc/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 466, in show_results
if isinstance(data['points'][0], DC):
KeyError: 'points'

I also ran into a problem when using visualize_results.py:
Traceback (most recent call last):
File "tools/misc/visualize_results.py", line 88, in
main()
File "tools/misc/visualize_results.py", line 78, in main
dataset.show(results, args.show_dir, pipeline=eval_pipeline)
File "/disk5/shc/mmdetection3d/mmdet3d/datasets/nuscenes_dataset.py", line 567, in show
show_result(points, show_gt_bboxes, show_pred_bboxes, out_dir,
File "/disk5/shc/mmdetection3d/mmdet3d/core/visualizer/show_result.py", line 99, in show_result
vis = Visualizer(points)
File "/disk5/shc/mmdetection3d/mmdet3d/core/visualizer/open3d_vis.py", line 379, in init
self.pcd, self.points_colors = _draw_points(
File "/disk5/shc/mmdetection3d/mmdet3d/core/visualizer/open3d_vis.py", line 35, in _draw_points
vis.get_render_option().point_size = points_size # set points size
AttributeError: 'NoneType' object has no attribute 'point_size'

Data augmentation with segmentation map/queries

Hi! I noticed that in your petrv2_BEVseg.py config file there's GlobalRotScaleTransImage which applies rotation and scaling to the bev space and correspondingly 3d ground truth boxes (which makes total sense for detection because boxes are also modified accordingly). However, the transformations are not applied on results['gt_map'] or results['maps'].

I'm confused here because the segmentation queries are generated at fixed locations in the BEV space, so in my understanding either:

  • the queries should first be rotated & scaled, or
  • the gt map should be rotated & scaled.

I didn't see this in Petr3D_seg, PETRHeadseg, or GlobalRotScaleTransImage. Am I missing anything? If that's already done could you point me to where it is?

Thank you!

Any report for the computation cost?

Hello,

Thanks for your excellent work!
I'm wondering if you have any analysis of the computation cost (params and MACs) of the PETR models? Or do you have any convenient way to calculate that during inference? I found a "get_flops.py" file under the tools folder, but it seems to only support a simple image input rather than a dict containing data and cam_info.

Thanks.
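Not an official answer, but the parameter count at least does not depend on the input format and can be read straight off the built model; a minimal sketch (the build step is only indicated as a comment, since the exact config and checkpoint are up to you):

import torch

def count_parameters(model: torch.nn.Module) -> int:
    # Total number of trainable parameters, independent of the input dict format.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical usage after building the detector from a config (e.g. with mmdet3d's build_model):
# print(f"Params: {count_parameters(model) / 1e6:.1f} M")

MACs are trickier because the model expects a dict input, so a FLOP counter would need a small wrapper that calls forward with the full data dict.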

Could not reproduce results of PETR-vov-p4

Dear author:
Thanks for sharing such excellent work! I am trying to reproduce your results using the official code, config, and pre-trained weights.
I obtained NDS: 0.4480, mAP: 0.3983 for PETR-vov-p4-1600x640, and NDS: 0.4202, mAP: 0.3738 for PETR-vov-p4-800x320. These numbers are lower than what you report in the README.
I could successfully reproduce the results on R50. Could you give me some advice? Thanks~

TypeError: can't pickle dict_keys objects

I am trying to reproduce the PETRv2 on bevseg.
But I face one error:
TypeError: can't pickle dict_keys objects

the whole log is as follow:
2022-08-16 06:30:37,780 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
2022-08-16 06:30:37,783 - mmdet - INFO - Checkpoints will be saved to /home/gyang/data/PETR/work_dirs/petrv2_BEVseg by HardDiskBackend.
Traceback (most recent call last):
  File "tools/train.py", line 255, in <module>
    main()
  File "tools/train.py", line 251, in main
    meta=meta)
  File "/home/gyang/data/PETR/mmdetection3d/mmdet3d/apis/train.py", line 351, in train_model
    meta=meta)
  File "/home/gyang/data/PETR/mmdetection3d/mmdet3d/apis/train.py", line 319, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 130, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 914, in __init__
    w.start()
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/gyang/anaconda3/envs/petr/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle dict_keys objects
Would you mind giving some suggestions about this?

Cannot reproduce PTERv2 results

Hi @yingfei1016, I tried to reproduce the results with the default config on VoVNet on an A100, but got slightly lower results: 49.6 NDS and 40.3 mAP. The results recorded in the config are 50.3 NDS and 41.0 mAP. Is this normal?

Dose "sweep_range" comply with the regulation of nuscenes?

Thanks for sharing this great work!

I have a question to discuss.
As nuScenes page of detection task mentioned:
"The maximum time window of past sensor data and ego poses that may be used at inference time is approximately 0.5s (at most 6 past camera images, 6 past radar sweeps and 10 past lidar sweeps). At training time there are no restrictions."

Dose "sweep_range=[3, 27]" comply with the regulation?
Is there a standard to use temporal information for camera methods?

About the bev segmentation branch.

Hi, thanks for your great work! I want to ask what the meaning of the reshape operation in the BEV segmentation branch in the paper is, and where is the corresponding code? Thank you!

About the config of petrv2 to achieve the performance

Hello,
Thanks for your excellent work!
I'm wondering which config of PETRv2 achieves the reported performance (mAP 0.490 / NDS 0.582) on the nuScenes test set. Is it https://github.com/megvii-research/PETR/blob/main/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_800x320.py, or has it not been released yet? I also saw the performance described at the bottom of petrv2_vovnet_gridmask_p4_1600x640_trainval_cbgs.py (https://github.com/megvii-research/PETR/blob/main/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_1600x640_trainval_cbgs.py), which reaches mAP 0.8412 / NDS 0.83. Is that a camera-only method? How does it perform on the nuScenes test set, and why not report its result on the nuScenes leaderboard?

Log of loss curves

Hi, can you share the log files of the training process? When I train the models myself with the official configs, the loss does not seem to drop. I want to see what the loss curve should look like when training proceeds properly.

Code of PETRv2.

Thanks for your great work!
Are there plans to release the code of PETRv2?

Do not find the methods proposed in PETR

Hi, in this repo I cannot find the two main contributions of PETR, i.e., the 3D coordinate generator and the 3D position encoder. There is no encoder in the PETR head, and the position encoding appears to be ordinary 2D position encoding.

Segmentation visualization results

Dear authors,

Thanks for sharing this great work. When I tried to visualize the segmentation results on the val set, I found that the visualized segmentation results do not appear to be as good as the paper shows. Although the IoU results are quite good, the visualized lane, vehicle, and driving area are somewhat unrecognizable. Could you please give me some suggestions? One of the visualized segmentation results is attached. Thanks in advance for your help.

(model output: f_lane_show_2900; ground truth: gt_map_show_2900)

About the relationship between depth information and positional encoding

I noticed that you claim "We argue that the 3D PE should be driven by the 2D features since the image feature can provide some informative guidance (e.g., depth)." Is there a possibility that some explicit depth-like supervision could be combined into the PETRv2 series, or have you already obtained experimental results along these lines? Thanks!

Input data for training and inference process

Hi, I have the following 3 questions:

  1. In PETRv2, the size of the img input during training is [12, 3, 320, 800]. Can 12 be understood as 6 images from the current moment and 6 images from the previous frame?
  2. During inference, the input img size is [1, 12, 3, 320, 800]. Does the inference process feed in the 12 images from the two timestamps at the same time?
  3. When reading the dataloader, I did not find the step that adds the 6 images from the previous moment to the 6 images at the current moment. Which pipeline is this reflected in? (See the sketch after this list.)
    Thank you very much!
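Not an official answer, but a minimal sketch of the interpretation described in questions 1 and 2 (all shapes and variable names here are assumptions):

import torch

num_cams, C, H, W = 6, 3, 320, 800

# Hypothetical per-frame camera images: 6 views at the current moment, 6 at the previous one.
imgs_t  = torch.randn(num_cams, C, H, W)   # current frame
imgs_t1 = torch.randn(num_cams, C, H, W)   # previous frame

# Training-style tensor: both frames flattened along the camera axis -> [12, 3, 320, 800].
imgs_train = torch.cat([imgs_t, imgs_t1], dim=0)

# Inference-style tensor: the same 12 images with a batch dimension -> [1, 12, 3, 320, 800].
imgs_test = imgs_train.unsqueeze(0)

print(imgs_train.shape, imgs_test.shape)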

some question about position encoding in depth axis

Hi, thanks for sharing such wonderful work. After reading the paper I have a perhaps naive question: what about setting D=1? In my understanding, for each position in the feature map of each view, the position encoding distinguishes which view the position comes from; so why must D depth bins be used? Even with D=1, after transforming the coordinates with the camera extrinsic matrix, I think it is easy to judge which view a position comes from. So my question is: what is the difference between setting D = 1 and D > 1? I did not see an ablation study on this.

About the config of Swin tiny

Thank you for releasing the code! I tried the Swin-Tiny defined in BEVDet with PETRv2 at 800x320 resolution, getting 40.6 NDS and 29.6 mAP, which is rather low. Could you kindly release the config for Swin-Tiny? Many thanks!

Initialization of ResNet50 Models and Paths in Config

Hi Guys,

Thanks for the nice work! I have some questions and not sure if my understanding is correct.

  1. From the config file, the ResNet-50 models are initialized from a special .pth file. However, it is missing from the provided model weights.
  2. I guess the data_root in the configuration files, such as this line, should be changed to ./data/nuscenes/ to match the directory structure in the README.

Thanks again for the help! Please help me check the pretrained weights.

No loss displayed while training

Just this: no loss is displayed while training.
2022-08-03 17:16:37,279 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
2022-08-03 17:16:37,281 - mmdet - INFO - Checkpoints will be saved to /root/paddlejob/workspace/wangguangjie/PETR/work_dirs/petr_r50dcn_gridmask_p4 by HardDiskBackend.
2022-08-03 17:19:34,002 - mmdet - INFO - Saving checkpoint at 1 epochs
2022-08-03 17:22:33,118 - mmdet - INFO - Saving checkpoint at 2 epochs
2022-08-03 17:25:32,982 - mmdet - INFO - Saving checkpoint at 3 epochs

Import problem in train.py

Hello, in train.py the line from mmdet.utils import get_device reports that get_device does not exist, and I could not find it in mmdet either. How can I solve this problem?
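Not an official fix, but a common workaround sketch: get_device only exists in newer mmdet releases, so a small fallback can be patched into train.py (the CUDA check below is an assumption about which device you want):

try:
    from mmdet.utils import get_device  # present in newer mmdet releases
except ImportError:
    import torch

    def get_device() -> str:
        # Minimal fallback: use CUDA when available, otherwise CPU.
        return 'cuda' if torch.cuda.is_available() else 'cpu'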

Cannot reproduce the training result of PETRv2

Hi, thanks for your great work.
I want to train PETRv2 on a single GPU with the default config you provided and nothing else altered, but I get NaN in grad_norm after several iterations during the first epoch, as the log file below shows. Any suggestions?
Thanks in advance!
20220812_090303.log

Questions on extrinsics & intrinsics

Hi! Thank you for releasing a wonderful work. I have a few questions regarding the extrinsics & intrinsics processing in CustomNuScensDataset and the data pipelines, and I hope you could help answer them:

  1. I inspected the values of extrinsics of some samples, and found the translation part a bit confusing. For example, two samples have the translations:
>>> np.array([extr[-1,:].round(2) for extr in extrinsics])
array([[ 0.01, -0.33, -0.48,  1.  ],
       [ 0.05, -0.34, -0.63,  1.  ],
       [ 0.09, -0.33, -0.54,  1.  ],
       [-0.  , -0.28, -1.  ,  1.  ],
       [-0.24, -0.24, -0.44,  1.  ],
       [ 0.08, -0.27, -0.49,  1.  ]])
# step over to another sample
>>> np.array([extr[-1,:].round(2) for extr in extrinsics])
array([[ 0.01, -0.33, -0.58,  1.  ],
       [ 0.11, -0.34, -0.68,  1.  ],
       [-0.02, -0.33, -0.61,  1.  ],
       [-0.  , -0.28, -0.96,  1.  ],
       [-0.23, -0.24, -0.44,  1.  ],
       [ 0.14, -0.27, -0.47,  1.  ]])
>>> info['cams'].keys()
dict_keys(['CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'])
>>> info['lidar_path']
'./data/nuscenes/samples/LIDAR_TOP/n015-2018-07-18-11-50-34+0800__LIDAR_TOP__1531885881798485.pcd.bin'
  • Why are the y parts of the 3 FRONT cameras so close to each other? I mean, if the vehicle is moving (roughly forward) and each camera is triggered when the lidar scan passes through its FOV center, shouldn't the y part show the forward movement after a clockwise full scan?
  • Why does the y part stay almost the same while the z part varies noticeably? Is it x-right, y-forward, z-upward?
  2. In the customized transform_3d.py, why is line 527 here commented out? My understanding is that, even though results["lidar2img"] is modified accordingly, the extrinsics should also be updated if anyone wants to make use of the extrinsics.
  3. Similar to 2., in scale_xyz of GlobalRotScaleTransImage, if I want to make use of the intrinsics myself I should also update the intrinsics, right? Currently at line 546 the intrinsics are not modified.

Looking forward to hearing from you. Thanks in advance!

Memory increases and then crashes when starting dist_train

Thank you for sharing the code~
I met this problem when starting dist_train.sh.

The swap memory increased until full, then the training crashed with the following error:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 2 (pid: 8530) of binary: /opt/miniconda3/envs/petr/bin/python3
Traceback (most recent call last):
File "/opt/miniconda3/envs/petr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/miniconda3/envs/petr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/miniconda3/envs/petr/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/opt/miniconda3/envs/petr/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/opt/miniconda3/envs/petr/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/opt/miniconda3/envs/petr/lib/python3.7/site-packages/torch/distributed/run.py", line 692, in run
)(*cmd_args)
File "/opt/miniconda3/envs/petr/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 116, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/miniconda3/envs/petr/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:


         tools/train.py FAILED

================================================
Root Cause:
[0]:
time: 2022-09-15_05:47:52
rank: 2 (local_rank: 2)
exitcode: -9 (pid: 8530)
error_file: <N/A>
msg: "Signal 9 (SIGKILL) received by PID 8530"

Other Failures:
<NO_OTHER_FAILURES>


my environment:

  • C++ Version: 201402 [1847/1847]
  • Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_
    75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUS
    E_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-
    missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-type
    defs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-col
    or=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PE
    RF_WITH_AVX512=1, TORCH_VERSION=1.9.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.0+cu111
OpenCV: 4.2.0
MMCV: 1.4.0
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 10.2
MMDetection: 2.24.1
MMSegmentation: 0.20.2
MMDetection3D: 0.17.1+9a62051
