
tusimple / centerformer

282 stars · 12 watchers · 27 forks · 650 KB

Implementation for CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022)

License: MIT License

Python 83.46% Shell 0.08% C++ 4.76% Cuda 11.70%
Topics: lidar-point-cloud, transformer

centerformer's People

Contributors

edwardzhou130


centerformer's Issues

previous frame transformed to current frame?

Thanks for your great work and for open-sourcing it; I have a question about the coordinate transformation.
I have checked the code: you transform the previous point clouds into the current frame's coordinates during training, right?

if sweep["transform_matrix"] is not None:
    # Lift xyz to homogeneous coordinates, apply the 4x4 transform, drop back to 3D
    points_sweep[:3, :] = sweep["transform_matrix"].dot(
        np.vstack((points_sweep[:3, :], np.ones(nbr_points)))
    )[:3, :]

But in deployed inference, we just save and reuse the previous feature map from the memory bank, and that feature map has not been transformed into the current frame. So there is a gap here.

Please correct me if I am wrong, thanks!

Use CenterFormer on other datasets

Hello, really impressive work! I wonder whether I can use the method on datasets other than WOD and nuScenes. I want to use my own dataset, for which I have already written a working data pipeline. If I want CenterFormer to reach the same strong performance on my own dataset, is there any essential change I should make, or is it ready as-is?

Redundant boxes after post processing

It seems that the configuration "use_rotate_nms = False, use_multi_class_nms = True" cannot remove all redundant boxes, and there are still lots of boxes at the same position. Is this normal?
Also, although I set score_threshold = 0.1 in test_cfg, there are lots of boxes with scores below 0.1 in the final output.
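In the meantime, a minimal post-hoc filter sketch (a hypothetical helper, not the repository's API) that drops detections below the configured threshold:

    import torch

    def filter_by_score(boxes, scores, labels, score_threshold=0.1):
        # Keep only detections whose confidence exceeds the threshold,
        # mirroring what test_cfg.score_threshold is expected to do.
        keep = scores > score_threshold
        return boxes[keep], scores[keep], labels[keep]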

Should "x_coor" and "y_coor" be swapped in lines 466 and 468 of det3d/models/necks/rpn_transformer?


Those two lines seem fine at first glance.
But coincidentally, I changed the point-cloud range, which made H unequal to W, and I ran into an error (screenshot omitted).
After debugging, I found that in get_multi_scale_feature, center_pos sometimes fell outside the range of feat. Tracing backwards, I found that some elements of y_coor from line 468 exceed W.
After some experiments, this problem was fixed by swapping x_coor and y_coor.
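For illustration, a minimal sketch (hypothetical shapes, not the repository's code) of why the order matters once H != W:

    import torch

    H, W = 200, 160  # non-square BEV grid, as after changing the point-cloud range
    feat = torch.randn(1, 64, H, W)
    order = torch.randint(0, H * W, (500,))  # flattened indices of top-K centers

    # Recovering 2D positions from a flattened index: the row (y) comes from
    # integer division by W, the column (x) from the remainder. Swapping the
    # two only goes unnoticed while H == W.
    y_coor = torch.div(order, W, rounding_mode="floor")  # in [0, H)
    x_coor = order % W                                   # in [0, W)

    assert int(y_coor.max()) < H and int(x_coor.max()) < W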

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

When I run:
python -m torch.distributed.launch --nproc_per_node=2 ./tools/train.py configs/waymo/voxelnet/waymo_centerformer.py

It shows the following error:

2022-10-23 14:27:35,879 - INFO - Start running, work_dir: /dkliang/projects/synchronous/centerformer/work_dirs/waymo_centerformer
2022-10-23 14:27:35,880 - INFO - workflow: [('train', 1)], max: 20 epochs
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 0 (pid: 44260) of binary: /dkliang/miniconda3/envs/centerformer/bin/python
Traceback (most recent call last):
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
**************************************************
         ./tools/train.py FAILED
==================================================
Root Cause:
[0]:
  time: 2022-10-23_14:27:44
  rank: 0 (local_rank: 0)
  exitcode: -11 (pid: 44260)
  error_file: <N/A>
  msg: "Signal 11 (SIGSEGV) received by PID 44260"
==================================================
Other Failures:
[1]:
  time: 2022-10-23_14:27:44
  rank: 1 (local_rank: 1)
  exitcode: -11 (pid: 44261)
  error_file: <N/A>
  msg: "Signal 11 (SIGSEGV) received by PID 44261"
**************************************************

Python 3.9 does not support Pillow 6.2.1

The requirements file pins Pillow to at most version 6.2.1, but the Install instructions say the environment was tested on Python 3.9.12, and Pillow 6.2.1 cannot be installed under Python 3.9.

Question about why the add & norm structure of the transformer network differs from the typical transformer

if pos_embedding is not None:
    x_att = self_attn(x + center_pos_embedding)
    x = x_att + x
    x_att = cross_attn(
        x + center_pos_embedding, y + neighbor_pos_embedding
    )
else:
    x_att = self_attn(x)
    x = x_att + x
    x_att = cross_attn(x, y)
x = x_att + x
x = ff(x) + x

In this code, the residual is just the input added to the attention output; it never passes through a norm layer. Add and norm are not applied as a unit, which differs from the typical transformer structure, where the output of add & norm in series becomes the input of the next sublayer. Is there any special consideration behind this design?
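For comparison, a minimal sketch of the conventional post-norm "add & norm" sublayer the question refers to (generic PyTorch, not the repository's module):

    import torch
    import torch.nn as nn

    class PostNormBlock(nn.Module):
        # Typical transformer sublayer: the residual sum itself is normalized
        # before being passed on, i.e. x = LayerNorm(x + Sublayer(x)).
        def __init__(self, d_model=256, n_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d_model)
            self.ff = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):
            x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.norm2(x + self.ff(x))
            return x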

Positional embedding in RPN_transformer_deformable_multitask

Hello,

I would like to know: is there any specific reason for using task_id along with x_coor and y_coor when creating pos_embedding?

    if self.pos_embedding_type == "linear":
        if len(self.tasks)>1:
            self.pos_embedding = nn.Linear(3, self._num_filters[-1] * 2)

In any case, we know that the ct_feats of the 6 tasks are concatenated next to each other and sliced apart accordingly in the code snippet below.

    for idx, task in enumerate(self.tasks):
        out_dict_list[idx]["ct_feat"] = ct_feat[:, :, idx * self.obj_num : (idx+1) * self.obj_num]

What is the purpose of diluting the ct_feat dimensions (256) with task_id?
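To make the question concrete, a minimal sketch (hypothetical values) of how (x, y, task_id) feeds the linear positional embedding above:

    import torch
    import torch.nn as nn

    num_filters, num_tasks, obj_num = 256, 6, 500
    pos_embedding = nn.Linear(3, num_filters * 2)  # as in the snippet above

    # Each of the 6*500 concatenated center queries carries its task id as a
    # third positional coordinate next to its BEV (x, y) location.
    x_coor = torch.rand(num_tasks * obj_num)
    y_coor = torch.rand(num_tasks * obj_num)
    task_id = torch.arange(num_tasks).repeat_interleave(obj_num).float()

    pos = torch.stack([x_coor, y_coor, task_id], dim=-1)  # (3000, 3)
    emb = pos_embedding(pos)                              # (3000, 512)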

Thanking you in advance.

About `disable_dbsampler`

Hi, I successfully reproduced the base CenterFormer's 68.06 on nuScenes.
Thanks a lot.

One difference I have noticed from the CenterPoint code base is that your code contains a disable_dbsampler option.
Could you explain the motivation for this part? Is it simply turning off the augmentation from epoch 15?

nuScenes result?

Thanks for your fantastic work. I'm very interested in this project, but I cannot reach your AP & NDS on nuScenes. Could you upload your training results on nuScenes? I want to do some follow-up work based on them. Looking forward to your reply~

/usr/include/stdio.h(189): error: attribute "__malloc__" does not take arguments

Hello, when I execute the setup.sh file, there is an error:

/usr/local/cuda-11.5/bin/nvcc -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/TH -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.5/include -I/root/anaconda3/envs/centerformer/include/python3.9 -c src/iou3d_nms_kernel.cu -o build/temp.linux-x86_64-cpython-39/src/iou3d_nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=iou3d_nms_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/usr/include/stdio.h(189): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(201): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(223): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(260): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(285): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(294): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(303): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(309): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(315): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(830): error: attribute "__malloc__" does not take arguments
/usr/include/stdlib.h(566): error: attribute "__malloc__" does not take arguments
/usr/include/stdlib.h(570): error: attribute "__malloc__" does not take arguments
/usr/include/stdlib.h(799): error: attribute "__malloc__" does not take arguments

13 errors detected in the compilation of "src/iou3d_nms_kernel.cu".
error: command '/usr/local/cuda-11.5/bin/nvcc' failed with exit code 1

Some details to discuss

Thank you for open-sourcing your work. I was wondering why you use x_up (the current frame's BEV feature) rather than x_up_fuse (the sequential frames fused through spatial-aware fusion) as the center query embedding? Apologies if I missed it in the paper.

About Lidar and image fusion

Hello, first of all, thank you for your work. I have read your paper. Do you think it is necessary to fuse image features onto the LiDAR features? At the same time, I know that lifting images to BEV is time-costly (for the nuScenes dataset). Alternatively, the 500 predicted location points could be projected via the calibration to fetch the corresponding neighborhood features from the image for fusion. Do you think either of these two fusion approaches is worthwhile, do you have a better one, or does it not make much sense at the moment?

CenterFormer on kitti

Hello, very nice work. I tried to use it on KITTI, but found that it has very poor performance; the Car mAP3D @0.7 is only between 10 and 20. Have you ever tried it on KITTI?

What speed does this model reach on a 3090 or other recent GPU?

Thank you for your excellent work. I want to know what speed this model reaches on a 3090 or other recent GPU. My GPU is rather weak, so I would like to know the speed on a better GPU. I would appreciate it if anyone could answer me.

Evaluation on waymo opendataset

Hello, in order to reproduce the Waymo results on my own, I trained CenterFormer on Waymo and tried to get the performance evaluation shown in the README.
I followed the instructions from https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md and already have the gt.bin and preds.bin of CenterFormer, but I ran into an error (screenshot omitted).
I wonder whether you have encountered this issue before, or maybe I've gone down the wrong path? Really need some help here. Please. Thanks in advance.

Implementation of CrossAttention

Hello, I found that in your code you used ChannelAttention and SpatialAttention in place of the cross-attention layer described in the original paper. Was this done to account for the computational cost of cross-attention?
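For context, a minimal CBAM-style sketch of the channel/spatial attention pattern mentioned (generic, not the repository's exact modules):

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Squeeze spatial dims, weight channels via a shared two-layer MLP.
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x):  # x: (B, C, H, W)
            w = self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3)))
            return x * torch.sigmoid(w)[:, :, None, None]

    class SpatialAttention(nn.Module):
        # Pool over channels, weight spatial positions with a single conv.
        def __init__(self, kernel_size=7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
            return x * torch.sigmoid(self.conv(s))

Both pooling-based modules scale linearly with the feature-map size, which is far cheaper than dense query-key cross-attention.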

`ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87 unknown device type` error

Hi, thanks for sharing the code.
I am leaving an issue since I am having trouble running your code.
I ran the code without DDP: python ./tools/train.py ./configs/nusc/nuscenes_centerformer_separate_detection_head.py.
sh setup.sh works fine, but the following error occurs when running train.py.

Traceback (most recent call last):
  File "./tools/train.py", line 137, in <module>
    main()
  File "./tools/train.py", line 132, in main
    logger=logger,
  File "/workspace/det3d/torchie/apis/train.py", line 335, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/workspace/det3d/torchie/trainer/trainer.py", line 546, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/workspace/det3d/torchie/trainer/trainer.py", line 413, in train
    self.model, data_batch, train_mode=True, **kwargs
  File "/workspace/det3d/torchie/trainer/trainer.py", line 371, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 52, in forward
    x, _ = self.extract_feat(example)
  File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 38, in extract_feat
    data['voxels'], data["coors"], data["batch_size"], data["input_shape"]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/det3d/models/backbones/scn.py", line 156, in forward
    x = self.conv_input(ret)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spconv/modules.py", line 134, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spconv/conv.py", line 181, in forward
    use_hash=self.use_hash)
  File "/opt/conda/lib/python3.7/site-packages/spconv/ops.py", line 95, in get_indice_pairs
    int(use_hash))
ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87
unknown device type

I have tried hard to run your code on the nuScenes dataset. We also have an 8-GPU A100 setting, as you do.
One difference would be that I use a Docker image.
Here is the Dockerfile.

FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
MAINTAINER Junho Cho <[email protected]>

RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-get update

RUN apt-get install git -y
RUN git clone https://github.com/TuSimple/centerformer.git

RUN cd centerformer && pip install -r requirements.txt

RUN apt-get install wget libboost-all-dev libgl1 -y

# Install cmake v3.13.2
RUN apt-get purge -y cmake && \
    mkdir /root/temp && \
    cd /root/temp && \
    wget https://github.com/Kitware/CMake/releases/download/v3.13.2/cmake-3.13.2.tar.gz && \
    tar -xzvf cmake-3.13.2.tar.gz && \
    cd cmake-3.13.2 && \
    bash ./bootstrap && \
    make && \
    make install && \
    cmake --version && \
    rm -rf /root/temp

RUN git clone --branch v1.2.1  https://github.com/traveller59/spconv.git --recursive
RUN cd spconv && python setup.py bdist_wheel && cd ./dist && pip install *whl

WORKDIR /workspace
ENV PYTHONPATH="${PYTHONPATH}:/workspace"

With this Dockerfile, we build spconv v1.2.1 in a CUDA 11.1 and PyTorch 1.9.1 environment.
This matches your exact PyTorch and CUDA versions. The only difference is the Python version, but I think that is not a big deal (I also tried Python 3.9.12, with no luck).

sh setup.sh always works nicely.

It seems that the following error

ValueError: /root/spconv/src/spconv/spconv_ops.cc 87
unknown device type

might be solved by using a different spconv version (according to traveller59/spconv#58), but I have not tried it because you specified that only spconv 1.2.1 works.

Would there be any idea to sort this issue?

Probably spconv 1.2.1 does not work in Docker according to this, but I confirmed that spconv 2.2 worked in Docker.

If that is so, is there any chance this repo could support spconv 2.2? (I have already tried spconv 2.2 with centerformer and failed a lot.)

global_translate_noise in CenterFormer is different from that in CenterPoint.

Interesting work!
The translation noise of the data augmentation in CenterFormer is 0.5 (https://github.com/TuSimple/centerformer/blob/master/configs/waymo/voxelnet/waymo_centerformer.py#L132), while the translation in CenterPoint is 0. Also, I noticed that you used np.random.uniform, like the rotation and scale parameters, rather than np.random.normal. Could you explain the motivation for these modifications and their influence on performance?
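For reference, a minimal sketch contrasting the two sampling styles being asked about (standalone, not the repo's exact augmentation code):

    import numpy as np

    noise_translate = 0.5  # value from the linked config line

    # Gaussian translation noise (std = noise_translate), the style being
    # contrasted with:
    t_normal = np.random.normal(scale=noise_translate, size=3)

    # Uniform translation noise in [-noise_translate, noise_translate] per
    # axis, as asked about in this issue:
    t_uniform = np.random.uniform(-noise_translate, noise_translate, size=3)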

Training error with the newly released nuScenes support

Thank you for releasing the nuScenes dataset support, but when I run training it runs into RuntimeError: CUDA error: device-side assert triggered. Is there an array-out-of-bounds problem somewhere? When I debug it, everything seems fine.

some questions about nuscenes multi-task support

Thanks for releasing the nuScenes dataset code support. I have some questions about the implementation of the multi-task support. I see in the code that you define obj_num=500 for each task, and the task_id is then added to the pos embedding to identify each task in the RPN transformer. Unfortunately, the computation increases, and my machine throws a CUDA OOM error. My intuitive idea for the multi-task implementation is that each task has its own head for heatmap generation; all heatmaps are then concatenated into one tensor to generate the top-500 center queries, which are sent to the RPN transformer, with the positional feature being just the regular x and y coordinates. In the final output detection head, each task applies its own detection head to the transformer output features, which would reduce the added computation in the transformer layers. This was my first thought; I wonder whether you have experimented with this approach, and whether it has any drawbacks? Could you share any results or conclusions? It is very important to me. Thank you~
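To make the proposal concrete, a minimal sketch of the alternative described above (hypothetical shapes, one heatmap channel per task for brevity):

    import torch

    B, H, W = 1, 180, 180
    num_tasks, obj_num = 6, 500

    # Each task head produces its own heatmap; concatenating them and taking a
    # single shared top-500 avoids 6*500 queries in the transformer.
    heatmaps = [torch.rand(B, 1, H, W) for _ in range(num_tasks)]
    scores = torch.cat(heatmaps, dim=1).reshape(B, -1)   # (B, tasks*H*W)
    topk_scores, topk_idx = scores.topk(obj_num, dim=1)  # shared top-500

    task_id = torch.div(topk_idx, H * W, rounding_mode="floor")
    pos = topk_idx % (H * W)
    y_coor = torch.div(pos, W, rounding_mode="floor")    # regular (x, y) position
    x_coor = pos % W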

AUTOMATIC MIXED PRECISION

Has anyone tried torch.cuda.amp?
It seems that ms_attention doesn't support fp16, even after I modified ms_deform_attn_forward_cuda.
Is there any other way to implement AMP? Or is there any way to reduce GPU memory? I get CUDA OOM for bs=4 every time.
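One common workaround when a custom CUDA op lacks fp16 kernels is to disable autocast around that op and cast its inputs to fp32; a minimal sketch (the wrapper name and call shape are assumptions, not the repository's API):

    import torch

    def run_in_fp32(op, *tensors):
        # Sketch: force a custom op (e.g. multi-scale deformable attention)
        # to run in fp32 inside an autocast region; the rest of the network
        # still benefits from fp16 elsewhere.
        with torch.cuda.amp.autocast(enabled=False):
            return op(*(t.float() for t in tensors))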

Is it correct that Nan appears in the loss?

I ran the code as written on GitHub.
However, after a certain point, the losses are all NaN.
I thought it might be a problem with the dataset, so I recreated the pkl files with create_data.py, but NaN still comes out. Is it correct to run the training to the end even if NaN appears?

Code for nuScenes Dataset

Thanks for your great work! The paper also provides nuScenes results in the supplementary material. Could you upload the code for training on the nuScenes dataset? Thanks again.

Well trained weights on nuscenes

Hello, first of all, thank you for your excellent work. I would like to ask whether there are trained weights on nuScenes, because my computer cannot run the training, so I would like to use a trained model to evaluate and see the effect.

By the way, how can I change the batch size, or apply some other operation, to reduce the GPU memory demand?

The effect of deformable attention

Thank you for your work. I'm a little confused: since the results shown in Tables 1 and 4 indicate that deformable attention does not bring benefits, why do you use it?

Issue with points np.concatenate(s_points_list, axis=0) in centerformer-master/det3d/core/sampler/sample_ops.py

Hello,
Thanks for the open-source code.
The s_points_list is always empty in my case; random_crop is set to False in https://github.com/TuSimple/centerformer/blob/master/det3d/core/sampler/sample_ops.py#L195, and even when set to True it doesn't give me s_points. Also, due to the preceding condition check at https://github.com/TuSimple/centerformer/blob/master/det3d/core/sampler/sample_ops.py#L173, s_points is empty [].
So trying to concatenate an empty array gives me an error.
What could be the issue? I'm trying it with the NuScenes mini dataset; I was able to prepare the data successfully.
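As a local workaround, a minimal guard sketch (point_dim is a hypothetical placeholder) around the failing concatenation:

    import numpy as np

    point_dim = 5        # hypothetical: x, y, z, intensity, timestamp
    s_points_list = []   # empty, as reported above

    # np.concatenate raises "need at least one array to concatenate" on an
    # empty list, so fall back to an empty (0, point_dim) array instead.
    if s_points_list:
        sampled = np.concatenate(s_points_list, axis=0)
    else:
        sampled = np.zeros((0, point_dim), dtype=np.float32)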
