TPVFormer's People

Contributors

huang-yh, wzzheng


TPVFormer's Issues

Question on the hybrid 2D reference points

        ref_2d_hw = self.ref_2d_hw.clone().expand(bs, -1, -1, -1)
        hybird_ref_2d = torch.cat([ref_2d_hw, ref_2d_hw], 0)

First of all, I appreciate your work.
I wonder why you concatenate the HW reference points for the two layers, as shown above.
Thanks a lot.

Can the model run on a small dataset (like version='v1.0-mini')?

Hi, I have been trying to use your model for the lidar segmentation task. However, I have only downloaded the small nuScenes v1.0-mini dataset, and I am wondering if it is enough to test the performance of the model. Have you tested the model on a small dataset before? If so, could you give me some advice or guidance?

Thank you!

Why are there so many names for the transformer?

Hi, thanks for sharing the code and paper. After reading both, I'm confused by the many names, such as ICA, CVHA, HCAB, and HAB; in the code there seem to be only two kinds of transformer layers, self-attention and cross-attention. What's more, why do the transformer configs differ between the lidarseg and occupancy configs? Should they only differ in the output?
Thanks for your kind reply.

Train on v1.0-mini?

I want to train on the v1.0-mini dataset. How can I generate the nuscenes_infos_train.pkl/nuscenes_infos_val.pkl files for v1.0-mini?
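
A hedged pointer, not an official answer: if the provided info files follow the standard mmdet3d nuScenes format (see also the issue about train/val pickle files below), the stock mmdet3d converter accepts a version flag and may produce usable mini-split files:

    python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini

Whether the output matches what TPVFormer's dataloader expects should be verified against the provided pkls.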

Question about Cross-view Hybrid attention

Thanks for sharing the great work.

Regarding the cross-view hybrid attention, is it only applied to the top HW plane?

The query is the feature itself, and key and value are both None, while later in the cross-view hybrid attention the value is set to the concatenation of queries:

value = torch.cat([query, query], 0)
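
For context, a minimal shape illustration (sizes hypothetical) of what this concatenation does: stacking the query along dim 0 doubles the batch dimension, so a single attention call can process both copies against the correspondingly stacked 2D reference points.

    import torch

    bs, n_query, dim = 1, 10000, 256       # hypothetical sizes
    query = torch.randn(bs, n_query, dim)
    value = torch.cat([query, query], 0)   # -> [2 * bs, n_query, dim]
    print(value.shape)                     # torch.Size([2, 10000, 256])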

Core dump in visualization

I hit a core dump during visualization:

(open-mmlab) ~/TPVFormer$ python visualization/vis_scene.py     --py-config config/tpv04_occupancy.py     --work-dir out/tpv_occupancy     --ckpt-path ckpts/tpv04_occupancy_v2.pth     --save-path out/tpv_occupancy/videos     --scene-name scene-0916
QObject::moveToThread: Current thread (0x49973b0) is not the object's thread (0x53adbe0).
Cannot move to target thread (0x49973b0)

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/qihaoh/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl.

Aborted (core dumped)
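
A commonly reported workaround for this cv2/Qt plugin clash (a general hint, not a fix from this repo): the opencv-python wheel bundles its own Qt plugins, which can shadow the ones the rest of the environment expects; swapping to the headless wheel avoids the conflict:

    pip uninstall opencv-python
    pip install opencv-python-headless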

ERR| vtkXOpenGLRenderWindow (0x11f80320): Could not find a decent config

Has anyone else run into this problem? I hit it when running the command in part "2. generate individual video frames":
visualizing scene-0916
/home/zzh/DataStack_2T/workSpace/TPVFormer/visualization/dataset.py:73: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
sweep_cams = np.array(sweep_cams)
/home/zzh/DataStack_2T/workSpace/TPVFormer/visualization/dataset.py:74: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
sweep_tss = np.array(sweep_tss)
236
processing frame 0 of scene 0
/home/zzh/anaconda3/envs/TPVFormer/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
2955
2023-05-01 17:19:17.196 ( 36.717s) [ E9797740]vtkXOpenGLRenderWindow.:266 ERR| vtkXOpenGLRenderWindow (0x11f80320): Could not find a decent config

2023-05-01 17:19:17.196 ( 36.717s) [ E9797740]vtkXOpenGLRenderWindow.:484 ERR| vtkXOpenGLRenderWindow (0x11f80320): Could not find a decent visual

Aborted (core dumped)
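
As a general hint (not specific to this repo): vtkXOpenGLRenderWindow failing to find a decent config/visual usually means the session has no usable OpenGL context, e.g. over SSH without GLX. Rendering on a virtual framebuffer with software GL sometimes helps, reusing the command from the issue above:

    xvfb-run -a python visualization/vis_scene.py --py-config config/tpv04_occupancy.py --work-dir out/tpv_occupancy --ckpt-path ckpts/tpv04_occupancy_v2.pth --save-path out/tpv_occupancy/videos --scene-name scene-0916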

Dataset Organization

Hello, I have a specific question about the organization of the dataset folders.
Is it correct to have the following folders inside the lidarseg folder as well?

├── TPVFormer/data
│      ├── nuscenes
│              └── lidarseg
│                      └── v1.0-trainval
│                      └── v1.0-test
│                      └── v1.0-mini

I'm asking because it keeps producing the following error:
assert table_name in self.table_names, "Table {} not found".format(table_name)
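
For reference, the standard nuScenes-lidarseg layout (per the nuScenes devkit, not specific to this repo) keeps the JSON metadata tables, including lidarseg.json, inside the top-level version folders, while lidarseg/ holds only the per-version .bin label files:

    TPVFormer/data
    └── nuscenes
        ├── samples
        ├── sweeps
        ├── lidarseg
        │   ├── v1.0-trainval
        │   ├── v1.0-test
        │   └── v1.0-mini
        ├── v1.0-trainval
        ├── v1.0-test
        └── v1.0-mini

A "Table lidarseg not found" assertion typically means the lidarseg JSON tables were not merged into the matching v1.0-* metadata folder.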

How to apply TPVFormer to a different camera configuration?

Hi,

I'm trying to run TPVFormer on a vehicle with only 5 cameras for occupancy prediction, but the code throws a size mismatch error while loading the model weights:

" copying a param with shape torch.Size([6, 256]) from checkpoint, the shape in current model is torch.Size([5, 256])."

Any suggestions on how to get around this problem, other than creating a dummy image with RGB values set to 0?

Many thanks!
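
One possible direction, as a hedged sketch rather than an official recipe (it assumes the mismatched tensor is a per-camera embedding whose rows are ordered like your cameras): slice the offending parameters before loading, then load non-strictly.

    import torch

    def adapt_num_cams(model, ckpt_path, n_cams=5):
        # Slice per-camera parameters ([6, ...] in the checkpoint) down to n_cams.
        ckpt = torch.load(ckpt_path, map_location='cpu')
        state = ckpt.get('state_dict', ckpt)
        model_state = model.state_dict()
        for name, param in list(state.items()):
            if name in model_state and param.shape != model_state[name].shape:
                if param.shape[0] == 6 and model_state[name].shape[0] == n_cams:
                    state[name] = param[:n_cams]  # keep the first n_cams rows
                else:
                    del state[name]               # let this parameter re-initialize
        model.load_state_dict(state, strict=False)

Any camera dropped this way loses its learned embedding, so some fine-tuning would likely still be needed.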

data structure

Hi, I am trying to train TPVFormer. Can you provide the detailed data structure for nuScenes (including lidarseg)?

Question about train/val pickle files.

Thanks for your great work! I have a question about the train/val pickle files: is there any difference between the pickle files you provide and those generated by mmdet3d? Can I use the mmdet3d pickle files instead?

Loss and grad_norm are 'nan' when training

Hi! I want to train the semantic occupancy prediction task, but after I run the following command:

python train.py --py-config config/tpv04_occupancy.py --work-dir out/tpv_my_occupancy

I get a NaN loss and some strange grad_norm values (training-log screenshot omitted). Could you give me some suggestions?

Why run validation first?

Maybe I don't understand your paper well. In train.py, why do you run validation first and then do (train, val) again? What is the use of the validation pass before training?

Request for model weights

I really like your project, and I believe these weights would be very helpful for my research. If it's convenient, could you provide a weights file (your best model .pth)?

TypeError: No matching definition for argument type(s) array(uint8, 3d, C), array(int32, 2d, C).

Thanks for your excellent work.
However, I am unable to run any of the scripts successfully on v1.0-mini.
I made sure to look through previous issues, downloaded the corresponding .pkl file, and followed the instructions to execute the script. However, I always encounter a problem at line 100 of dataset_wrapper.py (now line 91):

    processed_label = nb_process_label(np.copy(processed_label), label_voxel_pair)

which raises: TypeError: No matching definition for argument type(s) array(uint8, 3d, C), array(int32, 2d, C).
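
A hedged guess at the cause: nb_process_label is numba-jitted with an explicit signature (something like 'u1[:,:,:](u1[:,:,:], i8[:,:])'), so an int32 pair array cannot dispatch against the i8 (int64) argument; on platforms where NumPy's default integer is int32 (e.g. Windows) this surfaces exactly as above. Casting before the call is a minimal sketch of a fix:

    import numpy as np

    # cast so the array matches the jitted i8[:, :] argument type
    label_voxel_pair = label_voxel_pair.astype(np.int64)
    processed_label = nb_process_label(np.copy(processed_label), label_voxel_pair)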

FileNotFoundError:

When running the visualization I get FileNotFoundError: [Errno 2] No such file or directory: 'out/tpv_occupancy/latest.pth'.
If I set the path to pretrain.pth instead, I get an Unexpected key(s) error.
Is there a way to visualize with the pretrained checkpoints? Thanks.

TPVFormer's proposed CVHA from the paper is not implemented in the code

Hi. First, I would like to express my appreciation for the impressive work presented in your recent paper on TPVFormer. The concept of utilizing a tri-perspective view (TPV) representation and the proposed CVHA (cross-view hybrid attention) mechanism for information exchange between different views are both novel and intriguing.

After carefully examining the code implementation provided in the TPVFormer repository, I noticed that the CVHA mechanism, as described in the paper, is not fully implemented (this was also asked in #29). The code only includes the self-attention mechanism on the HW plane but does not incorporate the cross-view hybrid attention (TPV self-attention) as outlined in the paper. I would like to kindly inquire about the following questions (different from #29):

  1. Have you tried implementing CVHA? (If you did, why didn't you include it in the code, even as a commented-out section, so that people could test it?)
  2. Does CVHA make no difference in terms of performance?
  3. Or was it GPU memory consumption that made you decide to avoid CVHA?

Thanks!

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/A9dCpjHPfE or add me on WeChat (ID: van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

Project            OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine           -                      0.x
MMCV               1.x                    2.x
MMDetection        0.x, 1.x, 2.x          3.x
MMAction2          0.x                    1.x
MMClassification   0.x                    1.x
MMSegmentation     0.x                    1.x
MMDetection3D      0.x                    1.x
MMEditing          0.x                    1.x
MMPose             0.x                    1.x
MMDeploy           0.x                    1.x
MMTracking         0.x                    1.x
MMOCR              0.x                    1.x
MMRazor            0.x                    1.x
MMSelfSup          0.x                    1.x
MMRotate           0.x                    1.x
MMYOLO             -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Problem with visualization

Hello, I am trying to reproduce the results with your weights, but I have a problem with the output: my result (two rendered frames, omitted here) differs a bit from yours. What could the reason be? Thanks.

I am encountering the following error. What should I do? (regarding nuScenes-lidarseg)

Hello, I have read your message.
I would like to visualize the data following the instructions in visualization/readme.md, so I executed the following command:

python visualization/dump_pkl.py --src-path data/nuscenes_infos_val.pkl --dst-path data/nuscenes_infos_val_scene.pkl --data-path data/nuscenes

Following the site you provided, I downloaded the nuScenes-lidarseg data and moved it to the "data" folder. However, I am encountering the following error. What should I do?

If there are any additional data files that need to be downloaded, please let me know.

[Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/visibility.json',
[Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/attribute.json'

Applicability in the Maritime Domain (Open Waters or Docks)

Hi @wzzheng,

I'm very interested in your work, as I think it paves the way for more open-source camera-only experimentation in 3D occupancy mapping.

I have two questions for you:

  1. In the comparison table with Tesla's occupancy network, you report a large discrepancy in inference speed (10 ms vs. 290 ms), which would make the model unsuitable for downstream real-time tasks. Why does the forward pass take so long? Maybe it's just due to the hardware and there is no actual noticeable discrepancy.
  2. Could the methodology be applicable outside the urban domain, where the appearance of the outer environment can have drastically different properties (besides updating the semantic labels for the 3D occupancy)? In particular, we are experimenting with autonomous navigation for unmanned surface vehicles (USVs), which should handle both dock scenarios and open waters. Thus far, we have always been quite limited in the complexity of the models, as our dataset has roughly the same cardinality as yours, which was regarded as too small. Focusing on the maritime domain: while docks are not too dissimilar from an urban scenario in terms of point cloud density, open water is orders of magnitude sparser, retaining few objects that could provide a meaningful training signal besides the water surface, albeit being simpler to model. Would TPVFormer cope well with that, or are there concerns in the implementation you think would need to be addressed?

How should the parameter num_points_in_pillar=[4, 32, 32] be understood?

    ref_3d_hw = self.get_reference_points(tpv_h, tpv_w, pc_range[5] - pc_range[2], num_points_in_pillar[0], '3d', device='cpu')  # [1, 4, 10000, 3]

    ref_3d_zh = self.get_reference_points(tpv_z, tpv_h, pc_range[3] - pc_range[0], num_points_in_pillar[1], '3d', device='cpu')
    ref_3d_zh = ref_3d_zh.permute(3, 0, 1, 2)[[2, 0, 1]]
    ref_3d_zh = ref_3d_zh.permute(1, 2, 3, 0)  # [1, 32, 800, 3]

    ref_3d_wz = self.get_reference_points(tpv_w, tpv_z, pc_range[4] - pc_range[1], num_points_in_pillar[2], '3d', device='cpu')
    ref_3d_wz = ref_3d_wz.permute(3, 0, 1, 2)[[1, 2, 0]]
    ref_3d_wz = ref_3d_wz.permute(1, 2, 3, 0)  # [1, 32, 800, 3]

The final 3 is xyz, but what do the leading dimensions 4, 32, and 32 represent?

Unable to increase `batch_size`

I noticed that your code was written with the assumption that batch_size = 1; when I increase the batch_size, I get dimension errors. I want to know why batch_size is limited to 1.
If it cannot be increased, I cannot make efficient use of my device's resources.

    def custom_collate_fn(data):
        img2stack = np.stack([d[0] for d in data]).astype(np.float32)
        meta2stack = [d[1] for d in data]
        label2stack = np.stack([d[2] for d in data]).astype(np.int)
        # because we use a batch size of 1, so we can stack these tensor together.
        grid_ind_stack = np.stack([d[3] for d in data]).astype(np.float)
        point_label = np.stack([d[4] for d in data]).astype(np.int)
        return torch.from_numpy(img2stack), \
            meta2stack, \
            torch.from_numpy(label2stack), \
            torch.from_numpy(grid_ind_stack), \
            torch.from_numpy(point_label)
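
Since grid_ind and the point labels contain a different number of points per sample, np.stack only succeeds when all samples share a shape, which is why batch_size is effectively pinned to 1. Below is a hedged sketch of a collate function that keeps the variable-length arrays as lists (the model and loss would then have to iterate per sample; names are hypothetical):

    import numpy as np
    import torch

    def custom_collate_fn_batched(data):
        img2stack = torch.from_numpy(np.stack([d[0] for d in data]).astype(np.float32))
        meta2stack = [d[1] for d in data]
        label2stack = torch.from_numpy(np.stack([d[2] for d in data]).astype(np.int64))
        # variable-length per-point arrays: keep as lists instead of stacking
        grid_ind_list = [torch.from_numpy(np.asarray(d[3]).astype(np.float32)) for d in data]
        point_label_list = [torch.from_numpy(np.asarray(d[4]).astype(np.int64)) for d in data]
        return img2stack, meta2stack, label2stack, grid_ind_list, point_label_list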

Training on the mini dataset

I get the following error; is there any way to solve it?

assert table_name in self.table_names, "Table {} not found".format(table_name)
AssertionError: Table lidarseg not found

The performance of the released occupancy model tpv04_occupancy

[evaluation-results screenshot omitted]
Hello, I ran inference of tpv04_occupancy on a V100 GPU with the command:

python eval.py --py-config config/tpv04_occupancy.py --ckpts ckpt/tpv04_occupancy_v2.pth

The performance on nuScenes is shown in the screenshot above.

It does not match the numbers reported in the paper. Also, could you please explain the evaluation metrics mIoU vox/pts? I'm confused about them.

Code for SemanticKITTI

The paper includes a table comparing performance on the SemanticKITTI dataset. Are there plans to release the code for training using the SemanticKITTI dataset?

Question about 3D OCC task training

Hi Author,

For the 3D occupancy prediction task training, do we still need to set ignore_index=0 when instantiating the cross-entropy loss function? In the paper you say "pseudo-per-voxel labels were generated from sparse point cloud by assigning a new label of empty to any voxel that does not contain any point, and we use voxel predictions as input to both lovasz-softmax and cross-entropy losses." Does that mean that for the [100, 100, 8] 3D volume, we set the labels of all remaining voxels to 0 as the empty label? I found that the occupied voxels, whose labels are generated from the sparse lidar points, only number around 1000~2300 in total. This approach causes serious class imbalance. Could you provide more detail about empty-voxel label generation for the 3D occupancy prediction task?
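
A minimal sketch of the pseudo-per-voxel labeling described in the quoted sentence (function and variable names are hypothetical, and the authors' pipeline may aggregate points per voxel differently, e.g. by majority vote):

    import numpy as np

    def voxelize_labels(points_xyz, point_labels, pc_range, grid=(100, 100, 8)):
        # points_xyz: (N, 3) lidar points; point_labels: (N,) semantic labels (> 0)
        grid = np.asarray(grid)
        lo = np.asarray(pc_range[:3], dtype=np.float32)
        hi = np.asarray(pc_range[3:], dtype=np.float32)
        vox = np.zeros(grid, dtype=np.uint8)  # label 0 == empty
        idx = np.floor((points_xyz - lo) / (hi - lo) * grid).astype(np.int64)
        inside = np.all((idx >= 0) & (idx < grid), axis=1)
        idx, lbl = idx[inside], point_labels[inside]
        vox[idx[:, 0], idx[:, 1], idx[:, 2]] = lbl  # last point in a voxel wins
        return vox

With a 100x100x8 grid (80,000 voxels) and only ~1000-2300 occupied, the empty class indeed dominates, which is presumably why the ignore_index/weighting question matters.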

License?

Hi, could you add a software license to this repo? Thank you!

A typo in the paper

Hello authors. Section 4.5 of your paper states that TPVFormer has 6.0M parameters and MonoScene has 15.7M. I believe this is a mistake: the backbone alone already exceeds that parameter count, and when I measured it myself, the MonoScene model for SemanticKITTI has 149M parameters. (Screenshot omitted.)

A problem when training on eight 3090s

First of all, thank you for the excellent work.
While reproducing, I followed the steps in "Train TPVFormer for lidar segmentation task on 3090 with 24G GPU memory". Training runs fine with four GPUs, but with eight 3090s I get "_pickle.UnpicklingError: pickle data was truncated". What could be going wrong?

TPVFormer-Small config files

Dear authors,

First, please let me thank you for your great work.
Could you please share the config files for TPVFormer-Small for the task of 3D semantic occupancy prediction?

Thank you very much in advance.

FileNotFoundError: [Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/attribute.json'

I followed the installation instructions in the README, but I encountered some issues during the dataset installation step.

I downloaded the dataset from the nuScenes website, but I wasn't sure which files I needed to download. I made an educated guess and downloaded the following:

nuimages-v1.0-all-samples.tgz
nuimages-v1.0-all-sweeps-cam-back-left.tgz
nuimages-v1.0-all-sweeps-cam-back-right.tgz
nuimages-v1.0-all-sweeps-cam-back.tgz
nuimages-v1.0-all-sweeps-cam-front-left.tgz
nuimages-v1.0-all-sweeps-cam-front-right.tgz
nuimages-v1.0-all-sweeps-cam-front.tgz
nuplan-maps-v1.0.zip
nuScenes-lidarseg-all-v1.0.tar

I extracted these datasets to the specified directory.

However, when I tried to run either the train.py or eval.py scripts, I encountered the following issue:

Namespace(gpus=8, py_config='config/tpv04_occupancy.py', resume_from='', work_dir='out/tpv_occupancy')
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506

..................

2023-04-21 00:11:26,596 - mmseg - INFO - Config:
2023-04-21 00:13:02,052 - mmseg - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
2023-04-21 00:13:03,511 - mmseg - INFO - Number of params: 62465552
done ddp model

Loading NuScenes tables for version v1.0-trainval...
Traceback (most recent call last):
File "train.py", line 401, in
torch.multiprocessing.spawn(main, args=(args,), nprocs=args.gpus)
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

Process 6 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/root/data01/zzy/TPVFormer/train.py", line 99, in main
data_builder.build(
File "/root/data01/zzy/TPVFormer/builder/data_builder.py", line 20, in build
nusc = NuScenes(version=version, dataroot=data_path, verbose=True)
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/nuscenes/nuscenes.py", line 70, in init
self.attribute = self.load_table('attribute')
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/nuscenes/nuscenes.py", line 136, in load_table
with open(osp.join(self.table_root, '{}.json'.format(table_name))) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/attribute.json'

Since I am not entirely certain if I downloaded the correct dataset, I would like to ask if what I did was correct.

No module named 'mmcv._ext'

I followed the given environment setup, but I keep hitting this error. I could not install mmcv-full==1.4.0 via pip, so I installed it with mim install mmcv==1.4.0, but I still get "No module named 'mmcv._ext'". How can I solve this?
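
A hedged note: the lite mmcv package does not ship the compiled mmcv._ext ops; only mmcv-full does. The documented way to install a prebuilt mmcv-full wheel is to point pip at the OpenMMLab wheel index matching your CUDA and PyTorch versions (cu111/torch1.9.0 below are placeholders to substitute):

    pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html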

Code

Hello author, I'm very interested in your research, and I would like to know when the code will be open-sourced.

Enquiry about the paper and code

Hello author, I'm very interested in your research, and I would like to know when the paper will be published and the code open-sourced. Looking forward to seeing the paper and source code.

Is there a config for the large occupancy model?

Hello authors, first of all, thank you very much for this excellent work.
The paper says the TPV resolution for occupancy training is 200x200x16 with dim 128, yet in tpv04_occupancy.py the TPV resolution is 100x100x8 and dim is 256 (screenshot omitted).

Could you re-upload a config, ideally one consistent with the paper, so that everyone can reproduce the results? Many thanks!

(By the way, the newly uploaded visualization code lives in the visualization folder, but some package imports use the word "visualize"; a minor bug, just so you know.)

Request for visualization code.

Hello! I wonder if you could release the code and colormap for visualizing the output 3D occupancy results. Thank you.

Question about the pretrained model

Thanks for your great work.
I want to test the performance with a ResNet-34 backbone, but in your code there is no branch for the case of no checkpoint file.
Can I just modify the code as in the (omitted) screenshot? If that works, how many epochs of training do you recommend?

Table lidarseg not found

File "/opt/conda/lib/python3.7/site-packages/nuscenes/nuscenes.py", line 214, in get
assert table_name in self.table_names, "Table {} not found".format(table_name)
AssertionError: Table lidarseg not found
