TPVFormer's People

Contributors

huang-yh, wzzheng


TPVFormer's Issues

Question on the hybrid 2D reference points

        ref_2d_hw = self.ref_2d_hw.clone().expand(bs, -1, -1, -1)
        hybird_ref_2d = torch.cat([ref_2d_hw, ref_2d_hw], 0)

First of all, I appreciate your work.
I wonder why you concatenate the HW reference points for the two layers, as shown above.
Thanks a lot.

Can the model run on a small dataset (like version='v1.0-mini')?

Hi, I have been trying to use your model for the lidar segmentation task. However, I have only downloaded the small nuScenes v1.0-mini dataset, and I am wondering if it is enough to test the performance of the model. Have you tested the model on a small dataset before? If so, could you give me some advice or guidance?

Thank you!

Why are there so many names for the transformer?

Hi, thanks for sharing the code and paper. After reading both, I'm confused by the many names, such as ICA, CVHA, HCAB, and HAB; in the code there seem to be only two kinds of transformer layers, self-attention and cross-attention. What's more, why do the transformer configs differ between the lidarseg and occupancy configs? Should they only differ in the output?
Thanks for your kind reply.

Train on v1.0-mini?

I want to train on the v1.0-mini dataset. How can I generate the nuscenes_infos_train.pkl/nuscenes_infos_val.pkl files for v1.0-mini?
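
A hedged pointer, not an official answer: if the provided info files follow the standard mmdet3d nuScenes format (see also the issue about train/val pickle files below), the stock mmdet3d converter accepts a version flag and may produce usable mini-split files:

    python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini

Whether the output matches what TPVFormer's dataloader expects should be verified against the provided pkls.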

Question about Cross-view Hybrid attention

Thanks for sharing the great work.

Regarding the cross-view hybrid attention, is it only applied to the top HW plane?

The query is the feature itself, and key and value are both None, while later in the cross-view hybrid attention the value is set to the concatenation of queries:

value = torch.cat([query, query], 0)
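
For context, a minimal shape illustration (sizes hypothetical) of what this concatenation does: stacking the query along dim 0 doubles the batch dimension, so a single attention call can process both copies against the correspondingly stacked 2D reference points.

    import torch

    bs, n_query, dim = 1, 10000, 256       # hypothetical sizes
    query = torch.randn(bs, n_query, dim)
    value = torch.cat([query, query], 0)   # -> [2 * bs, n_query, dim]
    print(value.shape)                     # torch.Size([2, 10000, 256])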

Core dump in visualization

I hit a core dump during visualization:

(open-mmlab) ~/TPVFormer$ python visualization/vis_scene.py     --py-config config/tpv04_occupancy.py     --work-dir out/tpv_occupancy     --ckpt-path ckpts/tpv04_occupancy_v2.pth     --save-path out/tpv_occupancy/videos     --scene-name scene-0916
QObject::moveToThread: Current thread (0x49973b0) is not the object's thread (0x53adbe0).
Cannot move to target thread (0x49973b0)

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/qihaoh/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl.

Aborted (core dumped)
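
A commonly reported workaround for this cv2/Qt plugin clash (a general hint, not a fix from this repo): the opencv-python wheel bundles its own Qt plugins, which can shadow the ones the rest of the environment expects; swapping to the headless wheel avoids the conflict:

    pip uninstall opencv-python
    pip install opencv-python-headless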

ERR| vtkXOpenGLRenderWindow (0x11f80320): Could not find a decent config

Has anyone else run into this problem? I hit it when running the command in part "2. generate individual video frames":
visualizing scene-0916
/home/zzh/DataStack_2T/workSpace/TPVFormer/visualization/dataset.py:73: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
sweep_cams = np.array(sweep_cams)
/home/zzh/DataStack_2T/workSpace/TPVFormer/visualization/dataset.py:74: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
sweep_tss = np.array(sweep_tss)
236
processing frame 0 of scene 0
/home/zzh/anaconda3/envs/TPVFormer/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
2955
2023-05-01 17:19:17.196 ( 36.717s) [ E9797740]vtkXOpenGLRenderWindow.:266 ERR| vtkXOpenGLRenderWindow (0x11f80320): Could not find a decent config

2023-05-01 17:19:17.196 ( 36.717s) [ E9797740]vtkXOpenGLRenderWindow.:484 ERR| vtkXOpenGLRenderWindow (0x11f80320): Could not find a decent visual

Aborted (core dumped)
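
As a general hint (not specific to this repo): vtkXOpenGLRenderWindow failing to find a decent config/visual usually means the session has no usable OpenGL context, e.g. over SSH without GLX. Rendering on a virtual framebuffer with software GL sometimes helps, reusing the command from the issue above:

    xvfb-run -a python visualization/vis_scene.py --py-config config/tpv04_occupancy.py --work-dir out/tpv_occupancy --ckpt-path ckpts/tpv04_occupancy_v2.pth --save-path out/tpv_occupancy/videos --scene-name scene-0916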

Dataset Organization

Hello, I have a specific question about the organization of the dataset folders.
Is it correct to have the following folders inside the lidarseg folder as well?

├── TPVFormer/data
│      ├── nuscenes
│              └── lidarseg
│                      └── v1.0-trainval
│                      └── v1.0-test
│                      └── v1.0-mini

I'm asking because it keeps producing the following error:
assert table_name in self.table_names, "Table {} not found".format(table_name)
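
For reference, the standard nuScenes-lidarseg layout (per the nuScenes devkit, not specific to this repo) keeps the JSON metadata tables, including lidarseg.json, inside the top-level version folders, while lidarseg/ holds only the per-version .bin label files:

    TPVFormer/data
    └── nuscenes
        ├── samples
        ├── sweeps
        ├── lidarseg
        │   ├── v1.0-trainval
        │   ├── v1.0-test
        │   └── v1.0-mini
        ├── v1.0-trainval
        ├── v1.0-test
        └── v1.0-mini

A "Table lidarseg not found" assertion typically means the lidarseg JSON tables were not merged into the matching v1.0-* metadata folder.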

How to apply TPVFormer to a different camera configuration?

Hi,

I'm trying to run TPVFormer on a vehicle with only 5 cameras for occupancy prediction, but the code throws a size mismatch error while loading the model weights:

" copying a param with shape torch.Size([6, 256]) from checkpoint, the shape in current model is torch.Size([5, 256])."

Any suggestions on how to get around this problem, other than creating a dummy image with RGB values set to 0?

Many thanks!
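
One possible direction, as a hedged sketch rather than an official recipe (it assumes the mismatched tensor is a per-camera embedding whose rows are ordered like your cameras): slice the offending parameters before loading, then load non-strictly.

    import torch

    def adapt_num_cams(model, ckpt_path, n_cams=5):
        # Slice per-camera parameters ([6, ...] in the checkpoint) down to n_cams.
        ckpt = torch.load(ckpt_path, map_location='cpu')
        state = ckpt.get('state_dict', ckpt)
        model_state = model.state_dict()
        for name, param in list(state.items()):
            if name in model_state and param.shape != model_state[name].shape:
                if param.shape[0] == 6 and model_state[name].shape[0] == n_cams:
                    state[name] = param[:n_cams]  # keep the first n_cams rows
                else:
                    del state[name]               # let this parameter re-initialize
        model.load_state_dict(state, strict=False)

Any camera dropped this way loses its learned embedding, so some fine-tuning would likely still be needed.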

data structure

Hi, I am trying to train TPVFormer. Can you provide the detailed data structure for nuScenes (including lidarseg)?

Question about train/val pickle files.

Thanks for your great work! I have a question about the train/val pickle files: is there any difference between the pickle files you provide and those generated by mmdet3d? Can I use the mmdet3d pickle files instead?

Loss and grad_norm are 'nan' when training

Hi! I want to train the semantic occupancy prediction task, but after I run the following command:

python train.py --py-config config/tpv04_occupancy.py --work-dir out/tpv_my_occupancy

I get a NaN loss and some strange grad_norm values (training-log screenshot omitted). Could you give me some suggestions?

Why run validation first?

Maybe I don't understand your paper well. In train.py, why do you run validation first and then do (train, val) again? What is the use of the validation pass before training?

Request for model weights

I really like your project, and I believe these weights would be very helpful for my research. If it's convenient, could you provide a weights file (your best model .pth)?

TypeError: No matching definition for argument type(s) array(uint8, 3d, C), array(int32, 2d, C).

Thanks for your excellent work.
However, I am unable to run any of the scripts successfully on v1.0-mini.
I made sure to look through previous issues, downloaded the corresponding .pkl file, and followed the instructions to execute the script. However, I always encounter a problem at line 100 of dataset_wrapper.py (now line 91):

    processed_label = nb_process_label(np.copy(processed_label), label_voxel_pair)

which raises: TypeError: No matching definition for argument type(s) array(uint8, 3d, C), array(int32, 2d, C).
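
A hedged guess at the cause: nb_process_label is numba-jitted with an explicit signature (something like 'u1[:,:,:](u1[:,:,:], i8[:,:])'), so an int32 pair array cannot dispatch against the i8 (int64) argument; on platforms where NumPy's default integer is int32 (e.g. Windows) this surfaces exactly as above. Casting before the call is a minimal sketch of a fix:

    import numpy as np

    # cast so the array matches the jitted i8[:, :] argument type
    label_voxel_pair = label_voxel_pair.astype(np.int64)
    processed_label = nb_process_label(np.copy(processed_label), label_voxel_pair)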

FileNotFoundError:

When running the visualization I get FileNotFoundError: [Errno 2] No such file or directory: 'out/tpv_occupancy/latest.pth'.
If I set the path to pretrain.pth instead, I get an Unexpected key(s) error.
Is there a way to visualize with the pretrained checkpoints? Thanks.

TPVFormer's proposed CVHA from the paper is not implemented in the code

Hi. First, I would like to express my appreciation for the impressive work presented in your recent paper on TPVFormer. The concept of utilizing a tri-perspective view (TPV) representation and the proposed CVHA (cross-view hybrid attention) mechanism for information exchange between different views are both novel and intriguing.

After carefully examining the code implementation provided in the TPVFormer repository, I noticed that the CVHA mechanism, as described in the paper, is not fully implemented (this was also asked in #29). The code only includes the self-attention mechanism on the HW plane but does not incorporate the cross-view hybrid attention (TPV self-attention) as outlined in the paper. I would like to kindly inquire about the following questions (different from #29):

  1. Have you tried implementing CVHA? (If you did, why didn't you include it in the code, even as a commented-out section, so that people could test it?)
  2. Does CVHA make no difference in terms of performance?
  3. Or was it GPU memory consumption that made you decide to avoid CVHA?

Thanks!

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/A9dCpjHPfE or add me on WeChat (ID: van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

Project            OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine           -                      0.x
MMCV               1.x                    2.x
MMDetection        0.x, 1.x, 2.x          3.x
MMAction2          0.x                    1.x
MMClassification   0.x                    1.x
MMSegmentation     0.x                    1.x
MMDetection3D      0.x                    1.x
MMEditing          0.x                    1.x
MMPose             0.x                    1.x
MMDeploy           0.x                    1.x
MMTracking         0.x                    1.x
MMOCR              0.x                    1.x
MMRazor            0.x                    1.x
MMSelfSup          0.x                    1.x
MMRotate           0.x                    1.x
MMYOLO             -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Problem with visualization

Hello, I am trying to reproduce the results with your weights, but I have a problem with the output: my result (two rendered frames, omitted here) differs a bit from yours. What could the reason be? Thanks.

I am encountering the following error. What should I do? (regarding nuScenes-lidarseg)

Hello, I have read your message.
I would like to visualize the data following the instructions in visualization/readme.md, so I executed the following command:

python visualization/dump_pkl.py --src-path data/nuscenes_infos_val.pkl --dst-path data/nuscenes_infos_val_scene.pkl --data-path data/nuscenes

Following the site you provided, I downloaded the nuScenes-lidarseg data and moved it to the "data" folder. However, I am encountering the following error. What should I do?

If there are any additional data files that need to be downloaded, please let me know.

[Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/visibility.json',
[Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/attribute.json'

Applicability in the Maritime Domain (Open Waters or Docks)

Hi @wzzheng,

I'm very interested in your work, as I think it paves the way for more open-source camera-only experimentation in 3D occupancy mapping.

I have two questions for you:

  1. In the comparison table with Tesla's occupancy network, you report a large discrepancy in inference speed (10 ms vs. 290 ms), which would make the model unsuitable for downstream real-time tasks. Why does the forward pass take so long? Maybe it's just due to the hardware and there is no actual noticeable discrepancy.
  2. Could the methodology be applicable outside the urban domain, where the appearance of the outer environment can have drastically different properties (besides updating the semantic labels for the 3D occupancy)? In particular, we are experimenting with autonomous navigation for unmanned surface vehicles (USVs), which should handle both dock scenarios and open waters. Thus far, we have always been quite limited in the complexity of the models, as our dataset has roughly the same cardinality as yours, which was regarded as too small. Focusing on the maritime domain: while docks are not too dissimilar from an urban scenario in terms of point cloud density, open water is orders of magnitude sparser, retaining few objects that could provide a meaningful training signal besides the water surface, albeit being simpler to model. Would TPVFormer cope well with that, or are there concerns in the implementation you think would need to be addressed?

How should the parameter num_points_in_pillar=[4, 32, 32] be understood?

    ref_3d_hw = self.get_reference_points(tpv_h, tpv_w, pc_range[5] - pc_range[2], num_points_in_pillar[0], '3d', device='cpu')  # [1, 4, 10000, 3]

    ref_3d_zh = self.get_reference_points(tpv_z, tpv_h, pc_range[3] - pc_range[0], num_points_in_pillar[1], '3d', device='cpu')
    ref_3d_zh = ref_3d_zh.permute(3, 0, 1, 2)[[2, 0, 1]]
    ref_3d_zh = ref_3d_zh.permute(1, 2, 3, 0)  # [1, 32, 800, 3]

    ref_3d_wz = self.get_reference_points(tpv_w, tpv_z, pc_range[4] - pc_range[1], num_points_in_pillar[2], '3d', device='cpu')
    ref_3d_wz = ref_3d_wz.permute(3, 0, 1, 2)[[1, 2, 0]]
    ref_3d_wz = ref_3d_wz.permute(1, 2, 3, 0)  # [1, 32, 800, 3]

The final 3 is xyz, but what do the leading dimensions 4, 32, and 32 represent?

Unable to increase `batch_size`

I noticed that your code was written with the assumption that batch_size = 1; when I increase the batch_size, I get dimension errors. I want to know why batch_size is limited to 1.
If it cannot be increased, I cannot make efficient use of my device's resources.

    def custom_collate_fn(data):
        img2stack = np.stack([d[0] for d in data]).astype(np.float32)
        meta2stack = [d[1] for d in data]
        label2stack = np.stack([d[2] for d in data]).astype(np.int)
        # because we use a batch size of 1, so we can stack these tensor together.
        grid_ind_stack = np.stack([d[3] for d in data]).astype(np.float)
        point_label = np.stack([d[4] for d in data]).astype(np.int)
        return torch.from_numpy(img2stack), \
            meta2stack, \
            torch.from_numpy(label2stack), \
            torch.from_numpy(grid_ind_stack), \
            torch.from_numpy(point_label)
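
Since grid_ind and the point labels contain a different number of points per sample, np.stack only succeeds when all samples share a shape, which is why batch_size is effectively pinned to 1. Below is a hedged sketch of a collate function that keeps the variable-length arrays as lists (the model and loss would then have to iterate per sample; names are hypothetical):

    import numpy as np
    import torch

    def custom_collate_fn_batched(data):
        img2stack = torch.from_numpy(np.stack([d[0] for d in data]).astype(np.float32))
        meta2stack = [d[1] for d in data]
        label2stack = torch.from_numpy(np.stack([d[2] for d in data]).astype(np.int64))
        # variable-length per-point arrays: keep as lists instead of stacking
        grid_ind_list = [torch.from_numpy(np.asarray(d[3]).astype(np.float32)) for d in data]
        point_label_list = [torch.from_numpy(np.asarray(d[4]).astype(np.int64)) for d in data]
        return img2stack, meta2stack, label2stack, grid_ind_list, point_label_list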

Training on the mini dataset

I get the following error; is there any way to solve it?

assert table_name in self.table_names, "Table {} not found".format(table_name)
AssertionError: Table lidarseg not found

The performance of the released occupancy model tpv04_occupancy

[evaluation-results screenshot omitted]
Hello, I ran inference of tpv04_occupancy on a V100 GPU with the command:

python eval.py --py-config config/tpv04_occupancy.py --ckpts ckpt/tpv04_occupancy_v2.pth

The performance on nuScenes is shown in the screenshot above.

It does not match the numbers reported in the paper. Also, could you please explain the evaluation metrics mIoU vox/pts? I'm confused about them.

Code for SemanticKITTI

The paper includes a table comparing performance on the SemanticKITTI dataset. Are there plans to release the code for training using the SemanticKITTI dataset?

Question about 3D OCC task training

Hi Author,

For the 3D occupancy prediction task training, do we still need to set ignore_index=0 when instantiating the cross-entropy loss function? In the paper you say "pseudo-per-voxel labels were generated from sparse point cloud by assigning a new label of empty to any voxel that does not contain any point, and we use voxel predictions as input to both lovasz-softmax and cross-entropy losses." Does that mean that for the [100, 100, 8] 3D volume, we set the labels of all remaining voxels to 0 as the empty label? I found that the occupied voxels, whose labels are generated from the sparse lidar points, only number around 1000~2300 in total. This approach causes serious class imbalance. Could you provide more detail about empty-voxel label generation for the 3D occupancy prediction task?
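
A minimal sketch of the pseudo-per-voxel labeling described in the quoted sentence (function and variable names are hypothetical, and the authors' pipeline may aggregate points per voxel differently, e.g. by majority vote):

    import numpy as np

    def voxelize_labels(points_xyz, point_labels, pc_range, grid=(100, 100, 8)):
        # points_xyz: (N, 3) lidar points; point_labels: (N,) semantic labels (> 0)
        grid = np.asarray(grid)
        lo = np.asarray(pc_range[:3], dtype=np.float32)
        hi = np.asarray(pc_range[3:], dtype=np.float32)
        vox = np.zeros(grid, dtype=np.uint8)  # label 0 == empty
        idx = np.floor((points_xyz - lo) / (hi - lo) * grid).astype(np.int64)
        inside = np.all((idx >= 0) & (idx < grid), axis=1)
        idx, lbl = idx[inside], point_labels[inside]
        vox[idx[:, 0], idx[:, 1], idx[:, 2]] = lbl  # last point in a voxel wins
        return vox

With a 100x100x8 grid (80,000 voxels) and only ~1000-2300 occupied, the empty class indeed dominates, which is presumably why the ignore_index/weighting question matters.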

License?

Hi, could you add a software license to this repo? Thank you!

A typo in the paper

Hello authors. Section 4.5 of your paper states that TPVFormer has 6.0M parameters and MonoScene has 15.7M. I believe this is a mistake: the backbone alone already exceeds that parameter count, and when I measured it myself, the MonoScene model for SemanticKITTI has 149M parameters. (Screenshot omitted.)

A problem when training on eight 3090s

First of all, thank you for the excellent work.
While reproducing, I followed the steps in "Train TPVFormer for lidar segmentation task on 3090 with 24G GPU memory". Training runs fine with four GPUs, but with eight 3090s I get "_pickle.UnpicklingError: pickle data was truncated". What could be going wrong?

TPVFormer-Small config files

Dear authors,

First, please let me thank you for your great work.
Could you please share the config files for TPVFormer-Small for the task of 3D semantic occupancy prediction?

Thank you very much in advance.

FileNotFoundError: [Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/attribute.json'

I followed the installation instructions in the README, but I encountered some issues during the dataset installation step.

I downloaded the dataset from the nuScenes website, but I wasn't sure which files I needed to download. I made an educated guess and downloaded the following:

nuimages-v1.0-all-samples.tgz
nuimages-v1.0-all-sweeps-cam-back-left.tgz
nuimages-v1.0-all-sweeps-cam-back-right.tgz
nuimages-v1.0-all-sweeps-cam-back.tgz
nuimages-v1.0-all-sweeps-cam-front-left.tgz
nuimages-v1.0-all-sweeps-cam-front-right.tgz
nuimages-v1.0-all-sweeps-cam-front.tgz
nuplan-maps-v1.0.zip
nuScenes-lidarseg-all-v1.0.tar

I extracted these datasets to the specified directory.

However, when I tried to run either the train.py or eval.py scripts, I encountered the following issue:

Namespace(gpus=8, py_config='config/tpv04_occupancy.py', resume_from='', work_dir='out/tpv_occupancy')
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506
tcp://127.0.0.1:20506

..................

2023-04-21 00:11:26,596 - mmseg - INFO - Config:
2023-04-21 00:13:02,052 - mmseg - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
2023-04-21 00:13:03,511 - mmseg - INFO - Number of params: 62465552
done ddp model

Loading NuScenes tables for version v1.0-trainval...
Traceback (most recent call last):
File "train.py", line 401, in
torch.multiprocessing.spawn(main, args=(args,), nprocs=args.gpus)
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

Process 6 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/root/data01/zzy/TPVFormer/train.py", line 99, in main
data_builder.build(
File "/root/data01/zzy/TPVFormer/builder/data_builder.py", line 20, in build
nusc = NuScenes(version=version, dataroot=data_path, verbose=True)
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/nuscenes/nuscenes.py", line 70, in init
self.attribute = self.load_table('attribute')
File "/root/miniconda3/envs/cat_TPV/lib/python3.8/site-packages/nuscenes/nuscenes.py", line 136, in load_table
with open(osp.join(self.table_root, '{}.json'.format(table_name))) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/nuscenes/v1.0-trainval/attribute.json'

Since I am not entirely certain if I downloaded the correct dataset, I would like to ask if what I did was correct.

No module named 'mmcv._ext'

I followed the given environment setup, but I keep hitting this error. I could not install mmcv-full==1.4.0 via pip, so I installed it with mim install mmcv==1.4.0, but I still get "No module named 'mmcv._ext'". How can I solve this?
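
A hedged note: the lite mmcv package does not ship the compiled mmcv._ext ops; only mmcv-full does. The documented way to install a prebuilt mmcv-full wheel is to point pip at the OpenMMLab wheel index matching your CUDA and PyTorch versions (cu111/torch1.9.0 below are placeholders to substitute):

    pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html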

Code

Hello author, I'm very interested in your research, and I would like to know when the code will be open-sourced.

Enquiry about the paper and code

Hello author, I'm very interested in your research, and I would like to know when the paper will be published and the code open-sourced. Looking forward to seeing the paper and source code.

Is there a config for the large occupancy model?

Hello authors, first of all, thank you very much for this excellent work.
The paper says the TPV resolution for occupancy training is 200x200x16 with dim 128, yet in tpv04_occupancy.py the TPV resolution is 100x100x8 and dim is 256 (screenshot omitted).

Could you re-upload a config, ideally one consistent with the paper, so that everyone can reproduce the results? Many thanks!

(By the way, the newly uploaded visualization code lives in the visualization folder, but some package imports use the word "visualize"; a minor bug, just so you know.)

Request for visualization code.

Hello! I wonder if you could release the code and colormap for visualizing the output 3D occupancy results. Thank you.

Question about the pretrained model

Thanks for your great work.
I want to test the performance with a ResNet-34 backbone, but in your code there is no branch for the case of no checkpoint file.
Can I just modify the code as in the (omitted) screenshot? If that works, how many epochs of training do you recommend?

Table lidarseg not found

File "/opt/conda/lib/python3.7/site-packages/nuscenes/nuscenes.py", line 214, in get
assert table_name in self.table_names, "Table {} not found".format(table_name)
AssertionError: Table lidarseg not found
