
completionformer's Introduction

  • 👋 Hi, I'm Youmin Zhang
  • 👀 I'm interested in Stereo Matching, SLAM, Monodepth Prediction/Completion
  • 🌱 I'm currently a PhD student at University of Bologna
  • 💞️ I'm looking for a research scientist position ...
  • 📫 How to reach me: [email protected]

completionformer's People

Contributors

youmi-zym


completionformer's Issues

kitti pretrained model mismatch

The model and loaded state dict do not match exactly

size mismatch for pos_embed1: copying a param with shape torch.Size([1, 3136, 64]) from checkpoint, the shape in current model is torch.Size([1, 12544, 64]).
size mismatch for patch_embed1.proj.weight: copying a param with shape torch.Size([64, 3, 4, 4]) from checkpoint, the shape in current model is torch.Size([64, 128, 2, 2]).
unexpected key in source state_dict: cls_token, norm.weight, norm.bias, head.weight, head.bias

missing keys in source state_dict: embed_layer1.0.conv1.weight, embed_layer1.0.bn1.weight, embed_layer1.0.bn1.bias, embed_layer1.0.bn1.running_mean, embed_layer1.0.bn1.running_var, embed_layer1.0.conv2.weight, embed_layer1.0.bn2.weight, embed_layer1.0.bn2.bias, embed_layer1.0.bn2.running_mean, embed_layer1.0.bn2.running_var, embed_layer1.1.conv1.weight, embed_layer1.1.bn1.weight, embed_layer1.1.bn1.bias, embed_layer1.1.bn1.running_mean, embed_layer1.1.bn1.running_var, embed_layer1.1.conv2.weight, embed_layer1.1.bn2.weight, embed_layer1.1.bn2.bias, embed_layer1.1.bn2.running_mean, embed_layer1.1.bn2.running_var, embed_layer1.2.conv1.weight, embed_layer1.2.bn1.weight, embed_layer1.2.bn1.bias, embed_layer1.2.bn1.running_mean, embed_layer1.2.bn1.running_var, embed_layer1.2.conv2.weight, embed_layer1.2.bn2.weight, embed_layer1.2.bn2.bias, embed_layer1.2.bn2.running_mean, embed_layer1.2.bn2.running_var, embed_layer2.0.conv1.weight, embed_layer2.0.bn1.weight, embed_layer2.0.bn1.bias, embed_layer2.0.bn1.running_mean, embed_layer2.0.bn1.running_var, embed_layer2.0.conv2.weight, embed_layer2.0.bn2.weight, embed_layer2.0.bn2.bias, embed_layer2.0.bn2.running_mean, embed_layer2.0.bn2.running_var, embed_layer2.0.downsample.0.weight, embed_layer2.0.downsample.1.weight, embed_layer2.0.downsample.1.bias, embed_layer2.0.downsample.1.running_mean, embed_layer2.0.downsample.1.running_var, embed_layer2.1.conv1.weight, embed_layer2.1.bn1.weight, embed_layer2.1.bn1.bias, embed_layer2.1.bn1.running_mean, embed_layer2.1.bn1.running_var, embed_layer2.1.conv2.weight, embed_layer2.1.bn2.weight, embed_layer2.1.bn2.bias, embed_layer2.1.bn2.running_mean, embed_layer2.1.bn2.running_var, embed_layer2.2.conv1.weight, embed_layer2.2.bn1.weight, embed_layer2.2.bn1.bias, embed_layer2.2.bn1.running_mean, embed_layer2.2.bn1.running_var, embed_layer2.2.conv2.weight, embed_layer2.2.bn2.weight, embed_layer2.2.bn2.bias, embed_layer2.2.bn2.running_mean, embed_layer2.2.bn2.running_var, embed_layer2.3.conv1.weight, embed_layer2.3.bn1.weight, embed_layer2.3.bn1.bias, embed_layer2.3.bn1.running_mean, embed_layer2.3.bn1.running_var, embed_layer2.3.conv2.weight, embed_layer2.3.bn2.weight, embed_layer2.3.bn2.bias, embed_layer2.3.bn2.running_mean, embed_layer2.3.bn2.running_var, block1.0.resblock.conv1.weight, block1.0.resblock.bn1.weight, block1.0.resblock.bn1.bias, block1.0.resblock.bn1.running_mean, block1.0.resblock.bn1.running_var, block1.0.resblock.conv2.weight, block1.0.resblock.bn2.weight, block1.0.resblock.bn2.bias, block1.0.resblock.bn2.running_mean, block1.0.resblock.bn2.running_var, block1.0.resblock.ca.fc.0.weight, block1.0.resblock.ca.fc.2.weight, block1.0.resblock.sa.conv1.weight, block1.0.concat_conv.weight, block1.1.resblock.conv1.weight, block1.1.resblock.bn1.weight, block1.1.resblock.bn1.bias, block1.1.resblock.bn1.running_mean, block1.1.resblock.bn1.running_var, block1.1.resblock.conv2.weight, block1.1.resblock.bn2.weight, block1.1.resblock.bn2.bias, block1.1.resblock.bn2.running_mean, block1.1.resblock.bn2.running_var, block1.1.resblock.ca.fc.0.weight, block1.1.resblock.ca.fc.2.weight, block1.1.resblock.sa.conv1.weight, block1.1.concat_conv.weight, block1.2.resblock.conv1.weight, block1.2.resblock.bn1.weight, block1.2.resblock.bn1.bias, block1.2.resblock.bn1.running_mean, block1.2.resblock.bn1.running_var, block1.2.resblock.conv2.weight, block1.2.resblock.bn2.weight, block1.2.resblock.bn2.bias, block1.2.resblock.bn2.running_mean, block1.2.resblock.bn2.running_var, block1.2.resblock.ca.fc.0.weight, 
block1.2.resblock.ca.fc.2.weight, block1.2.resblock.sa.conv1.weight, block1.2.concat_conv.weight, block2.0.resblock.conv1.weight, block2.0.resblock.bn1.weight, block2.0.resblock.bn1.bias, block2.0.resblock.bn1.running_mean, block2.0.resblock.bn1.running_var, block2.0.resblock.conv2.weight, block2.0.resblock.bn2.weight, block2.0.resblock.bn2.bias, block2.0.resblock.bn2.running_mean, block2.0.resblock.bn2.running_var, block2.0.resblock.ca.fc.0.weight, block2.0.resblock.ca.fc.2.weight, block2.0.resblock.sa.conv1.weight, block2.0.concat_conv.weight, block2.1.resblock.conv1.weight, block2.1.resblock.bn1.weight, block2.1.resblock.bn1.bias, block2.1.resblock.bn1.running_mean, block2.1.resblock.bn1.running_var, block2.1.resblock.conv2.weight, block2.1.resblock.bn2.weight, block2.1.resblock.bn2.bias, block2.1.resblock.bn2.running_mean, block2.1.resblock.bn2.running_var, block2.1.resblock.ca.fc.0.weight, block2.1.resblock.ca.fc.2.weight, block2.1.resblock.sa.conv1.weight, block2.1.concat_conv.weight, block2.2.resblock.conv1.weight, block2.2.resblock.bn1.weight, block2.2.resblock.bn1.bias, block2.2.resblock.bn1.running_mean, block2.2.resblock.bn1.running_var, block2.2.resblock.conv2.weight, block2.2.resblock.bn2.weight, block2.2.resblock.bn2.bias, block2.2.resblock.bn2.running_mean, block2.2.resblock.bn2.running_var, block2.2.resblock.ca.fc.0.weight, block2.2.resblock.ca.fc.2.weight, block2.2.resblock.sa.conv1.weight, block2.2.concat_conv.weight, block2.3.resblock.conv1.weight, block2.3.resblock.bn1.weight, block2.3.resblock.bn1.bias, block2.3.resblock.bn1.running_mean, block2.3.resblock.bn1.running_var, block2.3.resblock.conv2.weight, block2.3.resblock.bn2.weight, block2.3.resblock.bn2.bias, block2.3.resblock.bn2.running_mean, block2.3.resblock.bn2.running_var, block2.3.resblock.ca.fc.0.weight, block2.3.resblock.ca.fc.2.weight, block2.3.resblock.sa.conv1.weight, block2.3.concat_conv.weight, block3.0.resblock.conv1.weight, block3.0.resblock.bn1.weight, block3.0.resblock.bn1.bias, block3.0.resblock.bn1.running_mean, block3.0.resblock.bn1.running_var, block3.0.resblock.conv2.weight, block3.0.resblock.bn2.weight, block3.0.resblock.bn2.bias, block3.0.resblock.bn2.running_mean, block3.0.resblock.bn2.running_var, block3.0.resblock.ca.fc.0.weight, block3.0.resblock.ca.fc.2.weight, block3.0.resblock.sa.conv1.weight, block3.0.concat_conv.weight, block3.1.resblock.conv1.weight, block3.1.resblock.bn1.weight, block3.1.resblock.bn1.bias, block3.1.resblock.bn1.running_mean, block3.1.resblock.bn1.running_var, block3.1.resblock.conv2.weight, block3.1.resblock.bn2.weight, block3.1.resblock.bn2.bias, block3.1.resblock.bn2.running_mean, block3.1.resblock.bn2.running_var, block3.1.resblock.ca.fc.0.weight, block3.1.resblock.ca.fc.2.weight, block3.1.resblock.sa.conv1.weight, block3.1.concat_conv.weight, block3.2.resblock.conv1.weight, block3.2.resblock.bn1.weight, block3.2.resblock.bn1.bias, block3.2.resblock.bn1.running_mean, block3.2.resblock.bn1.running_var, block3.2.resblock.conv2.weight, block3.2.resblock.bn2.weight, block3.2.resblock.bn2.bias, block3.2.resblock.bn2.running_mean, block3.2.resblock.bn2.running_var, block3.2.resblock.ca.fc.0.weight, block3.2.resblock.ca.fc.2.weight, block3.2.resblock.sa.conv1.weight, block3.2.concat_conv.weight, block3.3.resblock.conv1.weight, block3.3.resblock.bn1.weight, block3.3.resblock.bn1.bias, block3.3.resblock.bn1.running_mean, block3.3.resblock.bn1.running_var, block3.3.resblock.conv2.weight, block3.3.resblock.bn2.weight, block3.3.resblock.bn2.bias, 
block3.3.resblock.bn2.running_mean, block3.3.resblock.bn2.running_var, block3.3.resblock.ca.fc.0.weight, block3.3.resblock.ca.fc.2.weight, block3.3.resblock.sa.conv1.weight, block3.3.concat_conv.weight, block3.4.resblock.conv1.weight, block3.4.resblock.bn1.weight, block3.4.resblock.bn1.bias, block3.4.resblock.bn1.running_mean, block3.4.resblock.bn1.running_var, block3.4.resblock.conv2.weight, block3.4.resblock.bn2.weight, block3.4.resblock.bn2.bias, block3.4.resblock.bn2.running_mean, block3.4.resblock.bn2.running_var, block3.4.resblock.ca.fc.0.weight, block3.4.resblock.ca.fc.2.weight, block3.4.resblock.sa.conv1.weight, block3.4.concat_conv.weight, block3.5.resblock.conv1.weight, block3.5.resblock.bn1.weight, block3.5.resblock.bn1.bias, block3.5.resblock.bn1.running_mean, block3.5.resblock.bn1.running_var, block3.5.resblock.conv2.weight, block3.5.resblock.bn2.weight, block3.5.resblock.bn2.bias, block3.5.resblock.bn2.running_mean, block3.5.resblock.bn2.running_var, block3.5.resblock.ca.fc.0.weight, block3.5.resblock.ca.fc.2.weight, block3.5.resblock.sa.conv1.weight, block3.5.concat_conv.weight, block4.0.resblock.conv1.weight, block4.0.resblock.bn1.weight, block4.0.resblock.bn1.bias, block4.0.resblock.bn1.running_mean, block4.0.resblock.bn1.running_var, block4.0.resblock.conv2.weight, block4.0.resblock.bn2.weight, block4.0.resblock.bn2.bias, block4.0.resblock.bn2.running_mean, block4.0.resblock.bn2.running_var, block4.0.resblock.ca.fc.0.weight, block4.0.resblock.ca.fc.2.weight, block4.0.resblock.sa.conv1.weight, block4.0.concat_conv.weight, block4.1.resblock.conv1.weight, block4.1.resblock.bn1.weight, block4.1.resblock.bn1.bias, block4.1.resblock.bn1.running_mean, block4.1.resblock.bn1.running_var, block4.1.resblock.conv2.weight, block4.1.resblock.bn2.weight, block4.1.resblock.bn2.bias, block4.1.resblock.bn2.running_mean, block4.1.resblock.bn2.running_var, block4.1.resblock.ca.fc.0.weight, block4.1.resblock.ca.fc.2.weight, block4.1.resblock.sa.conv1.weight, block4.1.concat_conv.weight, block4.2.resblock.conv1.weight, block4.2.resblock.bn1.weight, block4.2.resblock.bn1.bias, block4.2.resblock.bn1.running_mean, block4.2.resblock.bn1.running_var, block4.2.resblock.conv2.weight, block4.2.resblock.bn2.weight, block4.2.resblock.bn2.bias, block4.2.resblock.bn2.running_mean, block4.2.resblock.bn2.running_var, block4.2.resblock.ca.fc.0.weight, block4.2.resblock.ca.fc.2.weight, block4.2.resblock.sa.conv1.weight, block4.2.concat_conv.weight
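For context, this looks like the same non-strict backbone-checkpoint warning that also appears in the NYU testing log further below: the ImageNet-pretrained PVT weights (pretrained/pvt.pth) are loaded into a modified backbone, so keys that are missing, unexpected, or of a different shape are skipped rather than treated as a fatal error. A rough sketch of that kind of partial loading in plain PyTorch (the stand-in model and the shape filtering are illustrative, not the repository's actual loader):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))          # stand-in for the real backbone
ckpt = torch.load("./pretrained/pvt.pth", map_location="cpu")  # path taken from the log below
state = ckpt.get("state_dict", ckpt)                           # some checkpoints nest the weights

# Drop tensors whose shapes no longer match the (modified) model, then load non-strictly:
# whatever is missing simply keeps its random initialisation.
own = model.state_dict()
filtered = {k: v for k, v in state.items() if k in own and v.shape == own[k].shape}
result = model.load_state_dict(filtered, strict=False)
print("missing keys:", result.missing_keys)
print("skipped keys:", sorted(set(state) - set(filtered)))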

Testing in indoor new dataset

Hello,
I have an indoor (large hallway) dataset captured by an Intel RealSense camera attached to a robot that publishes all the topics in ROS. I have extracted the RGB images (424 x 240) and the RGB-aligned depth images (424 x 240), and they are all synced. Both of their frame ids are camera_front/camera_color_optical_frame. There are 1200 RGB and depth images in total, all in PNG format. The camera info topic, which carries the intrinsic parameters, produces the same message the whole time. Here is the camera info message:

header: 
  seq: 49136
  stamp: 
    secs: 1582327384
    nsecs: 769735564
  frame_id: "camera_front/camera_color_optical_frame"
height: 240
width: 424
distortion_model: "plumb_bob"
D: [0.0, 0.0, 0.0, 0.0, 0.0]
K: [304.7262878417969, 0.0, 214.2415313720703, 0.0, 304.77935791015625, 121.70726013183594, 0.0, 0.0, 1.0]
R: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
P: [304.7262878417969, 0.0, 214.2415313720703, 0.0, 0.0, 304.77935791015625, 121.70726013183594, 0.0, 0.0, 0.0, 1.0, 0.0]
binning_x: 0
binning_y: 0
roi: 
  x_offset: 0
  y_offset: 0
  height: 0
  width: 0
  do_rectify: False
---
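For reference, the intrinsic parameters are already contained in the K field of this message. A minimal sketch (not an answer from the authors) of reading them into a 3x3 matrix and the usual fx/fy/cx/cy values:

import numpy as np

# Values copied from the camera_info K field above (row-major 3x3).
K = np.array([[304.7262878417969, 0.0,                214.2415313720703],
              [0.0,               304.77935791015625, 121.70726013183594],
              [0.0,               0.0,                1.0]])
fx, fy = K[0, 0], K[1, 1]   # focal lengths in pixels
cx, cy = K[0, 2], K[1, 2]   # principal point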

Here is a sample picture and corresponding depth image:
rgb_1582327403292089386
depth_1582327403365282353

Now, my questions are:

  1. What would the calibration file .txt look like for each pair? (It should be constant for all 1200 pairs, so a single file will be enough.)
  2. Which inference configuration should I use (KITTI or NYU-V2)?
  3. Please provide the test command to run in this case (with appropriate patch width, patch height and maximum depth flags).

Thanks in advance.

Where to get pre-trained resnet34 and PVT

Congratulations on your great work. I see that the code requires pre-trained ResNet-34 and PVT checkpoints, which are not mentioned in the README. Where can I get these checkpoints?

About KITTI dataset evaluation and reporting

Hello,

I am trying to understand how you perform the KITTI evaluation and reporting.

  1. Are there ground-truth depth images that you compare against (and which are they)? Say your method predicts depth for 100 images from the KITTI dataset; how do you compare against the ground-truth depth images? Where is the script that computes RMSE, MAE, iRMSE and iMAE (a sketch of how these metrics are commonly computed follows this list)? Are these numbers generated by you, or by KITTI themselves for you?

  2. How do you report the results to the website https://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_completion ? I mean, how do you submit your work, and who validates the evaluation metrics? Do you report the metrics yourself, or does KITTI compute them for every new method? Where and how do you submit the results? I didn't see any submission page.
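For reference (not the official KITTI devkit), these depth-completion metrics are commonly computed over valid ground-truth pixels only, with iRMSE/iMAE evaluated on inverse depth (the benchmark reports RMSE/MAE in mm and iRMSE/iMAE in 1/km). A minimal sketch with hypothetical array names:

import numpy as np

def depth_metrics(pred_m, gt_m):
    """pred_m, gt_m: depth maps in metres; gt is sparse, zeros mean 'no ground truth'."""
    valid = gt_m > 1e-3
    p, g = pred_m[valid], gt_m[valid]
    rmse = np.sqrt(np.mean((p - g) ** 2))      # RMSE in metres
    mae = np.mean(np.abs(p - g))               # MAE in metres
    ip, ig = 1.0 / p, 1.0 / g                  # inverse depth
    irmse = np.sqrt(np.mean((ip - ig) ** 2))   # iRMSE
    imae = np.mean(np.abs(ip - ig))            # iMAE
    return rmse, mae, irmse, imae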

Sorry, I am trying to go deeper so I need to know this process. Thanks in advance.

Some important questions regarding CompletionFormer

Hello, thanks for your work. I have some questions; here they are:

  1. When I give your model a try on the kitti test dataset mentioned in the README, without the --save_result_only flag, I see around 5-6 intermediate images saved together with the final predicted image and the error bar. I also tried saving the gt.min() and gt.max() values: gt.min() is always 0, while gt.max() is somewhere between 60.0-80.0 for different images. I assume these come from uint16 data that I have to multiply by 256 to get the stored value, which would then be somewhere around 15360.0-20480.0.
    a. A uint16 depth image can range from 0 to 65535. Why are the values stuck between 0 and 20480.0 rather than spanning the full 0-65535 range?
    b. You set the max depth flag to 90.0, whereas for the KITTI dataset, as far as I know, the HDL-64 sensor has a maximum range of 200 m and a minimum range of 2.0 m.
    c. In your code, are you using the minimum depth (~2 m) of the HDL-64 anywhere? Are you using a minimum depth anywhere at all?
    d. How do I get distances from these uint values? If I assume that the minimum depth for the data you refer to as kitti_test is 2 m and the maximum depth is 200 m for every sample, then for a single sample each depth pixel in metres would be D = (200 - 2) / (20480 - 0) * B + 2. Is that correct? (See the decoding sketch after this list.)

  2. Why do you need the K matrix? In your work, I can see you are feeding the RGB image and the RGB-aligned depth image, so what makes the K matrix necessary?

  3. I have described a case below where I would have to do bottom cropping, but I see you are not doing that; you only do top cropping. Can you give an example of handling a bottom-cropping problem?
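For reference on 1.d: the KITTI depth-completion devkit stores depth in 16-bit PNGs as metres multiplied by 256, with 0 meaning "no measurement", so no min/max rescaling is involved. A minimal decoding sketch (file name hypothetical):

import numpy as np
from PIL import Image

raw = np.array(Image.open("kitti_gt_depth.png"), dtype=np.float32)  # uint16 values from the PNG
depth_m = raw / 256.0     # depth in metres
valid = raw > 0           # zero pixels carry no measurement, not zero depth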

PXL_20230810_074156648~2

About skip connection on model output

Hi @mattpoggi and @youmi-zym,

Thanks for the great work!

In the model architecture of Fig. 2 in the paper, the output seems to be a depth map directly; however, in src/model/completionformer.py line 31, the input sparse depth is added to the model output.

I'm wondering if there is something I missed in the paper, since in the code the model seems to predict a residual depth rather than an entire depth map (roughly as sketched below).
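A minimal sketch of that residual reading of the code (hypothetical names; the real logic lives in src/model/completionformer.py):

import torch

def initial_prediction(rgb, sparse_depth, network):
    """network: any callable mapping (rgb, sparse_depth) -> a tensor of the same spatial size."""
    residual = network(rgb, sparse_depth)   # what the backbone/decoder outputs
    return residual + sparse_depth          # skip connection: output added to the input sparse depth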

Thanks!

Regarding depth conventions and weird outputs

Hi,

Thank you for your awesome work! I am trying to run your model on ScanNet RGB-D images to do depth completion. The depth in ScanNet is in millimetres, so I am dividing it by 1000.0 to convert it to metres. Next, I used the pre-trained NYUv2 checkpoint to complete the depth, but I notice that the predicted depth residual pred_init is pretty big -- between [0, 2.7931] -- while my input depth (in metres) is between [0, 2]. Could you let me know what I am missing? Thank you so much!
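For reference, a minimal sketch of the millimetre-to-metre conversion described above (assuming ScanNet's 16-bit depth PNGs; the file name is hypothetical):

import numpy as np
from PIL import Image

depth_mm = np.array(Image.open("scannet_depth.png"), dtype=np.float32)  # uint16, millimetres
depth_m = depth_mm / 1000.0   # metres (the NYUv2 setup uses max_depth 10.0 in metres)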

The command I ran:
python main.py --data_name scannet --test_only --log_dir /projects/katefgroup/language_grounding/completion_former/scannet --save_image --gpus 0 --pretrain /projects/katefgroup/language_grounding/NYUv2.pt

I am attaching the 04_pred_prop_5.jpg for your reference.
Screenshot 2023-09-21 at 1 53 14 PM

Inferring on any instance in the kitti-raw dataset

Thank you for your great work. I want to run inference on arbitrary instances of the KITTI raw dataset, but the structure of KITTI raw and KITTI-DC is not consistent. That is to say, given a frame of raw image, raw LiDAR data and the calibration relating the two, can I directly obtain the completed result with the pretrained model? Could you give me some help, such as a script? Thank you!

Some questions about training on a new dataset

Hi,
I would like to ask about the RGB images I see while training on my own dataset. The RGB images saved during training all have a gray layer over them, but the output looks normal during testing.
I would also like to ask about crop_size in the code: what is its specific function, and can it be removed?
During data processing, after I adjust the image size, tensor-size mismatches occur during training.

Results in the paper cannot be reproduced

Hi, thanks for the great work. I tried to reproduce the results on the NYU dataset. However, the RMSE I got is 0.092. My torch is 1.10.0, torchvision is 0.11.0 and torchaudio is 0.10.0.

Does the warning reported by apex, "Warning: using Python fallback for SyncBatchNorm, possibly because apex was installed without --cuda_ext. The exception raised when attempting to import the cuda backend was: No module named 'syncbn'. Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.", affect the accuracy?

Thanks a lot.

Inference Time

Hi! Thanks for your great work @youmi-zym :)
I have some questions about your experiment results.

  1. In Table 4, the performance results of NLSPN are different from the original paper (ECCV'20). Can you tell me how you tested it and why it differs from the original paper results?

  2. In Section 5 Conclusion and Limitations, you mentioned CompletionFormer runs at about 10 FPS. Is it a result of using 4 3090 GPUs?

Training time

Hi, thank you for your open source code.

I noticed that the number of training epochs is 250! I would like to ask how long training takes in total, and why it needs so many epochs. BTW, training appears to use only part of the data (76842/85896). Could you tell me why, and how you did that?
Thank you so much.

Best,
Hu Li

Is it possible to use deform_conv2d and torch.cuda.amp in your code?

Hello,

Thank you for providing this great code. I am wondering if your code can be easily modified to incorporate the following changes:

  • Instead of using NVIDIA Apex, whose mixed-precision functionality has since been integrated into PyTorch as torch.amp, can the code easily be switched to torch.cuda.amp for automatic mixed precision?

  • The PyTorch implementation of Deformable Convolution v2 is available in torchvision.ops.deform_conv2d. How can I modify modulated_deform_conv_func.py to use it instead of the DCN module? Specifically, I see you are using DCN.modulated_deform_conv_forward and DCN.modulated_deform_conv_backward in the forward and backward functions respectively. What changes do I need to make to use the new implementation? (A rough sketch of both pieces follows this list.)
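Neither change is a drop-in replacement for the repository's modulated_deform_conv_func.py, but here is a hedged sketch of the two native APIs involved (assuming torchvision >= 0.9 so that deform_conv2d accepts a mask argument, and a CUDA device for the AMP part; all module and variable names are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

class DeformConvV2(nn.Module):
    """Modulated deformable conv built on torchvision.ops.deform_conv2d."""
    def __init__(self, in_ch, out_ch, k=3, stride=1, padding=1):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, k, k))
        self.bias = nn.Parameter(torch.zeros(out_ch))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # 2*k*k offset channels + k*k modulation channels, predicted from the input
        self.offset_mask = nn.Conv2d(in_ch, 3 * k * k, k, stride=stride, padding=padding)
        self.stride, self.padding = stride, padding

    def forward(self, x):
        o1, o2, mask = torch.chunk(self.offset_mask(x), 3, dim=1)
        offset = torch.cat([o1, o2], dim=1)
        mask = torch.sigmoid(mask)
        return deform_conv2d(x, offset, self.weight, self.bias,
                             stride=self.stride, padding=self.padding, mask=mask)

# Native AMP training step (replacing apex.amp's scaled_loss pattern):
model = DeformConvV2(3, 8).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(2, 3, 64, 64, device="cuda")
target = torch.randn(2, 8, 64, 64, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = F.l1_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()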

model without NLSPN

Dear author:
I really appreciate your work and am very grateful that you shared the code with us. I encountered some issues while replicating your work and would like to seek your guidance.
When I train the model without the NLSPN module, I find that the predicted depth map shows noticeable 'spots', and upon inspection these spots are located exactly where the original sparse input points are. Have you encountered this situation, or do you have any suggestions?
Best wishes!

KITTI Online Server Submission

Hi,

Thank you for your work!

I think the code does not top-crop the images during inference for the KITTI online server submission.

This means that the resolution used for the KITTI online server submission differs from the one used during training and evaluation on the validation set.

Does it still produce plausible performance? (I mean, why not train at the same resolution as the images submitted to the KITTI online server? Does the difference affect performance?)

Thanks!

RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.

During training, the following error occurs and training is interrupted.

The only change I made was switching gpus from [0,1,2,3] to [0]. The following is a description of the problem, thank you!

Train | 231111@15:22:36 | Loss = 14.8048 | Lr Warm Up : [0.000405]:  41%|████      | 34797/85896 [5:09:27<7:34:25, 1.87it/s]
Traceback (most recent call last):
  File "/home/user/download/pycharm-community-2023.1.1/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/user/download/pycharm-community-2023.1.1/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data/xyy/envs/CompletionFormer-main/src/main.py", line 446, in <module>
    main(args_main)
  File "/data/xyy/envs/CompletionFormer-main/src/main.py", line 421, in main
    while not spawn_context.join():
  File "/data/conda/envs/user/envs/completionformer/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/data/conda/envs/user/envs/completionformer/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/data/xyy/envs/CompletionFormer-main/src/main.py", line 221, in train
    scaled_loss.backward()
  File "/data/conda/envs/user/envs/completionformer/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/data/conda/envs/user/envs/completionformer/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
python-BaseException
Backend TkAgg is interactive backend. Turning interactive mode on.

Having problems with kitti dataset evaluation

Hello,
I am implementing your work on the KITTI dataset. As mentioned in your work, the kitti_raw directory should have the structure below:

|   |   ├── 2011_09_26
|   |   ├── 2011_09_28
|   |   ├── 2011_09_29
|   |   ├── 2011_09_30
|   |   ├── 2011_10_03

I can't find the KITTI dataset organized like that. On the website, there are categories such as City, Residential, Road, Campus, Person, and Calibration. Which ones are you referring to? For instance, inside City there are sequences like 2011_09_26_drive_0001 and so on (26, 28, 29); inside Residential there are (26, 30, 3); inside Road (26, 29, 30, 03); and so on. Also, you didn't mention whether I should download the unsynced+unrectified or the synced+rectified dataset, and whether the calibration files are needed.

Please clarify. Thanks in advance.

Not understanding the testing script

Hello, I tried your testing script for the NYU-v2 dataset, and it generates the following output.

arghya@arghya-Pulse-GL66-12UEK:~/CompletionFormer/src$ python3 main.py --dir_data ../data/nyudepthv2 --data_name NYU  --split_json ../data_json/nyu.json --gpus 0 --max_depth 10.0 --num_sample 500 --test_only --pretrain ../src/pretrained/NYUv2.pt --save ../results 


=== Arguments ===
address : localhost  |  affinity : TGASS  |  affinity_gamma : 0.5  |  augment : True  |  
batch_size : 12  |  betas : (0.9, 0.999)  |  conf_prop : True  |  data_name : NYU  |  dir_data : ../data/nyudepthv2  |  
epochs : 72  |  epsilon : 1e-08  |  from_scratch : False  |  gamma : 0.5  |  gpus : 0  |  
legacy : False  |  lidar_lines : 64  |  log_dir : ../experiments/  |  loss : 1.0*L1+1.0*L2  |  lr : 0.001  |  
max_depth : 10.0  |  milestones : [36, 48, 56, 64]  |  model : CompletionFormer  |  momentum : 0.9  |  no_multiprocessing : False  |  
num_gpus : 1  |  num_sample : 500  |  num_summary : 4  |  num_threads : 4  |  opt_level : O0  |  
optimizer : ADAMW  |  patch_height : 228  |  patch_width : 304  |  port : 29500  |  preserve_input : False  |  
pretrain : ../src/pretrained/NYUv2.pt  |  print_freq : 1  |  prop_kernel : 3  |  prop_time : 6  |  resume : False  |  
save : ../results  |  save_dir : ../experiments/230620_152501_../results  |  save_full : False  |  save_image : False  |  save_result_only : False  |  
seed : 43  |  split_json : ../data_json/nyu.json  |  test_crop : False  |  test_only : True  |  top_crop : 0  |  
warm_up : True  |  weight_decay : 0.01  |  

2023-06-20 15:25:03,638 - mmseg - INFO - load checkpoint from local path: ./pretrained/pvt.pth
2023-06-20 15:25:03,716 - mmseg - WARNING - The model and loaded state dict do not match exactly

size mismatch for pos_embed1: copying a param with shape torch.Size([1, 3136, 64]) from checkpoint, the shape in current model is torch.Size([1, 12544, 64]).
size mismatch for patch_embed1.proj.weight: copying a param with shape torch.Size([64, 3, 4, 4]) from checkpoint, the shape in current model is torch.Size([64, 128, 2, 2]).
unexpected key in source state_dict: cls_token, norm.weight, norm.bias, head.weight, head.bias

missing keys in source state_dict: embed_layer1.0.conv1.weight, embed_layer1.0.bn1.weight, embed_layer1.0.bn1.bias, embed_layer1.0.bn1.running_mean, embed_layer1.0.bn1.running_var, embed_layer1.0.conv2.weight, embed_layer1.0.bn2.weight, embed_layer1.0.bn2.bias, embed_layer1.0.bn2.running_mean, embed_layer1.0.bn2.running_var, embed_layer1.1.conv1.weight, embed_layer1.1.bn1.weight, embed_layer1.1.bn1.bias, embed_layer1.1.bn1.running_mean, embed_layer1.1.bn1.running_var, embed_layer1.1.conv2.weight, embed_layer1.1.bn2.weight, embed_layer1.1.bn2.bias, embed_layer1.1.bn2.running_mean, embed_layer1.1.bn2.running_var, embed_layer1.2.conv1.weight, embed_layer1.2.bn1.weight, embed_layer1.2.bn1.bias, embed_layer1.2.bn1.running_mean, embed_layer1.2.bn1.running_var, embed_layer1.2.conv2.weight, embed_layer1.2.bn2.weight, embed_layer1.2.bn2.bias, embed_layer1.2.bn2.running_mean, embed_layer1.2.bn2.running_var, embed_layer2.0.conv1.weight, embed_layer2.0.bn1.weight, embed_layer2.0.bn1.bias, embed_layer2.0.bn1.running_mean, embed_layer2.0.bn1.running_var, embed_layer2.0.conv2.weight, embed_layer2.0.bn2.weight, embed_layer2.0.bn2.bias, embed_layer2.0.bn2.running_mean, embed_layer2.0.bn2.running_var, embed_layer2.0.downsample.0.weight, embed_layer2.0.downsample.1.weight, embed_layer2.0.downsample.1.bias, embed_layer2.0.downsample.1.running_mean, embed_layer2.0.downsample.1.running_var, embed_layer2.1.conv1.weight, embed_layer2.1.bn1.weight, embed_layer2.1.bn1.bias, embed_layer2.1.bn1.running_mean, embed_layer2.1.bn1.running_var, embed_layer2.1.conv2.weight, embed_layer2.1.bn2.weight, embed_layer2.1.bn2.bias, embed_layer2.1.bn2.running_mean, embed_layer2.1.bn2.running_var, embed_layer2.2.conv1.weight, embed_layer2.2.bn1.weight, embed_layer2.2.bn1.bias, embed_layer2.2.bn1.running_mean, embed_layer2.2.bn1.running_var, embed_layer2.2.conv2.weight, embed_layer2.2.bn2.weight, embed_layer2.2.bn2.bias, embed_layer2.2.bn2.running_mean, embed_layer2.2.bn2.running_var, embed_layer2.3.conv1.weight, embed_layer2.3.bn1.weight, embed_layer2.3.bn1.bias, embed_layer2.3.bn1.running_mean, embed_layer2.3.bn1.running_var, embed_layer2.3.conv2.weight, embed_layer2.3.bn2.weight, embed_layer2.3.bn2.bias, embed_layer2.3.bn2.running_mean, embed_layer2.3.bn2.running_var, block1.0.resblock.conv1.weight, block1.0.resblock.bn1.weight, block1.0.resblock.bn1.bias, block1.0.resblock.bn1.running_mean, block1.0.resblock.bn1.running_var, block1.0.resblock.conv2.weight, block1.0.resblock.bn2.weight, block1.0.resblock.bn2.bias, block1.0.resblock.bn2.running_mean, block1.0.resblock.bn2.running_var, block1.0.resblock.ca.fc.0.weight, block1.0.resblock.ca.fc.2.weight, block1.0.resblock.sa.conv1.weight, block1.0.concat_conv.weight, block1.1.resblock.conv1.weight, block1.1.resblock.bn1.weight, block1.1.resblock.bn1.bias, block1.1.resblock.bn1.running_mean, block1.1.resblock.bn1.running_var, block1.1.resblock.conv2.weight, block1.1.resblock.bn2.weight, block1.1.resblock.bn2.bias, block1.1.resblock.bn2.running_mean, block1.1.resblock.bn2.running_var, block1.1.resblock.ca.fc.0.weight, block1.1.resblock.ca.fc.2.weight, block1.1.resblock.sa.conv1.weight, block1.1.concat_conv.weight, block1.2.resblock.conv1.weight, block1.2.resblock.bn1.weight, block1.2.resblock.bn1.bias, block1.2.resblock.bn1.running_mean, block1.2.resblock.bn1.running_var, block1.2.resblock.conv2.weight, block1.2.resblock.bn2.weight, block1.2.resblock.bn2.bias, block1.2.resblock.bn2.running_mean, block1.2.resblock.bn2.running_var, block1.2.resblock.ca.fc.0.weight, 
block1.2.resblock.ca.fc.2.weight, block1.2.resblock.sa.conv1.weight, block1.2.concat_conv.weight, block2.0.resblock.conv1.weight, block2.0.resblock.bn1.weight, block2.0.resblock.bn1.bias, block2.0.resblock.bn1.running_mean, block2.0.resblock.bn1.running_var, block2.0.resblock.conv2.weight, block2.0.resblock.bn2.weight, block2.0.resblock.bn2.bias, block2.0.resblock.bn2.running_mean, block2.0.resblock.bn2.running_var, block2.0.resblock.ca.fc.0.weight, block2.0.resblock.ca.fc.2.weight, block2.0.resblock.sa.conv1.weight, block2.0.concat_conv.weight, block2.1.resblock.conv1.weight, block2.1.resblock.bn1.weight, block2.1.resblock.bn1.bias, block2.1.resblock.bn1.running_mean, block2.1.resblock.bn1.running_var, block2.1.resblock.conv2.weight, block2.1.resblock.bn2.weight, block2.1.resblock.bn2.bias, block2.1.resblock.bn2.running_mean, block2.1.resblock.bn2.running_var, block2.1.resblock.ca.fc.0.weight, block2.1.resblock.ca.fc.2.weight, block2.1.resblock.sa.conv1.weight, block2.1.concat_conv.weight, block2.2.resblock.conv1.weight, block2.2.resblock.bn1.weight, block2.2.resblock.bn1.bias, block2.2.resblock.bn1.running_mean, block2.2.resblock.bn1.running_var, block2.2.resblock.conv2.weight, block2.2.resblock.bn2.weight, block2.2.resblock.bn2.bias, block2.2.resblock.bn2.running_mean, block2.2.resblock.bn2.running_var, block2.2.resblock.ca.fc.0.weight, block2.2.resblock.ca.fc.2.weight, block2.2.resblock.sa.conv1.weight, block2.2.concat_conv.weight, block2.3.resblock.conv1.weight, block2.3.resblock.bn1.weight, block2.3.resblock.bn1.bias, block2.3.resblock.bn1.running_mean, block2.3.resblock.bn1.running_var, block2.3.resblock.conv2.weight, block2.3.resblock.bn2.weight, block2.3.resblock.bn2.bias, block2.3.resblock.bn2.running_mean, block2.3.resblock.bn2.running_var, block2.3.resblock.ca.fc.0.weight, block2.3.resblock.ca.fc.2.weight, block2.3.resblock.sa.conv1.weight, block2.3.concat_conv.weight, block3.0.resblock.conv1.weight, block3.0.resblock.bn1.weight, block3.0.resblock.bn1.bias, block3.0.resblock.bn1.running_mean, block3.0.resblock.bn1.running_var, block3.0.resblock.conv2.weight, block3.0.resblock.bn2.weight, block3.0.resblock.bn2.bias, block3.0.resblock.bn2.running_mean, block3.0.resblock.bn2.running_var, block3.0.resblock.ca.fc.0.weight, block3.0.resblock.ca.fc.2.weight, block3.0.resblock.sa.conv1.weight, block3.0.concat_conv.weight, block3.1.resblock.conv1.weight, block3.1.resblock.bn1.weight, block3.1.resblock.bn1.bias, block3.1.resblock.bn1.running_mean, block3.1.resblock.bn1.running_var, block3.1.resblock.conv2.weight, block3.1.resblock.bn2.weight, block3.1.resblock.bn2.bias, block3.1.resblock.bn2.running_mean, block3.1.resblock.bn2.running_var, block3.1.resblock.ca.fc.0.weight, block3.1.resblock.ca.fc.2.weight, block3.1.resblock.sa.conv1.weight, block3.1.concat_conv.weight, block3.2.resblock.conv1.weight, block3.2.resblock.bn1.weight, block3.2.resblock.bn1.bias, block3.2.resblock.bn1.running_mean, block3.2.resblock.bn1.running_var, block3.2.resblock.conv2.weight, block3.2.resblock.bn2.weight, block3.2.resblock.bn2.bias, block3.2.resblock.bn2.running_mean, block3.2.resblock.bn2.running_var, block3.2.resblock.ca.fc.0.weight, block3.2.resblock.ca.fc.2.weight, block3.2.resblock.sa.conv1.weight, block3.2.concat_conv.weight, block3.3.resblock.conv1.weight, block3.3.resblock.bn1.weight, block3.3.resblock.bn1.bias, block3.3.resblock.bn1.running_mean, block3.3.resblock.bn1.running_var, block3.3.resblock.conv2.weight, block3.3.resblock.bn2.weight, block3.3.resblock.bn2.bias, 
block3.3.resblock.bn2.running_mean, block3.3.resblock.bn2.running_var, block3.3.resblock.ca.fc.0.weight, block3.3.resblock.ca.fc.2.weight, block3.3.resblock.sa.conv1.weight, block3.3.concat_conv.weight, block3.4.resblock.conv1.weight, block3.4.resblock.bn1.weight, block3.4.resblock.bn1.bias, block3.4.resblock.bn1.running_mean, block3.4.resblock.bn1.running_var, block3.4.resblock.conv2.weight, block3.4.resblock.bn2.weight, block3.4.resblock.bn2.bias, block3.4.resblock.bn2.running_mean, block3.4.resblock.bn2.running_var, block3.4.resblock.ca.fc.0.weight, block3.4.resblock.ca.fc.2.weight, block3.4.resblock.sa.conv1.weight, block3.4.concat_conv.weight, block3.5.resblock.conv1.weight, block3.5.resblock.bn1.weight, block3.5.resblock.bn1.bias, block3.5.resblock.bn1.running_mean, block3.5.resblock.bn1.running_var, block3.5.resblock.conv2.weight, block3.5.resblock.bn2.weight, block3.5.resblock.bn2.bias, block3.5.resblock.bn2.running_mean, block3.5.resblock.bn2.running_var, block3.5.resblock.ca.fc.0.weight, block3.5.resblock.ca.fc.2.weight, block3.5.resblock.sa.conv1.weight, block3.5.concat_conv.weight, block4.0.resblock.conv1.weight, block4.0.resblock.bn1.weight, block4.0.resblock.bn1.bias, block4.0.resblock.bn1.running_mean, block4.0.resblock.bn1.running_var, block4.0.resblock.conv2.weight, block4.0.resblock.bn2.weight, block4.0.resblock.bn2.bias, block4.0.resblock.bn2.running_mean, block4.0.resblock.bn2.running_var, block4.0.resblock.ca.fc.0.weight, block4.0.resblock.ca.fc.2.weight, block4.0.resblock.sa.conv1.weight, block4.0.concat_conv.weight, block4.1.resblock.conv1.weight, block4.1.resblock.bn1.weight, block4.1.resblock.bn1.bias, block4.1.resblock.bn1.running_mean, block4.1.resblock.bn1.running_var, block4.1.resblock.conv2.weight, block4.1.resblock.bn2.weight, block4.1.resblock.bn2.bias, block4.1.resblock.bn2.running_mean, block4.1.resblock.bn2.running_var, block4.1.resblock.ca.fc.0.weight, block4.1.resblock.ca.fc.2.weight, block4.1.resblock.sa.conv1.weight, block4.1.concat_conv.weight, block4.2.resblock.conv1.weight, block4.2.resblock.bn1.weight, block4.2.resblock.bn1.bias, block4.2.resblock.bn1.running_mean, block4.2.resblock.bn1.running_var, block4.2.resblock.conv2.weight, block4.2.resblock.bn2.weight, block4.2.resblock.bn2.bias, block4.2.resblock.bn2.running_mean, block4.2.resblock.bn2.running_var, block4.2.resblock.ca.fc.0.weight, block4.2.resblock.ca.fc.2.weight, block4.2.resblock.sa.conv1.weight, block4.2.concat_conv.weight

===pretrained weight loaded===
Checkpoint loaded from ../src/pretrained/NYUv2.pt!
230620@15:25:45 | Test: 100%|██████████████████████████████| 654/654 [00:37<00:00, 17.23it/s]
 Metric   |  RMSE: 0.09013  MAE: 0.03519  iRMSE: 0.01378  iMAE: 0.00508  REL: 0.01189  D^1: 0.99585  D^2: 0.99940  D^3: 0.99988  D102: 0.87466  D105: 0.95325  D110: 0.98049  
Elapsed time : 35.93873906135559 sec, Average processing time : 0.05495220039962628 sec

So, according to my understanding, it takes the validation h5 files (the val folder, as opposed to train; 654 files in total) as input, each containing an RGB and a depth image (dimension for rgb = ??, dimension for depth = ??), and outputs these metrics: RMSE: 0.09013, MAE: 0.03519, iRMSE: 0.01378, iMAE: 0.00508, REL: 0.01189, D^1: 0.99585, D^2: 0.99940, D^3: 0.99988, D102: 0.87466, D105: 0.95325, D110: 0.98049.
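A quick way to check what each of those .h5 samples actually contains is to open one with h5py (a hedged sketch: the file path is hypothetical and the dataset keys are assumed to follow the usual nyudepthv2 layout with 'rgb' and 'depth' entries):

import h5py

with h5py.File("../data/nyudepthv2/val/official/00001.h5", "r") as f:  # path is illustrative
    for key in f.keys():
        print(key, f[key].shape, f[key].dtype)   # e.g. the rgb and depth arrays and their sizes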

How do I output the corresponding completed depth images? I mean, as per the method, each rgb+depth input (h5) should yield a completed depth output, for all 654 samples; isn't that the process? I can't find the 654 completed output depth images.

Thanks in advance.
