brandleyzhou / diffnet

[BMVC 2021] ''Self-Supervised Monocular Depth Estimation with Internal Feature Fusion''

Python 99.56% Shell 0.44%
self-supervised monocular-depth-estimation representation-learning bmvc cityscapes kitti

diffnet's Introduction

DIFFNet

This repo is for Self-Supervised Monocular Depth Estimation with Internal Feature Fusion (arXiv), BMVC 2021.

A new backbone for self-supervised depth estimation.


If you find this work useful, please consider citing it:

@inproceedings{zhou_diffnet,
    title={Self-Supervised Monocular Depth Estimation with Internal Feature Fusion},
    author={Zhou, Hang and Greenwood, David and Taylor, Sarah},
    booktitle={British Machine Vision Conference (BMVC)},
    year={2021}
    }

Updates:

  • [16-05-2022] Added Cityscapes training and testing based on ManyDepth.

  • [22-01-2022] A model diffnet_640x192 uploaded (slightly improved over that of the original paper).

  • [07-12-2021] A multi-GPU training version is available on the multi-gpu branch.

Comparison with other methods

Evaluation on selected hard cases:

Trained weights on KITTI

  • Please note: the results of diffnet_1024x320_ms are not reported in the paper.
| Methods | abs rel | sq rel | RMSE | rmse log | D1 | D2 | D3 |
|---|---|---|---|---|---|---|---|
| 1024x320 | 0.097 | 0.722 | 4.345 | 0.174 | 0.907 | 0.967 | 0.984 |
| 1024x320_ms | 0.094 | 0.678 | 4.250 | 0.172 | 0.911 | 0.968 | 0.984 |
| 1024x320_ms_ttr | 0.079 | 0.640 | 3.934 | 0.159 | 0.932 | 0.971 | 0.984 |
| 640x192 | 0.102 | 0.753 | 4.459 | 0.179 | 0.897 | 0.965 | 0.983 |
| 640x192_ms | 0.101 | 0.749 | 4.445 | 0.179 | 0.898 | 0.965 | 0.983 |

Setting up before training and testing

Training:

sh start2train.sh

Testing:

sh disp_evaluation.sh

Infer a single depth map from an RGB image:

sh test_sample.sh
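
As a rough orientation, the sketch below pieces together what the single-image test step might look like in Python, using the encoder/decoder construction quoted in the FPS issue further down this page. The checkpoint file names, the 640x192 input resolution, and the ("disp", 0) output key are assumptions borrowed from monodepth2-style conventions, not the author's confirmed code; test_sample.sh / test_sample.py remain the authoritative entry point.

import torch
import numpy as np
from PIL import Image
from torchvision import transforms
import networks  # from this repository

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the DIFFNet encoder/decoder (construction as in the FPS issue below)
encoder = networks.test_hr_encoder.hrnet18(False)
encoder.num_ch_enc = [64, 18, 36, 72, 144]
decoder = networks.HRDepthDecoder(encoder.num_ch_enc, [0])

# Hypothetical checkpoint paths; use the file names of the weights you downloaded
encoder.load_state_dict(torch.load("models/encoder.pth", map_location=device), strict=False)
decoder.load_state_dict(torch.load("models/depth.pth", map_location=device), strict=False)
encoder.to(device).eval()
decoder.to(device).eval()

# Load an RGB image and resize to the (assumed) training resolution of 640x192
image = Image.open("test_image.jpg").convert("RGB")
input_tensor = transforms.ToTensor()(image.resize((640, 192))).unsqueeze(0).to(device)

with torch.no_grad():
    outputs = decoder(encoder(input_tensor))
    disp = outputs[("disp", 0)]  # finest-scale disparity (assumed monodepth2-style key)

np.save("test_image_disp.npy", disp.squeeze().cpu().numpy())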

Acknowledgement

Thanks to the authors for their works:

diffnet's People

Contributors

brandleyzhou, davegreenwood


diffnet's Issues

Multi-GPU training hangs

Hello,
When I start multi-GPU training, I run the following command:
python -m torch.distributed.launch --nproc_per_node=2 train.py --split eigen_zhou --learning_rate 1e-4 --height 320 --width 1024 --scheduler_step_size 14 --batch_size 2 --model_name mono_model --png --data_path ../4_monodepth2/data/KITTI/ --num_epochs 40 --log_dir weights_logs

If I set --nproc_per_node=1, it runs fine on a single GPU, but if I set --nproc_per_node=2, it just prints the messages before distributed training is initialized and then gets stuck.
From nvidia-smi, I can see the GPUs are 100% occupied, but training does not start (weight_logs also does not get created).

I have attached a screenshot of where it gets stuck.
Can you please help me work out what this might be?

Thank you for your time.

viz_map is not found

Hi,
Thanks for your code. When I tried to evaluate my models after training, I found there is no file called viz_map. In evaluate_depth.py, the line "from viz_map import save_depth, save_visualization, save_error_visualization" fails.
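
As a stopgap until the missing file is added, one could drop a minimal viz_map.py of one's own next to evaluate_depth.py. The sketch below only assumes that the three imported functions each take an array and an output path; the author's intended signatures and behaviour may differ.

# viz_map.py -- a minimal stand-in for the missing module (signatures are guesses)
import numpy as np
import matplotlib.pyplot as plt

def save_depth(depth, path):
    # Dump the raw depth/disparity array to disk
    np.save(path, np.asarray(depth))

def save_visualization(disp, path):
    # Colour-map a disparity map (magma is a common choice for depth visualisation)
    disp = np.asarray(disp)
    plt.imsave(path, disp, cmap="magma", vmax=np.percentile(disp, 95))

def save_error_visualization(error, path):
    # Colour-map a per-pixel error map
    plt.imsave(path, np.asarray(error), cmap="coolwarm")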

About model's FPS

Hello
Thank you for your good work!!

I'm calculating DIFFNet's FPS on an RTX 2080 Ti to fairly compare our works.
But the FPS I measure for DIFFNet and Monodepth2 are very different from those reported in your paper. Could I get the code you used to calculate FPS, please?

I measured the FPS with the following code.

import torch
import networks

# Build the DIFFNet encoder and decoder
en = networks.test_hr_encoder.hrnet18(False)
en.num_ch_enc = [64, 18, 36, 72, 144]
de = networks.HRDepthDecoder(en.num_ch_enc, [0])

device = torch.device('cuda')
en.to(device)
en.eval()
de.to(device)
de.eval()

optimal_batch_size = 1
dummy_input = torch.randn(optimal_batch_size, 3, 192, 640, dtype=torch.float).to(device)
repetitions = 10000
total_time = 0
print("start calculate")
with torch.no_grad():
    for rep in range(repetitions):
        # time each forward pass with CUDA events
        starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
        starter.record()
        _ = de(en(dummy_input))
        ender.record()
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender) / 1000  # milliseconds -> seconds
        if rep != 0:  # skip the first (warm-up) iteration
            total_time += curr_time
repetitions = repetitions - 1  # exclude the warm-up iteration from the count
print(total_time)
Throughput = (repetitions * optimal_batch_size) / total_time
print('Final FPS:', Throughput, ' total_time:', total_time)
print("weight num: ", sum(p.numel() for p in en.parameters()) + sum(p.numel() for p in de.parameters()))

And the following results were obtained for each model.

| Model | FPS |
|---|---|
| DIFFNet | 34.92 |
| Monodepth2 | 282.25 |

About the license for this model

Thank you for sharing your great code. 😄

What is the license for this model? I'd like to reference it in the repository I'm working on if possible, but I want to state the license correctly.

Thank you.

Cityscapes model

Hi. First, thank you for opening your nice paper and source code.

Could you share checkpoints that were pretrained on Cityscapes and fine-tuned on KITTI (i.e., CS → K)?

I would like to check whether the DIFFNet model that I pretrained on Cityscapes myself is correct.

Thanks!

Changing Input Size

Hi,
The provided code gives results for a 640x192 input size. Where can I change it to the original input size (1024x320) and train with that?
Also, it seems that you add internal feature fusion to the original HRNet. I would like to remove that and test with the original HRNet. In "test_hr_encoder.py" I tried to remove "mixed_features" and only return "features", but then I get an error in the decoder. Is there any way to train your model with the original HRNet?

Loss

Hello,
When I run the code, I wonder whether you used the uncertain_mask and flipping_loss options, because I can't reproduce the accuracy reported in your paper at a resolution of 1024x320. Thanks for your reply.

Test on video

Hello, I would like to ask you a question.
The input to the model is a video; how can I generate a depth-estimated video from it?
Can you tell me how to do this? Thanks!

the best,
Rui Zhang

Training

Thanks for your work.
There are some details I want to ask you about. Here is my torch environment:
torch 1.7.1+cu110, torchaudio 0.7.2, torchsummary 1.5.1,
torchvision 0.8.2+cu110
I found that when I set the initial learning rate to 1e-4 for the first 14 epochs and then 1e-5 for the last 5 epochs, my experimental results are very different from yours. Is this because of a different PyTorch version, or is my training process wrong?
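
For reference, here is a minimal sketch of the schedule described above, assuming a monodepth2-style Adam + StepLR setup (the --learning_rate 1e-4 and --scheduler_step_size 14 flags appear in the multi-GPU training command quoted earlier on this page); this is not the author's confirmed training loop.

import torch

model_params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameters
optimizer = torch.optim.Adam(model_params, lr=1e-4)  # initial learning rate 1e-4
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=14, gamma=0.1)

for epoch in range(20):
    # ... run one training epoch here ...
    optimizer.step()   # stands in for the real per-batch updates
    scheduler.step()   # learning rate drops to 1e-5 after epoch 14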

Environment

Hi, thank you for sharing your nice work.

Could you share the environment setting such as versions of packages for this work?

I cannot reproduce the results of this paper even if using the pretrained model that is provided in this repo.

| Source | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| My evaluation | 0.1024 | 0.7632 | 4.482 | 1.799 | 0.8954 | 0.9645 | 0.9831 |
| Paper | 0.102 | 0.764 | 4.483 | 0.180 | 0.896 | 0.965 | 0.983 |

Issue about downloading HRNet pretrained on ImageNet

First of all, thank you for sharing this cool work.

I ran into an error when running start2train.sh.

The error is below:
[error screenshot]

I ran start2train.sh on another computer that has a different IP address,
but the error also occurred.

Thank you.

Saved trained models

Hi, thank you for sharing your amazing code. I ran the training code and it trained for 20 epochs, but I don't know where the models are saved. Also, does your code save each epoch's results separately, or only the last epoch? Lastly, where can I change the number of epochs for training?

Test file missing

Thanks for your work on DIFFNet!
I want to evaluate my training results on my PC, but the file "splits/eigen/gt_depths.npz" is required and I can't find it in the repository. Could you please provide this file? Thanks!
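
For context, this ground-truth file is not shipped with the repository; in monodepth2-style projects it is usually generated from the raw KITTI data with an export script (monodepth2 provides export_gt_depth.py for this). Once the file exists, a quick sanity check might look like the sketch below; the "data" key and the 697-image Eigen test split are assumptions carried over from monodepth2-style evaluation code.

import numpy as np

# Inspect the ground-truth depth archive used for Eigen-split evaluation
gt_depths = np.load("splits/eigen/gt_depths.npz", fix_imports=True,
                    encoding="latin1", allow_pickle=True)["data"]
print(len(gt_depths), gt_depths[0].shape)  # expect 697 depth maps for the Eigen test split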

Missing Keys in Pretrained Weights

Hi @brandleyzhou, thank you for your great work!

I met the following problem when testing your pretrained models:

Exception has occurred: RuntimeError
Error(s) in loading state_dict for HRDepthDecoder:
	Missing key(s) in state_dict: "convs.up_x9_0.conv.conv.weight", "convs.up_x9_0.conv.conv.bias", "convs.up_x9_1.conv.conv.weight", "convs.up_x9_1.conv.conv.bias", "convs.72.ca.fc.0.weight", "convs.72.ca.fc.2.weight", "convs.72.conv_se.weight", "convs.72.conv_se.bias", "convs.36.ca.fc.0.weight", "convs.36.ca.fc.2.weight", "convs.36.conv_se.weight", "convs.36.conv_se.bias", "convs.18.ca.fc.0.weight", "convs.18.ca.fc.2.weight", "convs.18.conv_se.weight", "convs.18.conv_se.bias", "convs.9.ca.fc.0.weight", "convs.9.ca.fc.2.weight", "convs.9.conv_se.weight", "convs.9.conv_se.bias", "convs.dispConvScale0.conv.weight", "convs.dispConvScale0.conv.bias", "convs.dispConvScale1.conv.weight", "convs.dispConvScale1.conv.bias", "convs.dispConvScale2.conv.weight", "convs.dispConvScale2.conv.bias", "convs.dispConvScale3.conv.weight", "convs.dispConvScale3.conv.bias", "decoder.0.conv.conv.weight", "decoder.0.conv.conv.bias", "decoder.1.conv.conv.weight", "decoder.1.conv.conv.bias", "decoder.2.ca.fc.0.weight", "decoder.2.ca.fc.2.weight", "decoder.2.conv_se.weight", "decoder.2.conv_se.bias", "decoder.3.ca.fc.0.weight", "decoder.3.ca.fc.2.weight", "decoder.3.conv_se.weight", "decoder.3.conv_se.bias", "decoder.4.ca.fc.0.weight", "decoder.4.ca.fc.2.weight", "decoder.4.conv_se.weight", "decoder.4.conv_se.bias", "decoder.5.ca.fc.0.weight", "decoder.5.ca.fc.2.weight", "decoder.5.conv_se.weight", "decoder.5.conv_se.bias", "decoder.6.conv.weight", "decoder.6.conv.bias", "decoder.7.conv.weight", "decoder.7.conv.bias", "decoder.8.conv.weight", "decoder.8.conv.bias", "decoder.9.conv.weight", "decoder.9.conv.bias". 

The pretrained weights are downloaded from this repository page. Specifically, I was testing two pretrained models:

Could you please have a look at this and upload the complete models? Thanks in advance!

Run-time FPS

This is great work. I have a question about run-time FPS. In Table 3 of your paper you report a run-time of 87 FPS. Under what circumstances do you get this value? It takes at least 53 ms for me to process one 640x192 image on an Nvidia RTX 3090 GPU.

About torch::jit::trace

Hello,
Thank you for sharing your work. I want to use libtorch to deploy this network in C++, but when using torch::jit::trace() I get this error (running test_sample.py succeeds):
[error screenshots]
Because torch::jit::trace() cannot handle dictionary outputs, I changed the output of depth_decoder to a list. Also, there is a line "import hr_networks" in test_sample.py, but I could not find hr_networks; I don't know whether this affects torch::jit::trace().

Thank you very much!
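
For anyone hitting the same dict-output limitation, one rough workaround is to wrap the encoder and decoder in a module that returns a plain tensor before tracing. The sketch below assumes the decoder returns a monodepth2-style dict keyed by ("disp", scale); verify that key against the actual DIFFNet decoder output.

import torch

class TraceableDIFFNet(torch.nn.Module):
    # Wrap encoder + decoder so tracing sees a tensor output instead of a dict
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        outputs = self.decoder(self.encoder(x))
        return outputs[("disp", 0)]  # finest-scale disparity (assumed key)

# Usage, with encoder/decoder built as elsewhere on this page:
# example = torch.randn(1, 3, 192, 640)
# traced = torch.jit.trace(TraceableDIFFNet(encoder, decoder).eval(), example)
# traced.save("diffnet_traced.pt")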

Cannot reproduce the results mentioned in the paper

Hi, I trained your model with 640x192 and 1024x320 input sizes, but the results are different from what you report in the paper.

Here are the results I got:

| Input size | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| 640x192 | 0.108 | 0.792 | 4.589 | 0.186 | 0.889 | 0.963 | 0.982 |
| 1024x320 | 0.103 | 0.909 | 4.642 | 0.183 | 0.899 | 0.965 | 0.982 |

And here are the results mentioned in the paper:

[screenshots of the paper's result tables]

I don't know what causes this difference, because when I used your pre-trained weights for evaluation I got the same results as yours. Do you have any idea why? Maybe the code has changed slightly, or could a different version of torch cause this?

STEREO SCALE FACTOR

Hello
Can a network trained with the MS method use STEREO_SCALE_FACTOR (5.4) to obtain real-world scale, like Monodepth2?
Thank you very much!
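
For context, in monodepth2 the fixed factor of 5.4 is applied when converting the sigmoid disparity output of a stereo-trained (S or MS) model to metric depth, instead of per-image median scaling against ground truth. Below is a hedged sketch of that convention (monodepth2-style code, not confirmed DIFFNet behaviour).

STEREO_SCALE_FACTOR = 5.4  # monodepth2's fixed baseline-derived scale

def disp_to_metric_depth(disp, min_depth=0.1, max_depth=100.0):
    # Convert a sigmoid disparity map to depth (monodepth2 convention),
    # then apply the fixed stereo scale to obtain metric depth.
    min_disp, max_disp = 1.0 / max_depth, 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return STEREO_SCALE_FACTOR * (1.0 / scaled_disp)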

Missing Keys in 1024x320 Pretrained Weights

Hi @brandleyzhou! Thank you for your work!

I met a problem when testing your pretrained models:

RuntimeError: Error(s) in loading state_dict for HRDepthDecoder:
       Missing key(s) in state_dict: "convs.72.ca.fc.0.weight", "convs.72.ca.fc.2.weight", "convs.72.conv_se.weight", "convs.72.conv_se.bias", "convs.36.ca.fc.0.weight", "convs.36.ca.fc.2.weight", "convs.36.conv_se.weight", "convs.36.conv_se.bias", "convs.18.ca.fc.0.weight", "convs.18.ca.fc.2.weight", "convs.18.conv_se.weight", "convs.18.conv_se.bias", "convs.9.ca.fc.0.weight", "convs.9.ca.fc.2.weight", "convs.9.conv_se.weight", "convs.9.conv_se.bias", "decoder.2.ca.fc.0.weight", "decoder.2.ca.fc.2.weight", "decoder.3.ca.fc.0.weight", "decoder.3.ca.fc.2.weight", "decoder.4.ca.fc.0.weight", "decoder.4.ca.fc.2.weight", "decoder.5.ca.fc.0.weight", "decoder.5.ca.fc.2.weight".
       Unexpected key(s) in state_dict: "convs.72fSE.fc.0.weight", "convs.72fSE.fc.2.weight", "convs.72fSE.conv_se.weight", "convs.72fSE.conv_se.bias", "convs.36fSE.fc.0.weight", "convs.36fSE.fc.2.weight", "convs.36fSE.conv_se.weight", "convs.36fSE.conv_se.bias", "convs.18fSE.fc.0.weight", "convs.18fSE.fc.2.weight", "convs.18fSE.conv_se.weight", "convs.18fSE.conv_se.bias", "convs.9fSE.fc.0.weight", "convs.9fSE.fc.2.weight", "convs.9fSE.conv_se.weight", "convs.9fSE.conv_se.bias", "decoder.2.fc.0.weight", "decoder.2.fc.2.weight", "decoder.3.fc.0.weight", "decoder.3.fc.2.weight", "decoder.4.fc.0.weight", "decoder.4.fc.2.weight", "decoder.5.fc.0.weight", "decoder.5.fc.2.weight".

The pretrained weights were downloaded from this repository; I was testing 4 models:

I only experienced the problem with the 1024x320 model. Could you please have a look at what the problem might be? Thanks in advance!
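
A hedged workaround, inferred purely from the missing/unexpected key names listed above: the 1024x320 checkpoint appears to have been saved from a decoder whose attention blocks were named e.g. convs.72fSE.fc rather than convs.72.ca.fc, so renaming the keys before load_state_dict may let the weights load. The mapping below is a guess reconstructed from those two lists and may not cover every layer.

import torch

def remap_decoder_keys(state_dict):
    # Rename old-style "fSE" keys to the current "ca"/"conv_se" layout (assumed mapping)
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for ch in ("72", "36", "18", "9"):
            new_key = new_key.replace(f"convs.{ch}fSE.fc", f"convs.{ch}.ca.fc")
            new_key = new_key.replace(f"convs.{ch}fSE.conv_se", f"convs.{ch}.conv_se")
        for idx in ("2", "3", "4", "5"):
            new_key = new_key.replace(f"decoder.{idx}.fc", f"decoder.{idx}.ca.fc")
        remapped[new_key] = value
    return remapped

# loaded = torch.load("diffnet_1024x320/depth.pth", map_location="cpu")  # hypothetical path
# decoder.load_state_dict(remap_decoder_keys(loaded))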

Training and testing issue

Hi,

When I try to test a single image by running "sh test_sample.sh", I get this error: "ModuleNotFoundError: No module named 'hr_networks'".

Would you please let me know how I can get "hr_networks"?

Also, when I tried to train the model, this error popped up:

from .hrnet_config import MODEL_CONFIGS
File "/media/armin/DATA/DIFFNet/networks/hrnet_config.py", line 5, in
from yacs.config import CfgNode as CN
ModuleNotFoundError: No module named 'yacs'

Am I missing something here?

Where is the supplementary material mentioned in the paper?

At the bottom of page 9 it says "The corresponding images are shown in the supplementary material," but I can't find a supplementary material section in this paper.
Is there any misunderstanding on my part?
Thanks for your time.
