brandleyzhou / diffnet
[BMVC 2021] ''Self-Supervised Monocular Depth Estimation with Internal Feature Fusion''
Hi @brandleyzhou, thank you for your great work!
I met the following problem when testing your pretrained models:
Exception has occurred: RuntimeError
Error(s) in loading state_dict for HRDepthDecoder:
Missing key(s) in state_dict: "convs.up_x9_0.conv.conv.weight", "convs.up_x9_0.conv.conv.bias", "convs.up_x9_1.conv.conv.weight", "convs.up_x9_1.conv.conv.bias", "convs.72.ca.fc.0.weight", "convs.72.ca.fc.2.weight", "convs.72.conv_se.weight", "convs.72.conv_se.bias", "convs.36.ca.fc.0.weight", "convs.36.ca.fc.2.weight", "convs.36.conv_se.weight", "convs.36.conv_se.bias", "convs.18.ca.fc.0.weight", "convs.18.ca.fc.2.weight", "convs.18.conv_se.weight", "convs.18.conv_se.bias", "convs.9.ca.fc.0.weight", "convs.9.ca.fc.2.weight", "convs.9.conv_se.weight", "convs.9.conv_se.bias", "convs.dispConvScale0.conv.weight", "convs.dispConvScale0.conv.bias", "convs.dispConvScale1.conv.weight", "convs.dispConvScale1.conv.bias", "convs.dispConvScale2.conv.weight", "convs.dispConvScale2.conv.bias", "convs.dispConvScale3.conv.weight", "convs.dispConvScale3.conv.bias", "decoder.0.conv.conv.weight", "decoder.0.conv.conv.bias", "decoder.1.conv.conv.weight", "decoder.1.conv.conv.bias", "decoder.2.ca.fc.0.weight", "decoder.2.ca.fc.2.weight", "decoder.2.conv_se.weight", "decoder.2.conv_se.bias", "decoder.3.ca.fc.0.weight", "decoder.3.ca.fc.2.weight", "decoder.3.conv_se.weight", "decoder.3.conv_se.bias", "decoder.4.ca.fc.0.weight", "decoder.4.ca.fc.2.weight", "decoder.4.conv_se.weight", "decoder.4.conv_se.bias", "decoder.5.ca.fc.0.weight", "decoder.5.ca.fc.2.weight", "decoder.5.conv_se.weight", "decoder.5.conv_se.bias", "decoder.6.conv.weight", "decoder.6.conv.bias", "decoder.7.conv.weight", "decoder.7.conv.bias", "decoder.8.conv.weight", "decoder.8.conv.bias", "decoder.9.conv.weight", "decoder.9.conv.bias".
The pretrained weights are downloaded from this repository page. Specifically, I was testing two pretrained models:
Could you please have a look at this and upload the complete models? Thanks in advance!
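A minimal sketch of how the mismatch could be inspected before calling `load_state_dict`, assuming the checkpoint is a plain mapping from parameter names to tensors. The helper name `diff_state_dict_keys` and the toy key lists are hypothetical; only the key strings are taken from the error above.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, like PyTorch's error message does."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model expects, file lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # file has, model lacks
    return missing, unexpected


# Toy key sets for illustration only. With real objects one would pass
# model.state_dict().keys() and the keys of the dict returned by torch.load().
model_keys = ["convs.72.ca.fc.0.weight", "convs.dispConvScale0.conv.weight"]
checkpoint_keys = ["convs.dispConvScale0.conv.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

If nearly every decoder key is reported missing, as in the report above, the file may be incomplete or may not be the decoder checkpoint at all; this check only narrows that down and does not fix it.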
Hello,
Thank you for sharing your work. I want to use libtorch to deploy this network in C++, but when using torch::jit::trace(), I get this error (test_sample.py itself runs successfully):
Because torch::jit::trace() cannot handle dictionaries, I changed the output of depth_decoder to a list. There is also a line "import hr_networks" in test_sample.py, but I could not find hr_networks, and I don't know whether this affects torch::jit::trace().
Thank you very much!
Hello,
When running the code, I wondered whether you used the uncertain_mask and flipping_loss options, because I cannot reproduce the accuracy reported in your paper at a resolution of 1024x320. Thanks for your reply.
This is great work. I have a question about the run-time FPS. In Table 3 of your paper, you report a run time of 87 FPS. Under what conditions did you measure this value? On an Nvidia RTX 3090 GPU, it takes me at least 53 ms to process one 640x192 image.
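To put the two numbers in this question on the same footing, here is a small conversion sketch, assuming a batch size of one so that FPS is simply the reciprocal of the per-image latency. The function names are illustrative.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second for a given per-image latency in milliseconds."""
    return 1000.0 / latency_ms


def latency_ms_from_fps(fps):
    """Per-image latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


# 53 ms per image, as measured in the question above.
print(round(fps_from_latency_ms(53.0), 1))  # 18.9
# 87 FPS, as quoted from the paper's Table 3.
print(round(latency_ms_from_fps(87.0), 1))  # 11.5
```

So the measured 53 ms corresponds to roughly 19 FPS, while 87 FPS would need about 11.5 ms per image.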
Hello, I would like to ask you a question.
If the input to the model is a video, how can I generate a video from the depth estimates?
Can you tell me how to fix it? Thanks!
Best,
Rui Zhang
Hi, I trained your model with 640x192 and 1024x320 input sizes, but the results are different from those reported in the paper.
Here are the results I got:
-> Computing predictions with size 640x192
| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|
| 0.108 | 0.792 | 4.589 | 0.186 | 0.889 | 0.963 | 0.982 |
-> Computing predictions with size 1024x320
| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|
| 0.103 | 0.909 | 4.642 | 0.183 | 0.899 | 0.965 | 0.982 |
And here are the results mentioned in the paper:
I don't know what causes this difference, because when I used your pre-trained weights for evaluation, I got the same results as yours. Do you have any idea why? Maybe the code has changed slightly, or a different version of torch could cause this?
#8
Can we still get that environment file, though?
Hi. First, thank you for releasing your nice paper and source code.
Could you share checkpoints that were pretrained on Cityscapes and fine-tuned on KITTI (i.e., CS → K)?
I would like to check whether the DiffNet that I pretrained on Cityscapes is correct.
Thanks!
Thank you for sharing your great code. 😄
What is the license for this model? I'd like to cite it in the repository I'm working on if possible, and I want to state the license correctly.
Thank you.
Hi:
Thanks for your code. When I tried to evaluate my models after training, I found that there is no file called viz_map, so the import "from viz_map import save_depth, save_visualization, save_error_visualization" in evaluate_depth.py fails.
Hello
Thank you for your good work!
I'm measuring DIFFNet's FPS on an RTX 2080 Ti to compare our works fairly.
However, the FPS values I get for DIFFNet and Monodepth2 are very different from those reported in your paper. Could I get the code you used to calculate the FPS, please?
I measured the FPS with the following code.
import torch
import networks

# Build the DIFFNet encoder and decoder (randomly initialised weights are enough for timing).
en = networks.test_hr_encoder.hrnet18(False)
en.num_ch_enc = [64, 18, 36, 72, 144]
de = networks.HRDepthDecoder(en.num_ch_enc, [0])
# depth_net=DepthResNet(version="101pt")

device = torch.device('cuda')
en.to(device)
en.eval()
de.to(device)
de.eval()

optimal_batch_size = 1
dummy_input = torch.randn(optimal_batch_size, 3, 192, 640, dtype=torch.float).to(device)
repetitions = 10000
total_time = 0
print("start calculate")
with torch.no_grad():
    for rep in range(repetitions):
        starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
        starter.record()
        _ = de(en(dummy_input))
        ender.record()
        # Wait for the GPU to finish before reading the elapsed time.
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender) / 1000  # milliseconds -> seconds
        if rep != 0:  # skip the first iteration, which includes start-up cost
            total_time += curr_time
# The first iteration was not counted, so use one fewer repetition.
repetitions = repetitions - 1
print(total_time)
Throughput = (repetitions * optimal_batch_size) / total_time
print('Final FPS:', Throughput, ' total_time:', total_time)
print("weight num: ", sum(p.numel() for p in en.parameters()) + sum(p.numel() for p in de.parameters()))
The following results were obtained for each model.
| Model | FPS |
|---|---|
| DIFFNet | 34.92 |
| Monodepth2 | 282.25 |
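Since the numbers depend heavily on how the timing loop is set up, here is a framework-agnostic sketch of a throughput measurement with an explicit warm-up phase. The `measure_fps` helper and its parameters are hypothetical, and the workload is a CPU stand-in; for a GPU model one would pass a function that runs the forward pass, and `torch.cuda.synchronize` as `sync`, so that queued GPU work is counted. This is not the procedure used in the paper, which the thread does not describe.

```python
import time


def measure_fps(run_once, warmup=10, repetitions=100, sync=None):
    """Average number of run_once() calls per second after a warm-up phase.

    sync is an optional callable invoked before each clock reading so that
    asynchronous work (for example queued GPU kernels) is included.
    """
    for _ in range(warmup):          # untimed warm-up iterations
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repetitions):     # timed iterations
        run_once()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return repetitions / elapsed


def dummy_workload():
    """CPU stand-in for a forward pass, for illustration only."""
    sum(i * i for i in range(10_000))


fps = measure_fps(dummy_workload, warmup=5, repetitions=50)
print(f"{fps:.1f} calls per second")
```

Timing the whole loop once, rather than each iteration separately, keeps per-iteration synchronisation overhead out of the result; which of the two is appropriate depends on what is being compared.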
Hello,
To start multi-GPU training, I run the following command.
python -m torch.distributed.launch --nproc_per_node=2 train.py --split eigen_zhou --learning_rate 1e-4 --height 320 --width 1024 --scheduler_step_size 14 --batch_size 2 --model_name mono_model --png --data_path ../4_monodepth2/data/KITTI/ --num_epochs 40 --log_dir weights_logs
If I set --nproc_per_node=1, it runs fine on a single GPU. If I set --nproc_per_node=2, it prints the messages that precede the initialization of distributed training and then hangs.
From nvidia-smi, I can see that the GPUs are 100% occupied, but training does not start (the weights_logs directory is not created either).
I have attached a screenshot showing where it gets stuck.
Can you please help me understand what might be causing this?
Thank you for your time.
Hi,
The provided code gives results for the 640x192 image size. Where can I change it to the original input size (1024x320) and train with that?
Also, it seems that you added an internal feature fusion to the original HRNet. I would like to remove it and test with the original HRNet. In "test_hr_encoder.py" I tried removing "mixed_features" and returning only "features", but then I get an error in the decoder. Is there any way to train your model with the original HRNet?
Hi, I am so touched to see your paper. I would like to ask how to train HRNet on ImageNet.
Thanks for your work.
There are some details I want to ask you about. Here are my torch package versions:
torch 1.7.1+cu110
torchaudio 0.7.2
torchsummary 1.5.1
torchvision 0.8.2+cu110
I found that when I set the initial learning rate to 10^-4 for the first 14 epochs and then 10^-5 for the last 5 epochs, my experimental results are very different from yours. Is this due to the different PyTorch version, or is my training process wrong?
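For reference, the schedule described in this question is a step decay. Below is a minimal sketch, assuming the learning rate is multiplied by 0.1 once every 14 epochs (the `--scheduler_step_size 14` flag appears in a training command elsewhere in these threads; the decay factor of 0.1 is an assumption inferred from the 10^-4 to 10^-5 drop). The function name `step_lr` is illustrative.

```python
import math


def step_lr(base_lr, epoch, step_size=14, gamma=0.1):
    """Learning rate at a zero-based epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# 14 epochs at 1e-4 followed by 5 epochs at 1e-5, 19 epochs in total.
schedule = [step_lr(1e-4, epoch) for epoch in range(19)]
for epoch, lr in enumerate(schedule):
    print(epoch, f"{lr:.0e}")
```

Printing the schedule this way makes it easy to check that the drop happens at the intended epoch before comparing results across PyTorch versions.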
Thanks for your work of DIFFNet!
I want to evaluate the training results on my PC, but the file "splits/eigen/gt_depths.npz" is required and I can't find it in the repository. Could you please provide this file? Thanks!
Hi, thank you for sharing your amazing code. I ran the training code and it trained for 20 epochs, but I don't know where the models are saved. Does your code save the results of each epoch separately, or only those of the last epoch? My last question: where can I change the number of training epochs?
Hi @brandleyzhou! Thank you for your work!
I met a problem when testing your pretrained models:
RuntimeError: Error(s) in loading state_dict for HRDepthDecoder:
Missing key(s) in state_dict: "convs.72.ca.fc.0.weight", "convs.72.ca.fc.2.weight", "convs.72.conv_se.weight", "convs.72.conv_se.bias", "convs.36.ca.fc.0.weight", "convs.36.ca.fc.2.weight", "convs.36.conv_se.weight", "convs.36.conv_se.bias", "convs.18.ca.fc.0.weight", "convs.18.ca.fc.2.weight", "convs.18.conv_se.weight", "convs.18.conv_se.bias", "convs.9.ca.fc.0.weight", "convs.9.ca.fc.2.weight", "convs.9.conv_se.weight", "convs.9.conv_se.bias", "decoder.2.ca.fc.0.weight", "decoder.2.ca.fc.2.weight", "decoder.3.ca.fc.0.weight", "decoder.3.ca.fc.2.weight", "decoder.4.ca.fc.0.weight", "decoder.4.ca.fc.2.weight", "decoder.5.ca.fc.0.weight", "decoder.5.ca.fc.2.weight".
Unexpected key(s) in state_dict: "convs.72fSE.fc.0.weight", "convs.72fSE.fc.2.weight", "convs.72fSE.conv_se.weight", "convs.72fSE.conv_se.bias", "convs.36fSE.fc.0.weight", "convs.36fSE.fc.2.weight", "convs.36fSE.conv_se.weight", "convs.36fSE.conv_se.bias", "convs.18fSE.fc.0.weight", "convs.18fSE.fc.2.weight", "convs.18fSE.conv_se.weight", "convs.18fSE.conv_se.bias", "convs.9fSE.fc.0.weight", "convs.9fSE.fc.2.weight", "convs.9fSE.conv_se.weight", "convs.9fSE.conv_se.bias", "decoder.2.fc.0.weight", "decoder.2.fc.2.weight", "decoder.3.fc.0.weight", "decoder.3.fc.2.weight", "decoder.4.fc.0.weight", "decoder.4.fc.2.weight", "decoder.5.fc.0.weight", "decoder.5.fc.2.weight".
The pretrained weights were downloaded from this repository. I was testing 4 models:
I only experienced the problem with the 1024x320 model. Could you please have a look at what the problem might be? Thanks in advance!
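The missing and unexpected keys above differ only in naming ("72fSE.fc" versus "72.ca.fc", "decoder.2.fc" versus "decoder.2.ca.fc"), so one possible workaround is to rename the checkpoint keys before loading. The sketch below is hypothetical: the rename rules are inferred purely from the two key lists in the error message, and it assumes the tensors behind the renamed keys have the shapes the current decoder expects, which has not been verified.

```python
import re

# Rename rules inferred from the error message above (an assumption, not
# something confirmed by the repository author).
_RENAMES = [
    (re.compile(r"^(convs\.\d+)fSE\.fc\."), r"\1.ca.fc."),
    (re.compile(r"^(convs\.\d+)fSE\.conv_se\."), r"\1.conv_se."),
    (re.compile(r"^(decoder\.\d+)\.fc\."), r"\1.ca.fc."),
]


def rename_key(key):
    """Map an old-style checkpoint key to the name the decoder expects."""
    for pattern, replacement in _RENAMES:
        new_key, count = pattern.subn(replacement, key)
        if count:
            return new_key
    return key  # keys that already match are left untouched


def remap_state_dict(state_dict):
    """Return a copy of state_dict with renamed keys and unchanged values."""
    return {rename_key(key): value for key, value in state_dict.items()}


print(rename_key("convs.72fSE.fc.0.weight"))
print(rename_key("decoder.2.fc.0.weight"))
```

The remapped dictionary could then be passed to `load_state_dict`; if the evaluation metrics still do not match the README, the checkpoint was probably produced by a different decoder definition rather than merely renamed.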
The bottom of page 9 says "The corresponding images are shown in the supplementary material," but I can't find the supplementary material for this paper.
Is there a misunderstanding on my part?
Thanks for your time.
I see that the result of '1024x320_ms_ttr' in the README is very good (abs rel = 0.079). But what does 'ttr' mean?
Hi, thank you for sharing your nice work.
Could you share the environment settings, such as the package versions used for this work?
I cannot reproduce the results of this paper even when using the pretrained model provided in this repo.
In my evaluation:
0.1024 0.7632 4.482 1.799 0.8954 0.9645 0.9831
In the paper:
0.102 0.764 4.483 0.180 0.896 0.965 0.983
Hi,
When I try to test a single image by running "sh test_sample.sh", I get this error: "ModuleNotFoundError: No module named 'hr_networks'"
Could you please let me know how I can get "hr_networks"?
Also, when I tried to train the model, this error popped up:
from .hrnet_config import MODEL_CONFIGS
File "/media/armin/DATA/DIFFNet/networks/hrnet_config.py", line 5, in <module>
from yacs.config import CfgNode as CN
ModuleNotFoundError: No module named 'yacs'
Am I missing something here?
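A quick way to see which of the modules named in these errors the current environment can find is sketched below, using only the standard library. The helper name `missing_modules` is illustrative. `yacs` is a third-party package that is normally installed with `pip install yacs`; where `hr_networks` is supposed to come from is not stated anywhere in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the two errors above.
print(missing_modules(["yacs", "hr_networks"]))
```

Run it from the repository root with the same interpreter used for training, since local packages are resolved relative to the working directory.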
Hello
Can a network trained with the MS method use STEREO_SCALE_FACTOR (5.4) to recover real-world scale, as in Monodepth2?
Thank you very much!
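For context, this is a sketch of how Monodepth2 turns a sigmoid disparity into depth and then applies the stereo scale factor, written here for plain floats. The default depth range of 0.1 to 100 follows Monodepth2's defaults. Whether DIFFNet models trained with the MS method follow the same baseline convention is exactly what the question asks, and it is not confirmed in this thread.

```python
import math

STEREO_SCALE_FACTOR = 5.4  # value quoted in the question above


def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Convert a sigmoid output in [0, 1] to (scaled disparity, depth)."""
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


# Example sigmoid output of 0.5 (an arbitrary illustrative value).
_, depth = disp_to_depth(0.5)
metric_depth = STEREO_SCALE_FACTOR * depth  # only valid under the stereo convention
print(depth, metric_depth)
```

In Monodepth2 the factor converts the network's depth units to metres for stereo-supervised models; for a purely monocular model the scale is ambiguous and median scaling against ground truth is used instead.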
Hello, it seems that the layer names of the decoder model and those in depth.pth from diffnet_1024x320_ttr are not the same, which causes an error when running evaluate_depth.py.