brandleyzhou / diffnet

[BMVC 2021] ''Self-Supervised Monocular Depth Estimation with Internal Feature Fusion''

Python 99.56% Shell 0.44%
self-supervised monocular-depth-estimation representation-learning bmvc cityscapes kitti

diffnet's Introduction

DIFFNet

This repo is for Self-Supervised Monocular Depth Estimation with Internal Feature Fusion (arXiv), BMVC 2021.

A new backbone for self-supervised depth estimation.


If you find this work useful, please consider citing it:

@inproceedings{zhou_diffnet,
    title={Self-Supervised Monocular Depth Estimation with Internal Feature Fusion},
    author={Zhou, Hang and Greenwood, David and Taylor, Sarah},
    booktitle={British Machine Vision Conference (BMVC)},
    year={2021}
    }

Updates:

  • [16-05-2022] Added Cityscapes training and testing based on ManyDepth.

  • [22-01-2022] A model diffnet_640x192 uploaded (slightly improved over that of the original paper).

  • [07-12-2021] A multi-GPU training version is available on the multi-gpu branch.

Comparison with other methods

Evaluation on selected hard cases:

Trained weights on KITTI

  • Please note: the results of diffnet_1024x320_ms are not reported in the paper.
| Methods | abs rel | sq rel | RMSE | rmse log | D1 | D2 | D3 |
|---|---|---|---|---|---|---|---|
| 1024x320 | 0.097 | 0.722 | 4.345 | 0.174 | 0.907 | 0.967 | 0.984 |
| 1024x320_ms | 0.094 | 0.678 | 4.250 | 0.172 | 0.911 | 0.968 | 0.984 |
| 1024x320_ms_ttr | 0.079 | 0.640 | 3.934 | 0.159 | 0.932 | 0.971 | 0.984 |
| 640x192 | 0.102 | 0.753 | 4.459 | 0.179 | 0.897 | 0.965 | 0.983 |
| 640x192_ms | 0.101 | 0.749 | 4.445 | 0.179 | 0.898 | 0.965 | 0.983 |

Setting up before training and testing

Training:

sh start2train.sh

Testing:

sh disp_evaluation.sh

Infer a single depth map from an RGB image:

sh test_sample.sh
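
As a rough orientation, the sketch below pieces together what the single-image test step might look like in Python, using the encoder/decoder construction quoted in the FPS issue further down this page. The checkpoint file names, the 640x192 input resolution, and the ("disp", 0) output key are assumptions borrowed from monodepth2-style conventions, not the author's confirmed code; test_sample.sh / test_sample.py remain the authoritative entry point.

import torch
import numpy as np
from PIL import Image
from torchvision import transforms
import networks  # from this repository

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the DIFFNet encoder/decoder (construction as in the FPS issue below)
encoder = networks.test_hr_encoder.hrnet18(False)
encoder.num_ch_enc = [64, 18, 36, 72, 144]
decoder = networks.HRDepthDecoder(encoder.num_ch_enc, [0])

# Hypothetical checkpoint paths; use the file names of the weights you downloaded
encoder.load_state_dict(torch.load("models/encoder.pth", map_location=device), strict=False)
decoder.load_state_dict(torch.load("models/depth.pth", map_location=device), strict=False)
encoder.to(device).eval()
decoder.to(device).eval()

# Load an RGB image and resize to the (assumed) training resolution of 640x192
image = Image.open("test_image.jpg").convert("RGB")
input_tensor = transforms.ToTensor()(image.resize((640, 192))).unsqueeze(0).to(device)

with torch.no_grad():
    outputs = decoder(encoder(input_tensor))
    disp = outputs[("disp", 0)]  # finest-scale disparity (assumed monodepth2-style key)

np.save("test_image_disp.npy", disp.squeeze().cpu().numpy())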

Acknowledgement

Thanks to the authors for their works:

diffnet's People

Contributors

brandleyzhou, davegreenwood


diffnet's Issues

Multi-GPU training hangs

Hello,
When I start multi-GPU training, I run the following command:
python -m torch.distributed.launch --nproc_per_node=2 train.py --split eigen_zhou --learning_rate 1e-4 --height 320 --width 1024 --scheduler_step_size 14 --batch_size 2 --model_name mono_model --png --data_path ../4_monodepth2/data/KITTI/ --num_epochs 40 --log_dir weights_logs

If I set --nproc_per_node=1, it runs fine on a single GPU, but if I set --nproc_per_node=2, it just prints the messages before distributed training is initialized and then gets stuck.
From nvidia-smi, I can see the GPUs are 100% occupied, but training does not start (weight_logs also does not get created).

I have attached a screenshot of where it gets stuck.
Can you please help me work out what this might be?

Thank you for your time.

viz_map is not found

Hi,
Thanks for your code. When I tried to evaluate my models after training, I found there is no file called viz_map. In evaluate_depth.py, the line "from viz_map import save_depth, save_visualization, save_error_visualization" fails.
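
As a stopgap until the missing file is added, one could drop a minimal viz_map.py of one's own next to evaluate_depth.py. The sketch below only assumes that the three imported functions each take an array and an output path; the author's intended signatures and behaviour may differ.

# viz_map.py -- a minimal stand-in for the missing module (signatures are guesses)
import numpy as np
import matplotlib.pyplot as plt

def save_depth(depth, path):
    # Dump the raw depth/disparity array to disk
    np.save(path, np.asarray(depth))

def save_visualization(disp, path):
    # Colour-map a disparity map (magma is a common choice for depth visualisation)
    disp = np.asarray(disp)
    plt.imsave(path, disp, cmap="magma", vmax=np.percentile(disp, 95))

def save_error_visualization(error, path):
    # Colour-map a per-pixel error map
    plt.imsave(path, np.asarray(error), cmap="coolwarm")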

About model's FPS

Hello
Thank you for your good work!!

I'm calculating DIFFNet's FPS on an RTX 2080 Ti to fairly compare our works.
But the FPS I measure for DIFFNet and Monodepth2 are very different from those reported in your paper. Could I get the code you used to calculate FPS, please?

I measured the FPS with the following code.

import torch
import networks

# Build the DIFFNet encoder and decoder
en = networks.test_hr_encoder.hrnet18(False)
en.num_ch_enc = [64, 18, 36, 72, 144]
de = networks.HRDepthDecoder(en.num_ch_enc, [0])

device = torch.device('cuda')
en.to(device)
en.eval()
de.to(device)
de.eval()

optimal_batch_size = 1
dummy_input = torch.randn(optimal_batch_size, 3, 192, 640, dtype=torch.float).to(device)
repetitions = 10000
total_time = 0
print("start calculate")
with torch.no_grad():
    for rep in range(repetitions):
        # time each forward pass with CUDA events
        starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
        starter.record()
        _ = de(en(dummy_input))
        ender.record()
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender) / 1000  # milliseconds -> seconds
        if rep != 0:  # skip the first (warm-up) iteration
            total_time += curr_time
repetitions = repetitions - 1  # exclude the warm-up iteration from the count
print(total_time)
Throughput = (repetitions * optimal_batch_size) / total_time
print('Final FPS:', Throughput, ' total_time:', total_time)
print("weight num: ", sum(p.numel() for p in en.parameters()) + sum(p.numel() for p in de.parameters()))

And the following results were obtained for each model.

| Model | FPS |
|---|---|
| DIFFNet | 34.92 |
| Monodepth2 | 282.25 |

About the license for this model

Thank you for sharing your great code. 😄

What is the license for this model? I'd like to reference it in the repository I'm working on if possible, but I want to state the license correctly.

Thank you.

Cityscapes model

Hi. First, thank you for opening your nice paper and source code.

Could you share checkpoints that were pretrained on Cityscapes and fine-tuned on KITTI (i.e., CS → K)?

I would like to check whether the DIFFNet model that I pretrained on Cityscapes myself is correct.

Thanks!

Changing Input Size

Hi,
The provided code gives results for a 640x192 input size. Where can I change it to the original input size (1024x320) and train with that?
Also, it seems that you add internal feature fusion to the original HRNet. I would like to remove that and test with the original HRNet. In "test_hr_encoder.py" I tried to remove "mixed_features" and only return "features", but then I get an error in the decoder. Is there any way to train your model with the original HRNet?

Loss

Hello,
When I run the code, I wonder whether you used the uncertain_mask and flipping_loss options, because I can't reproduce the accuracy reported in your paper at a resolution of 1024x320. Thanks for your reply.

Test on video

Hello, I would like to ask you a question.
The input to the model is a video; how can I generate a depth-estimated video from it?
Can you tell me how to do this? Thanks!

the best,
Rui Zhang

Training

Thanks for your work.
There are some details I want to ask you about. Here is my torch environment:
torch 1.7.1+cu110, torchaudio 0.7.2, torchsummary 1.5.1,
torchvision 0.8.2+cu110
I found that when I set the initial learning rate to 1e-4 for the first 14 epochs and then 1e-5 for the last 5 epochs, my experimental results are very different from yours. Is this because of a different PyTorch version, or is my training process wrong?
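
For reference, here is a minimal sketch of the schedule described above, assuming a monodepth2-style Adam + StepLR setup (the --learning_rate 1e-4 and --scheduler_step_size 14 flags appear in the multi-GPU training command quoted earlier on this page); this is not the author's confirmed training loop.

import torch

model_params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameters
optimizer = torch.optim.Adam(model_params, lr=1e-4)  # initial learning rate 1e-4
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=14, gamma=0.1)

for epoch in range(20):
    # ... run one training epoch here ...
    optimizer.step()   # stands in for the real per-batch updates
    scheduler.step()   # learning rate drops to 1e-5 after epoch 14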

Environment

Hi, thank you for sharing your nice work.

Could you share the environment setting such as versions of packages for this work?

I cannot reproduce the results of this paper even if using the pretrained model that is provided in this repo.

| Source | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| My evaluation | 0.1024 | 0.7632 | 4.482 | 1.799 | 0.8954 | 0.9645 | 0.9831 |
| Paper | 0.102 | 0.764 | 4.483 | 0.180 | 0.896 | 0.965 | 0.983 |

Issue about downloading HRNet pretrained on ImageNet

First of all, thank you for sharing this cool work.

I ran into an error when running start2train.sh.

The error is below:
[error screenshot]

I ran start2train.sh on another computer that has a different IP address,
but the error also occurred.

Thank you.

Saved trained models

Hi, thank you for sharing your amazing code. I ran the training code and it trained for 20 epochs, but I don't know where the models are saved. Also, does your code save each epoch's results separately, or only the last epoch? Lastly, where can I change the number of epochs for training?

Test file missing

Thanks for your work on DIFFNet!
I want to evaluate my training results on my PC, but the file "splits/eigen/gt_depths.npz" is required and I can't find it in the repository. Could you please provide this file? Thanks!
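
For context, this ground-truth file is not shipped with the repository; in monodepth2-style projects it is usually generated from the raw KITTI data with an export script (monodepth2 provides export_gt_depth.py for this). Once the file exists, a quick sanity check might look like the sketch below; the "data" key and the 697-image Eigen test split are assumptions carried over from monodepth2-style evaluation code.

import numpy as np

# Inspect the ground-truth depth archive used for Eigen-split evaluation
gt_depths = np.load("splits/eigen/gt_depths.npz", fix_imports=True,
                    encoding="latin1", allow_pickle=True)["data"]
print(len(gt_depths), gt_depths[0].shape)  # expect 697 depth maps for the Eigen test split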

Missing Keys in Pretrained Weights

Hi @brandleyzhou, thank you for your great work!

I met the following problem when testing your pretrained models:

Exception has occurred: RuntimeError
Error(s) in loading state_dict for HRDepthDecoder:
	Missing key(s) in state_dict: "convs.up_x9_0.conv.conv.weight", "convs.up_x9_0.conv.conv.bias", "convs.up_x9_1.conv.conv.weight", "convs.up_x9_1.conv.conv.bias", "convs.72.ca.fc.0.weight", "convs.72.ca.fc.2.weight", "convs.72.conv_se.weight", "convs.72.conv_se.bias", "convs.36.ca.fc.0.weight", "convs.36.ca.fc.2.weight", "convs.36.conv_se.weight", "convs.36.conv_se.bias", "convs.18.ca.fc.0.weight", "convs.18.ca.fc.2.weight", "convs.18.conv_se.weight", "convs.18.conv_se.bias", "convs.9.ca.fc.0.weight", "convs.9.ca.fc.2.weight", "convs.9.conv_se.weight", "convs.9.conv_se.bias", "convs.dispConvScale0.conv.weight", "convs.dispConvScale0.conv.bias", "convs.dispConvScale1.conv.weight", "convs.dispConvScale1.conv.bias", "convs.dispConvScale2.conv.weight", "convs.dispConvScale2.conv.bias", "convs.dispConvScale3.conv.weight", "convs.dispConvScale3.conv.bias", "decoder.0.conv.conv.weight", "decoder.0.conv.conv.bias", "decoder.1.conv.conv.weight", "decoder.1.conv.conv.bias", "decoder.2.ca.fc.0.weight", "decoder.2.ca.fc.2.weight", "decoder.2.conv_se.weight", "decoder.2.conv_se.bias", "decoder.3.ca.fc.0.weight", "decoder.3.ca.fc.2.weight", "decoder.3.conv_se.weight", "decoder.3.conv_se.bias", "decoder.4.ca.fc.0.weight", "decoder.4.ca.fc.2.weight", "decoder.4.conv_se.weight", "decoder.4.conv_se.bias", "decoder.5.ca.fc.0.weight", "decoder.5.ca.fc.2.weight", "decoder.5.conv_se.weight", "decoder.5.conv_se.bias", "decoder.6.conv.weight", "decoder.6.conv.bias", "decoder.7.conv.weight", "decoder.7.conv.bias", "decoder.8.conv.weight", "decoder.8.conv.bias", "decoder.9.conv.weight", "decoder.9.conv.bias". 

The pretrained weights are downloaded from this repository page. Specifically, I was testing two pretrained models:

Could you please have a look at this and upload the complete models? Thanks in advance!

Run-time FPS

This is great work. I have a question about run-time FPS. In Table 3 of your paper you report a run-time of 87 FPS. Under what circumstances do you get this value? It takes at least 53 ms for me to process one 640x192 image on an Nvidia RTX 3090 GPU.

About torch::jit::trace

Hello,
Thank you for sharing your work. I want to use libtorch to deploy this network in C++, but when using torch::jit::trace() I get this error (running test_sample.py succeeds):
[error screenshots]
Because torch::jit::trace() cannot handle dictionary outputs, I changed the output of depth_decoder to a list. Also, there is a line "import hr_networks" in test_sample.py, but I could not find hr_networks; I don't know whether this affects torch::jit::trace().

Thank you very much!
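
For anyone hitting the same dict-output limitation, one rough workaround is to wrap the encoder and decoder in a module that returns a plain tensor before tracing. The sketch below assumes the decoder returns a monodepth2-style dict keyed by ("disp", scale); verify that key against the actual DIFFNet decoder output.

import torch

class TraceableDIFFNet(torch.nn.Module):
    # Wrap encoder + decoder so tracing sees a tensor output instead of a dict
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        outputs = self.decoder(self.encoder(x))
        return outputs[("disp", 0)]  # finest-scale disparity (assumed key)

# Usage, with encoder/decoder built as elsewhere on this page:
# example = torch.randn(1, 3, 192, 640)
# traced = torch.jit.trace(TraceableDIFFNet(encoder, decoder).eval(), example)
# traced.save("diffnet_traced.pt")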

Cannot reproduce the results mentioned in the paper

Hi, I trained your model with 640x192 and 1024x320 input sizes, but the results are different from what you report in the paper.

Here are the results I got:

| Input size | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| 640x192 | 0.108 | 0.792 | 4.589 | 0.186 | 0.889 | 0.963 | 0.982 |
| 1024x320 | 0.103 | 0.909 | 4.642 | 0.183 | 0.899 | 0.965 | 0.982 |

And here are the results mentioned in the paper:

[screenshots of the paper's result tables]

I don't know what causes this difference, because when I used your pre-trained weights for evaluation I got the same results as yours. Do you have any idea why? Maybe the code has changed slightly, or could a different version of torch cause this?

STEREO SCALE FACTOR

Hello
Can a network trained with the MS method use STEREO_SCALE_FACTOR (5.4) to obtain real-world scale, like Monodepth2?
Thank you very much!
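
For context, in monodepth2 the fixed factor of 5.4 is applied when converting the sigmoid disparity output of a stereo-trained (S or MS) model to metric depth, instead of per-image median scaling against ground truth. Below is a hedged sketch of that convention (monodepth2-style code, not confirmed DIFFNet behaviour).

STEREO_SCALE_FACTOR = 5.4  # monodepth2's fixed baseline-derived scale

def disp_to_metric_depth(disp, min_depth=0.1, max_depth=100.0):
    # Convert a sigmoid disparity map to depth (monodepth2 convention),
    # then apply the fixed stereo scale to obtain metric depth.
    min_disp, max_disp = 1.0 / max_depth, 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return STEREO_SCALE_FACTOR * (1.0 / scaled_disp)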

Missing Keys in 1024x320 Pretrained Weights

Hi @brandleyzhou! Thank you for your work!

I met a problem when testing your pretrained models:

RuntimeError: Error(s) in loading state_dict for HRDepthDecoder:
       Missing key(s) in state_dict: "convs.72.ca.fc.0.weight", "convs.72.ca.fc.2.weight", "convs.72.conv_se.weight", "convs.72.conv_se.bias", "convs.36.ca.fc.0.weight", "convs.36.ca.fc.2.weight", "convs.36.conv_se.weight", "convs.36.conv_se.bias", "convs.18.ca.fc.0.weight", "convs.18.ca.fc.2.weight", "convs.18.conv_se.weight", "convs.18.conv_se.bias", "convs.9.ca.fc.0.weight", "convs.9.ca.fc.2.weight", "convs.9.conv_se.weight", "convs.9.conv_se.bias", "decoder.2.ca.fc.0.weight", "decoder.2.ca.fc.2.weight", "decoder.3.ca.fc.0.weight", "decoder.3.ca.fc.2.weight", "decoder.4.ca.fc.0.weight", "decoder.4.ca.fc.2.weight", "decoder.5.ca.fc.0.weight", "decoder.5.ca.fc.2.weight".
       Unexpected key(s) in state_dict: "convs.72fSE.fc.0.weight", "convs.72fSE.fc.2.weight", "convs.72fSE.conv_se.weight", "convs.72fSE.conv_se.bias", "convs.36fSE.fc.0.weight", "convs.36fSE.fc.2.weight", "convs.36fSE.conv_se.weight", "convs.36fSE.conv_se.bias", "convs.18fSE.fc.0.weight", "convs.18fSE.fc.2.weight", "convs.18fSE.conv_se.weight", "convs.18fSE.conv_se.bias", "convs.9fSE.fc.0.weight", "convs.9fSE.fc.2.weight", "convs.9fSE.conv_se.weight", "convs.9fSE.conv_se.bias", "decoder.2.fc.0.weight", "decoder.2.fc.2.weight", "decoder.3.fc.0.weight", "decoder.3.fc.2.weight", "decoder.4.fc.0.weight", "decoder.4.fc.2.weight", "decoder.5.fc.0.weight", "decoder.5.fc.2.weight".

The pretrained weights were downloaded from this repository; I was testing 4 models:

I only experienced the problem with the 1024x320 model. Could you please have a look at what the problem might be? Thanks in advance!
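
A hedged workaround, inferred purely from the missing/unexpected key names listed above: the 1024x320 checkpoint appears to have been saved from a decoder whose attention blocks were named e.g. convs.72fSE.fc rather than convs.72.ca.fc, so renaming the keys before load_state_dict may let the weights load. The mapping below is a guess reconstructed from those two lists and may not cover every layer.

import torch

def remap_decoder_keys(state_dict):
    # Rename old-style "fSE" keys to the current "ca"/"conv_se" layout (assumed mapping)
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for ch in ("72", "36", "18", "9"):
            new_key = new_key.replace(f"convs.{ch}fSE.fc", f"convs.{ch}.ca.fc")
            new_key = new_key.replace(f"convs.{ch}fSE.conv_se", f"convs.{ch}.conv_se")
        for idx in ("2", "3", "4", "5"):
            new_key = new_key.replace(f"decoder.{idx}.fc", f"decoder.{idx}.ca.fc")
        remapped[new_key] = value
    return remapped

# loaded = torch.load("diffnet_1024x320/depth.pth", map_location="cpu")  # hypothetical path
# decoder.load_state_dict(remap_decoder_keys(loaded))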

Training and testing issue

Hi,

When I try to test a single image by running "sh test_sample.sh", I get this error: "ModuleNotFoundError: No module named 'hr_networks'".

Would you please let me know how I can get "hr_networks"?

Also, when I tried to train the model, this error popped up:

from .hrnet_config import MODEL_CONFIGS
File "/media/armin/DATA/DIFFNet/networks/hrnet_config.py", line 5, in
from yacs.config import CfgNode as CN
ModuleNotFoundError: No module named 'yacs'

Am I missing something here?

Where is the supplementary material mentioned in the paper?

At the bottom of page 9 it says "The corresponding images are shown in the supplementary material," but I can't find a supplementary material section in this paper.
Is there any misunderstanding on my part?
Thanks for your time.
