xy-guo / learning-monocular-depth-by-stereo
Learning Monocular Depth by Distilling Cross-domain Stereo Networks, ECCV18
Home Page: https://arxiv.org/abs/1808.06586
License: MIT License
Learning-Monocular-Depth-by-Stereo/models/monocular_model.py

```python
def forward(self, x):
    mean = Variable(torch.FloatTensor([0.485, 0.456, 0.406])).cuda()
    var = Variable(torch.FloatTensor([0.229, 0.224, 0.225])).cuda()
    x = (x - mean.view(1, -1, 1, 1)) / (var.view(1, -1, 1, 1))
```
Why is this normalization done in the monocular model but not in the stereo model? Thank you.
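For reference, the snippet applies the standard ImageNet per-channel normalization; note that the tensor named `var` actually holds standard deviations (0.229, 0.224, 0.225), not variances. A minimal NumPy sketch of the same operation (not the repo's code):

```python
import numpy as np

# ImageNet per-channel statistics. The repo names the second tensor `var`,
# but these values are standard deviations, not variances.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_imagenet(x):
    """x: (N, 3, H, W) batch with values in [0, 1]; returns the normalized batch."""
    return (x - MEAN.reshape(1, 3, 1, 1)) / STD.reshape(1, 3, 1, 1)
```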
Here you train the decoder only for the first epoch. Why?
Thank you very much for sharing your brilliant work.
Would it be possible to provide us a checkpoint model for testing?
Hi, I didn't see any instruction in your readme for installing the mkl package. The error occurs when I try to run main_stereo.py, where mkl is imported and used on the second and third lines. Can you clarify how to install it correctly? I tried pip install mkl and the installation runs successfully, but the ImportError persists.
Other than installing the package, I found a workaround: comment out the mkl code and manually set --num_threads to 0 to avoid the shared-memory limit. Perhaps you can add this workaround to your readme. Thank you.
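The workaround above can be sketched as a guarded import at the top of main_stereo.py (`mkl.set_num_threads` is the usual entry point of the mkl-service package; the repo's exact usage may differ):

```python
# Sketch of the workaround: make the mkl import optional so the script
# still runs when the package cannot be imported. Combine this with
# --num_threads 0 to stay under the shared-memory limit.
try:
    import mkl
    mkl.set_num_threads(1)
except ImportError:
    mkl = None  # proceed without MKL thread control
```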
I downloaded some of the pretrained models and found that the results differ from your paper. Here is one example (release-StereoNoFt.ckpt):

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.0754, 0.6812, 4.023, 0.162, 0.000, 0.931, 0.971, 0.984

In the paper (page 14), however, you report:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.072, 0.665, 3.836, 0.153, 0.000, 0.936, 0.973, 0.986

The results obtained from your pretrained model are slightly worse than what you report in the paper. Could you please clarify the reason?
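For anyone comparing numbers, these are the standard metrics from Eigen et al.; a NumPy sketch of how they are typically computed (details such as masking and depth capping may differ from the repo's evaluation script):

```python
import numpy as np

def compute_depth_metrics(gt, pred):
    """Standard monocular-depth metrics over valid (positive) depths.

    gt, pred: flat arrays of ground-truth and predicted depths.
    Returns (abs_rel, sq_rel, rms, log_rms, a1, a2, a3).
    """
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rms = np.sqrt(np.mean((gt - pred) ** 2))
    log_rms = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rms, log_rms, a1, a2, a3
```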
Thanks for your work. For the preparation of the dataset, can you give the relevant links and provide more detailed instructions?
Hi,
Thank you for your work! Could I get the code to generate the depth on Cityscapes using your model? Like the result in Fig.3 in your paper.
Sorry for the trouble.
Many thanks
Hi, Guo,
I am wondering which ground-truth depth maps you use for supervised fine-tuning and evaluation on the KITTI dataset: the official depth maps provided by KITTI, or the ones from monodepth? I am confused about which one we should use.
Thanks!
Do you intend to provide the PSMNet implementation for the stereo network in the future? Thank you.
Table 1 (metrics: abs_rel, sq_rel, rms, log_rms, a1, a2, a3):
StereoUnsupFt→Mono pt  No  S,K→K    0.099  0.745  4.424  0.182  0.884  0.963  0.983
StereoUnsupFt→Mono pt  No  S,K→C,K  0.095  0.703  4.316  0.177  0.892  0.966  0.984
From the results in Table 1, I can see that using the Cityscapes dataset improves overall performance.
I want to know how the Cityscapes dataset is used during training.
After training the mono network on KITTI, do you fine-tune it on Cityscapes?
Or do you train the mono network on KITTI and Cityscapes jointly?
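For what it's worth, joint training on two domains is often implemented by sampling each batch from one dataset chosen at random. A pure-Python sketch (the dataset objects and the 50/50 split are assumptions, not the repo's actual schedule):

```python
import random

def mixed_batches(kitti, cityscapes, batch_size, p_kitti=0.5, seed=0):
    """Hypothetical joint-training sampler: each batch is drawn entirely
    from one of the two datasets, chosen at random, so the network sees
    both domains throughout training rather than in separate stages."""
    rng = random.Random(seed)
    while True:
        source = kitti if rng.random() < p_kitti else cityscapes
        yield [source[rng.randrange(len(source))] for _ in range(batch_size)]
```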
Hi, Guo,
I just want to test the stereo model to produce disparity, but it fails in the "correlation 1d" op with a "Segmentation fault" and no other information. Can you figure out the reason, or could you share your stereo disparity results with me?
Thanks!
In your experiments, you use only the Scene Flow dataset as synthetic data. I wonder whether the camera baselines (and other intrinsic parameters) are the same across the three scenarios (FlyingThings3D, Driving, Monkaa)?
In my case, I want to train the proxy stereo network with Scene Flow plus other additional data; would the different camera settings degrade the network's performance?
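One way to see why the rig parameters matter: depth is recovered from disparity via the focal length and baseline, so the same disparity value means a different depth under different camera settings. A small sketch (the numbers below are illustrative, not from any of the datasets):

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """depth = f * B / d. With a different baseline or focal length, the
    same disparity maps to a different depth, so datasets captured with
    different rigs induce different disparity statistics for the network."""
    return focal_px * baseline_m / disparity_px
```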
In the paper you state "Stereo networks generalize much better and have smaller synthetic-to-real domain transfer problems."
When I tried release-StereoNoFt.ckpt on a Cityscapes stereo image pair (resized to 1024x512 with cv2.INTER_AREA), the predicted result is quite poor, but on KITTI it's pretty good (image size 1280x384).
The monocular model release-StereoUnsupFt-Mono-pt.ckpt shows the same phenomenon: bad on Cityscapes but good on KITTI. I also tried removing the car hood, but without much improvement.
Could you please give us a guide on how to reproduce the result of Figure 7?
Hi,
Thank you so much for the excellent work.
I am dealing with stereo images with negative disparity (the stereo pairs were adapted from light-field images, so their disparities can be negative).
How should I modify the correlation1d code to handle images with negative disparity?
Any advice would be greatly appreciated.
best,
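Conceptually, the change is to let the correlation layer scan displacements in [-max_disp, +max_disp] instead of [0, max_disp], and widen the output channel count accordingly. A naive NumPy sketch of that extended correlation (the repo's actual CUDA kernel will differ in layout and speed):

```python
import numpy as np

def correlation_1d(left, right, max_disp):
    """Naive 1-D correlation over displacements d in [-max_disp, max_disp].

    left, right: (C, H, W) feature maps.
    Returns a (2*max_disp + 1, H, W) cost volume, where channel i holds
    the per-pixel dot product left(x) . right(x - d) for d = i - max_disp;
    out-of-range positions are left at zero.
    """
    C, H, W = left.shape
    out = np.zeros((2 * max_disp + 1, H, W), dtype=left.dtype)
    for i, d in enumerate(range(-max_disp, max_disp + 1)):
        if d >= 0:
            # valid x in [d, W): compare left[..., d:] with right[..., :W-d]
            out[i, :, d:] = (left[:, :, d:] * right[:, :, :W - d]).mean(axis=0)
        else:
            # valid x in [0, W+d): compare left[..., :d] with right[..., -d:]
            out[i, :, :d] = (left[:, :, :d] * right[:, :, -d:]).mean(axis=0)
    return out
```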
Hello, great work @xy-guo and team!
I have stereo images and depth maps from a ZED camera for a custom (real-world) dataset. In place of the KITTI and Scene Flow datasets, I thought to initially train the stereo network on the ZED camera depth maps, then do unsupervised stereo training on the same real data (possibly with different scenes), and then do the final monocular training on the same dataset.
Will this yield the depth-estimation improvements you obtained in the paper? Or will the final monocular depth-estimation accuracy be limited by the accuracy of the ZED camera depth maps?
Any suggestion is greatly appreciated.
Thanks