xy-guo / learning-monocular-depth-by-stereo
Learning Monocular Depth by Distilling Cross-domain Stereo Networks, ECCV18
Home Page: https://arxiv.org/abs/1808.06586
License: MIT License
Learning-Monocular-Depth-by-Stereo/models/monocular_model.py

```python
def forward(self, x):
    mean = Variable(torch.FloatTensor([0.485, 0.456, 0.406])).cuda()
    var = Variable(torch.FloatTensor([0.229, 0.224, 0.225])).cuda()
    x = (x - mean.view(1, -1, 1, 1)) / (var.view(1, -1, 1, 1))
```
Why is this normalization done in the monocular model but not in the stereo model? Thank you.
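For reference, the snippet applies the standard ImageNet per-channel normalization; note that the tensor named `var` actually holds standard deviations (0.229, 0.224, 0.225), not variances. A minimal NumPy sketch of the same operation (not the repo's code):

```python
import numpy as np

# ImageNet per-channel statistics. The repo names the second tensor `var`,
# but these values are standard deviations, not variances.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_imagenet(x):
    """x: (N, 3, H, W) batch with values in [0, 1]; returns the normalized batch."""
    return (x - MEAN.reshape(1, 3, 1, 1)) / STD.reshape(1, 3, 1, 1)
```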
Here you train the decoder only for the first epoch. Why?
Thank you very much for sharing your brilliant work.
Would it be possible to provide us a checkpoint model for testing?
Hi, I didn't see any instruction in your readme for installing the mkl package. The error occurs when I try to run main_stereo.py, where mkl is imported and used on the second and third lines. Can you clarify how to install it correctly? I tried pip install mkl and the installation runs successfully, but the ImportError persists.
Other than installing the package, I found a workaround: comment out the mkl code and manually set --num_threads to 0 to avoid the shared-memory limit. Perhaps you can add this workaround to your readme. Thank you.
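The workaround above can be sketched as a guarded import at the top of main_stereo.py (`mkl.set_num_threads` is the usual entry point of the mkl-service package; the repo's exact usage may differ):

```python
# Sketch of the workaround: make the mkl import optional so the script
# still runs when the package cannot be imported. Combine this with
# --num_threads 0 to stay under the shared-memory limit.
try:
    import mkl
    mkl.set_num_threads(1)
except ImportError:
    mkl = None  # proceed without MKL thread control
```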
I downloaded some of the pretrained models and found that the results differ from your paper. Here is one example (release-StereoNoFt.ckpt):

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.0754, 0.6812, 4.023, 0.162, 0.000, 0.931, 0.971, 0.984

In the paper (page 14), however, you report:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.072, 0.665, 3.836, 0.153, 0.000, 0.936, 0.973, 0.986

The results obtained from your pretrained model are slightly worse than what you report in the paper. Could you please clarify the reason?
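For anyone comparing numbers, these are the standard metrics from Eigen et al.; a NumPy sketch of how they are typically computed (details such as masking and depth capping may differ from the repo's evaluation script):

```python
import numpy as np

def compute_depth_metrics(gt, pred):
    """Standard monocular-depth metrics over valid (positive) depths.

    gt, pred: flat arrays of ground-truth and predicted depths.
    Returns (abs_rel, sq_rel, rms, log_rms, a1, a2, a3).
    """
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rms = np.sqrt(np.mean((gt - pred) ** 2))
    log_rms = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rms, log_rms, a1, a2, a3
```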
Thanks for your work. For the preparation of the dataset, can you give the relevant links and provide more detailed instructions?
Hi,
Thank you for your work! Could I get the code to generate the depth on Cityscapes using your model? Like the result in Fig.3 in your paper.
Sorry for the trouble.
Many thanks
Hi, Guo,
I am wondering which ground-truth depth maps you use for supervised fine-tuning and evaluation on the KITTI dataset: the official depth maps provided by KITTI, or the ones from monodepth? I am confused about which one we should use.
Thanks!
Do you intend to provide the PSMNet implementation for the stereo network in the future? Thank you.
Table 1 (metrics: abs_rel, sq_rel, rms, log_rms, a1, a2, a3):
StereoUnsupFt→Mono pt  No  S,K→K    0.099  0.745  4.424  0.182  0.884  0.963  0.983
StereoUnsupFt→Mono pt  No  S,K→C,K  0.095  0.703  4.316  0.177  0.892  0.966  0.984
From the results in Table 1, I can see that using the Cityscapes dataset improves overall performance.
I want to know how the Cityscapes dataset is used during training.
After training the mono network on KITTI, do you fine-tune it on Cityscapes?
Or do you train the mono network on KITTI and Cityscapes jointly?
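For what it's worth, joint training on two domains is often implemented by sampling each batch from one dataset chosen at random. A pure-Python sketch (the dataset objects and the 50/50 split are assumptions, not the repo's actual schedule):

```python
import random

def mixed_batches(kitti, cityscapes, batch_size, p_kitti=0.5, seed=0):
    """Hypothetical joint-training sampler: each batch is drawn entirely
    from one of the two datasets, chosen at random, so the network sees
    both domains throughout training rather than in separate stages."""
    rng = random.Random(seed)
    while True:
        source = kitti if rng.random() < p_kitti else cityscapes
        yield [source[rng.randrange(len(source))] for _ in range(batch_size)]
```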
Hi, Guo,
I just want to test the stereo model to produce disparity, but it fails in the "correlation 1d" op with a "Segmentation fault" and no other information. Can you figure out the reason, or could you share your stereo disparity results with me?
Thanks!
In your experiments, you use only the Scene Flow dataset as synthetic data. I wonder whether the camera baselines (and other intrinsic parameters) are the same across the three scenarios (FlyingThings3D, Driving, Monkaa)?
In my case, I want to train the proxy stereo network with Scene Flow plus other additional data; would the different camera settings degrade the network's performance?
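One way to see why the rig parameters matter: depth is recovered from disparity via the focal length and baseline, so the same disparity value means a different depth under different camera settings. A small sketch (the numbers below are illustrative, not from any of the datasets):

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """depth = f * B / d. With a different baseline or focal length, the
    same disparity maps to a different depth, so datasets captured with
    different rigs induce different disparity statistics for the network."""
    return focal_px * baseline_m / disparity_px
```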
In the paper you state "Stereo networks generalize much better and have smaller synthetic-to-real domain transfer problems."
When I tried release-StereoNoFt.ckpt on a Cityscapes stereo image pair (resized to 1024x512 with cv2.INTER_AREA), the predicted result is quite poor, but on KITTI it's pretty good (image size 1280x384).
The monocular model release-StereoUnsupFt-Mono-pt.ckpt shows the same phenomenon: bad on Cityscapes but good on KITTI. I also tried removing the car hood, but without much improvement.
Could you please give us a guide on how to reproduce the result of Figure 7?
Hi,
Thank you so much for the excellent work.
I am dealing with stereo images with negative disparity (the stereo pairs were adapted from light-field images, so their disparities can be negative).
How should I modify the correlation1d code to handle images with negative disparity?
Any advice would be greatly appreciated.
best,
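Conceptually, the change is to let the correlation layer scan displacements in [-max_disp, +max_disp] instead of [0, max_disp], and widen the output channel count accordingly. A naive NumPy sketch of that extended correlation (the repo's actual CUDA kernel will differ in layout and speed):

```python
import numpy as np

def correlation_1d(left, right, max_disp):
    """Naive 1-D correlation over displacements d in [-max_disp, max_disp].

    left, right: (C, H, W) feature maps.
    Returns a (2*max_disp + 1, H, W) cost volume, where channel i holds
    the per-pixel dot product left(x) . right(x - d) for d = i - max_disp;
    out-of-range positions are left at zero.
    """
    C, H, W = left.shape
    out = np.zeros((2 * max_disp + 1, H, W), dtype=left.dtype)
    for i, d in enumerate(range(-max_disp, max_disp + 1)):
        if d >= 0:
            # valid x in [d, W): compare left[..., d:] with right[..., :W-d]
            out[i, :, d:] = (left[:, :, d:] * right[:, :, :W - d]).mean(axis=0)
        else:
            # valid x in [0, W+d): compare left[..., :d] with right[..., -d:]
            out[i, :, :d] = (left[:, :, :d] * right[:, :, -d:]).mean(axis=0)
    return out
```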
Hello, great work @xy-guo and team!
I have stereo images and depth maps from a ZED camera for a custom (real-world) dataset. In place of the KITTI and Scene Flow datasets, I thought to initially train the stereo network on the ZED camera depth maps, then do unsupervised stereo training on the same real data (possibly with different scenes), and then do the final monocular training on the same dataset.
Will this yield the depth-estimation improvements you obtained in the paper? Or will the final monocular depth-estimation accuracy be limited by the accuracy of the ZED camera depth maps?
Any suggestion is greatly appreciated.
Thanks