yvanyin / vnl_monocular_depth_prediction
Monocular Depth Prediction
License: Other
Dear YvanYin:
I met a problem when testing your masterpiece: I can't get a surface normal image from the depth image and RGB image. Would you please tell me how to get it?
I tried to run this command to get a sample inference on Google Colab:
!cd /content/VNL_Monocular_Depth_Prediction && python ./tools/test_any_images.py \
--dataroot /content/VNL_Monocular_Depth_Prediction \
--dataset any \
--cfg_file /content/VNL_Monocular_Depth_Prediction/lib/configs/resnext101_32x4d_nyudv2_class \
--load_ckpt /content/VNL_Monocular_Depth_Prediction/nyu_rawdata.pth
But I got an error like this:
----------------- Options ---------------
batchsize: 2
cfg_file: /content/VNL_Monocular_Depth_Prediction/lib/configs/resnext101_32x4d_nyudv2_class [default: lib/configs/resnext_32x4d_nyudv2_c1]
dataroot: /content/VNL_Monocular_Depth_Prediction [default: None]
dataset: any [default: nyudv2]
epoch: 30
load_ckpt: /content/VNL_Monocular_Depth_Prediction/nyu_rawdata.pth [default: None]
phase: test
phase_anno: test
results_dir: ./evaluation
resume: False
start_epoch: 0
start_step: 0
thread: 4
use_tfboard: False
----------------- End -------------------
INFO load_dataset.py: 31: any is created.
INFO test_any_images.py: 45: test_data_size: 0
Traceback (most recent call last):
File "./tools/test_any_images.py", line 47, in <module>
model = MetricDepthModel()
File "../VNL_Monocular_Depth_Prediction/lib/models/metric_depth_model.py", line 16, in __init__
self.depth_model = DepthModel()
File "../VNL_Monocular_Depth_Prediction/lib/models/metric_depth_model.py", line 121, in __init__
self.decoder_modules = lateral_net.fcn_topdown(cfg.MODEL.ENCODER)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 190, in __init__
self._init_modules(self.init_type)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 193, in _init_modules
self._init_weights(init_type)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 211, in _init_weights
child_m.apply(init_func)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 445, in apply
module.apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 446, in apply
fn(self)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 207, in init_func
nn.init.normal_(m.weight.data, 1.0, 0.0)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/init.py", line 140, in normal_
return _no_grad_normal_(tensor, mean, std)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/init.py", line 19, in _no_grad_normal_
return tensor.normal_(mean, std)
RuntimeError: normal_ expects std > 0.0, but found std=0
Do you have any ideas?
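Note: the traceback ends at nn.init.normal_(m.weight.data, 1.0, 0.0); newer PyTorch versions reject std=0, since a normal draw with zero std is just a constant. A minimal local patch (my own workaround, not the author's fix) replaces that call in lib/models/lateral_net.py:
# In init_func in lib/models/lateral_net.py (around line 207, per the traceback):
# a normal distribution with std=0 is a constant, so fill the weights directly.
# was: nn.init.normal_(m.weight.data, 1.0, 0.0)
m.weight.data.fill_(1.0)
Also note the log line "test_data_size: 0" above: the loader found no images under --dataroot, which would be a second problem even after the init error is fixed.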
Thanks for sharing. I have a question on processing the NYU dataset: in your nyudv2_dataset.py, the depth image is divided by 10. What type of depth image should be input, and what is its value range? 0-65535?
Thanks
Hi, thanks for the good work.
How do you deal with raw depth images?
The scaling doesn't seem to fit my NYU depth data.
Hi.
In the process of point cloud reconstruction, you use camera intrinsic parameters, such as the focal length and the 2D coordinates of the optical center. Are all these parameters contained in the NYUD-v2 and KITTI datasets? I mean, does each image in both datasets have its corresponding camera intrinsics?
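For context: the pinhole back-projection needs only fx, fy and the principal point (u0, v0), and both NYUD-v2 and KITTI publish these calibration parameters (KITTI per recording, in its calibration files). A minimal sketch of the reconstruction, assuming a metric depth map:
import numpy as np

def depth_to_point_cloud(depth, fx, fy, u0, v0):
    # depth: (H, W) array in meters; fx, fy: focal lengths in pixels;
    # (u0, v0): principal point. Returns an (H*W, 3) point cloud.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - u0) * depth / fx
    y = (v - v0) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)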
Thanks YvanYin for the very nice work! I was trying to train on KITTI data with the provided loss function, but noticed that in kitti_dataset.py the depth_to_class() module is missing, so we cannot get data['b_classes']. In addition, I wonder whether the input to the cross-entropy loss should be (pred_depth, data['B_classes'], etc.)?
Thanks in advance for the help!
I would like to ask where to find the file named "nyu_rawdata.pth"?
Hi, can you please share the ImageNet pretrained model that you used? It seems that the ResNeXt101-32x4d model is not available in torchvision. Thanks.
Hi, thank you for your amazing work.
One thing confuses me: I found that you use the center of the randomly cropped image as the u0 and v0 to reconstruct point clouds in VNL_loss. Won't that give a wrong result?
Traceback (most recent call last):
File "tools/test_nyu_metric.py", line 5, in <module>
from lib.core.config import cfg
ModuleNotFoundError: No module named 'lib'
Traceback (most recent call last):
File "./tools/train_nyu_metric.py", line 1, in <module>
from data.load_dataset import CustomerDataLoader
ModuleNotFoundError: No module named 'data'
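Both tracebacks have the same cause: the scripts under tools/ import lib.* and data.* relative to the repository root, so the root must be on Python's module search path. A minimal workaround (my own, not from the repo) is to prepend it at the top of the failing script:
# At the top of tools/test_nyu_metric.py, before the failing import:
import os, sys
# tools/ sits one level below the repo root, so step up one directory.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from lib.core.config import cfg  # now resolves
Running the scripts from the repo root with the root added to PYTHONPATH achieves the same thing.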
Hello, I used the NYU pretrained model,
but the value of rmse I get is 0.488.
This is quite different from the 0.416 in the paper.
I want to know the reason, and whether any parameters were changed.
Thanks for sharing. I have two questions on how to test the error metrics on KITTI. As said in the paper, you use the Eigen split, but KITTI provides an official train/validation split.
The data augmentation involves random cropping, which would make the focal_x and focal_y values from the NYUD-v2 and KITTI datasets invalid, especially when cropping with an aspect ratio other than 1:1.
Is there a reason why the method is robust to this incorrect fx/fy ratio?
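For reference, cropping shifts only the principal point while resizing scales the focal lengths, so a non-square resize does change the fx/fy ratio. A minimal sketch of the bookkeeping needed to keep the intrinsics consistent (my own illustration, not code from this repo):
def update_intrinsics(fx, fy, u0, v0, crop_x, crop_y, scale_x, scale_y):
    # Crop at offset (crop_x, crop_y), then resize by (scale_x, scale_y).
    # Cropping shifts the principal point; resizing scales everything.
    u0 = (u0 - crop_x) * scale_x
    v0 = (v0 - crop_y) * scale_y
    return fx * scale_x, fy * scale_y, u0, v0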
Dear YvanYin,
Great work!
I was wondering whether you used pretrained weights for resnext50, resnext101, and mobilenet from torchvision, because I am not able to load torchvision weights with your code.
I would be happy if you could help me here.
Thank you!
Sorry, I don't know if you uploaded the training file and the function to calculate the loss; I just didn't find them.
Which part of the program do I need to change to skip the validation process and go directly to testing?
Where can I change opt.phase?
Thank you
Hi! I wonder if you may share the dataset you used when training the SOTA NYUDv2 model, which you referred to as "20k unlabeled images" (but it has to be labeled, since you need depth supervision during training, right?).
Hello~
Are the demonstration results computed in real time?
And if so, what would be the latency on a modern PC?
Hi,
I think there is a small typo in the metrics code.
It is wonderful work, and I want to train the network on my own dataset. When will you release the training code? Or would you mind sending the code to me by email?
File "E:\vSLAM\VNL_Monocular_Depth_Prediction-master\data\nyudv2_dataset.py", line 154, in depth_to_bins
bins = ((torch.log10(depth) - cfg.DATASET.DEPTH_MIN_LOG) / cfg.DATASET.DEPTH_BIN_INTERVAL).to(torch.int)
TypeError: div(): argument 'other' (position 1) must be Tensor, not NoneType
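The NoneType here means cfg.DATASET.DEPTH_BIN_INTERVAL was never set, which typically happens when the dataset-specific --cfg_file has not been merged into the config before the dataset is built. For reference, the failing line is a uniform quantization of depth in log10 space; a minimal restatement with the config values passed in explicitly:
import torch

def depth_to_bins(depth, depth_min_log, depth_bin_interval):
    # Uniform quantization in log10 space:
    # bin = floor((log10(depth) - log10(depth_min)) / bin_interval)
    return ((torch.log10(depth) - depth_min_log) / depth_bin_interval).to(torch.int)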
Hi, YvanYin:
Thanks for your great work!
I want to train this model on my own dataset, which contains depth images and RGB images. How can I generate the .mat files and .json files?
How do I convert my image data into the format of your dataset?
Thank you so much for your help!
Hi, in the paper you mention there are 464 different indoor scenes, 249 of which are used for training. Could you let me know where I can get the training list of scenes? Also, could you clarify how you sampled the images from the training set? What is the sampling rate, or how many frames did you sample in each scene?
Hi!
Could you please clarify how to convert the network's depth output, bounded between 0 and 1, to real-world values in meters?
_, pred_depth_softmax= model.module.depth_model(img_torch)
pred_depth = bins_to_depth(pred_depth_softmax)
pred_depth = pred_depth.cpu().numpy().squeeze()
I get a depth map from this code in test_any_images.py,
and finally I want to compute the real depth, given that I know max_depth and min_depth in the real world.
Do you have related code for that?
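For reference, bins_to_depth returns depth normalized to [0, 1], and the metric value comes from multiplying by the dataset's depth cap (a comment further below mentions multiplying by 80 for the KITTI model; a 10 m cap for NYU is my assumption). A minimal sketch:
# pred_depth from bins_to_depth(...) is normalized to [0, 1].
DEPTH_MAX = 10.0  # assumption: ~10 m cap for the NYU model; 80 m for KITTI
pred_depth_meters = pred_depth * DEPTH_MAX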
Hello, I tried to use weighted cross entropy to train a baseline model that formulates depth prediction as a classification problem instead of a regression, but the training loss would not converge to a low value and the results are very bad. Could you please give me some advice? I would appreciate it if you could provide your loss function and training code.
This is my email: [email protected]
In your paper, you mention that the predicted VN are calculated from the reconstructed point cloud. How about the GT VN? How do you get them?
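For context, a standard way to obtain ground-truth normals is the same route as the predictions: back-project the GT depth with the camera intrinsics (see the pinhole sketch earlier) and cross the local tangent vectors, or fit local planes by least squares. A minimal sketch of the cross-product variant, not necessarily the authors' exact pipeline:
import numpy as np

def normals_from_points(points):
    # points: (H, W, 3) back-projected point cloud.
    # Tangent vectors along image columns/rows, then their cross product;
    # the sign convention depends on the camera coordinate frame.
    dx = np.gradient(points, axis=1)
    dy = np.gradient(points, axis=0)
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)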
Hello, thank you so much for sharing this code. I am working on a project where we need to get pedestrian velocities, and to estimate the velocities I need depth information. We are using a depth camera for the real-world implementation, but I need to run the same code on a dataset of monocular images to evaluate our algorithm. I am using your code to get the depth frames, but I am not sure about the unit of depth in the output frames. Is there a way to get the depth in meters?
Thanks for the good work.
Which GPU did you use, and how long did training take on KITTI?
I noticed that the holes in the ground truth of the NYU dataset have been filled. I'd like to know which method you used for inpainting the dataset, and whether it helps the network's performance. Also, is the same method used to inpaint the KITTI dataset? Thank you!
Hello, when I use your training code to train on NYU, I can only train 1 epoch.
When the second epoch starts, the process gets stuck at for i, data in enumerate(train_dataloader).
How can I solve this problem? Thank you so much.
Hello, when I use the training method in your article (training the network with NYUD and KITTI), the loss does not converge. Have you trained on NYUD or KITTI alone?
Thanks for sharing. For NYU v2, I downloaded the data and the trained model you provide. After running the test code, I got the following result:
rel=0.10590, log10=0.04602, rms=0.48770, delta1=0.88267, delta2=0.97619, delta3=0.99389.
But the metrics in your paper are:
rel=0.108, log10=0.048, rms=0.416, delta1=0.875, delta2=0.976, delta3=0.994.
There seems to be a large difference in the rms metric. Did I misunderstand some details?
Thanks for any help.
It is good to know your wonderful work.
I used the provided image and script, and logged the max/min values of the predicted depth before the scaling by 60000 at https://github.com/YvanYin/VNL_Monocular_Depth_Prediction/blob/master/tools/test_any_images.py#L66.
For image 107_r.png:
max: 0.6463524
min: 0.110629596
For image 26_r.png:
max: 0.5862275
min: 0.10630702
May I know the unit of these predictions? Meters, or something else?
Line 193 in VNL_loss.py has loss = vnl_loss.cal_VNL_loss(pred_depth, gt_depth), but there is no cal_VNL_loss() in VNL_loss.
I guess it should be loss = vnl_loss(gt_depth, pred_depth) to use forward()?
Hi @YvanYin, thanks for your wonderful work. But as mentioned, is there an __init__.py file missing in the lib folder?
Hi, thanks for sharing the code!
Right now I want to use your code (test_any_images.py) to generate depth maps for my own dataset, but the shape of your model's output (using the pretrained kitti_official.pth) is [1, 100, 300, 400]. Can I get a single-channel output while still using your pretrained model?
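For reference, that [1, 100, 300, 400] tensor is the per-pixel distribution over the 100 depth bins, not a depth map; it is collapsed to one channel by bins_to_depth, as in the snippet quoted from test_any_images.py earlier in this thread:
# pred_depth_softmax: (1, 100, H, W) scores over the 100 depth bins
_, pred_depth_softmax = model.module.depth_model(img_torch)
pred_depth = bins_to_depth(pred_depth_softmax)  # -> single-channel depth
depth_map = pred_depth.cpu().numpy().squeeze()  # (H, W) array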
Hi,
Thank you for the amazing work on this project.
Could you also share the code for running inference on a live camera feed, as shown in the README?
Thanks :)
In your paper's ablation studies, you discussed the impact of point group size.
The results show that 80K is the best (though it is only slightly better than 20K).
But in your code, I find the point group size is 385 * 385 * 0.15 = 22233, about 22K.
Why didn't you choose a larger point group size? Is it for computational performance? Did you find that 22K is the best trade-off between model performance and computation?
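For context, the point groups in the loss are triplets of randomly sampled pixels whose back-projected 3D points define a virtual normal. A minimal sketch of the sampling count (my own reading of the paper, not the repo's code), assuming a 385x385 input and a 0.15 ratio:
import numpy as np

H = W = 385
num_groups = int(H * W * 0.15)  # = 22233, the ~22K figure cited above
# Three pixel indices per group; near-colinear or too-close triplets
# still have to be filtered before computing virtual normals.
triplets = np.random.randint(0, H * W, size=(num_groups, 3))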
Do you plan to kindly release your training code? Thanks.
Hi YvanYin,
It is really terrific what you and your team achieved with the new model. My respect to you.
I am currently working on a project in which I could use precise depth results, e.g. generated by your work, but I am facing a problem with the output. As you addressed above, when using the KITTI-pretrained model to estimate depth, we need to multiply the results by 80. But even when I do so, the results are still not right. I used your KITTI pretrained model to predict the depth of images from KITTI Object Detection. When I transform this depth into xyz coordinates, the pixel point cloud doesn't look right at all. As a comparison, when I use PSMNet to estimate the depth for those images, the results look acceptable. I was hoping you may know the reason and can help me out. I used the "test_any_image.py" script and adjusted the paths for '--dataroot', '--cfg_file' and '--load_ckpt' in "parse_arg_base". For your understanding, I attached the results generated by PSMNet (1st image) and your model (2nd image).
(PSMNet)
Thanks in advance.
Hi,
Thanks for releasing this code! I think the idea you present is very neat. I have a few questions:
I could not properly grasp how these hyperparameters relate to the ones you discuss in the paper. I would be very grateful if you can educate me on this.
Thanks in advance!
Cheers,
Erik