yvanyin / vnl_monocular_depth_prediction
Monocular Depth Prediction
License: Other
Dear YvanYin:
I met a problem when testing your masterpiece: I can't get a surface normal image from the depth image and RGB image. Would you please tell me how to get it?
I tried to run this command to get a sample inference on Google Colab:
!cd /content/VNL_Monocular_Depth_Prediction && python ./tools/test_any_images.py \
--dataroot /content/VNL_Monocular_Depth_Prediction \
--dataset any \
--cfg_file /content/VNL_Monocular_Depth_Prediction/lib/configs/resnext101_32x4d_nyudv2_class \
--load_ckpt /content/VNL_Monocular_Depth_Prediction/nyu_rawdata.pth
But I got an error like this:
----------------- Options ---------------
batchsize: 2
cfg_file: /content/VNL_Monocular_Depth_Prediction/lib/configs/resnext101_32x4d_nyudv2_class [default: lib/configs/resnext_32x4d_nyudv2_c1]
dataroot: /content/VNL_Monocular_Depth_Prediction [default: None]
dataset: any [default: nyudv2]
epoch: 30
load_ckpt: /content/VNL_Monocular_Depth_Prediction/nyu_rawdata.pth [default: None]
phase: test
phase_anno: test
results_dir: ./evaluation
resume: False
start_epoch: 0
start_step: 0
thread: 4
use_tfboard: False
----------------- End -------------------
INFO load_dataset.py: 31: any is created.
INFO test_any_images.py: 45: test_data_size: 0
Traceback (most recent call last):
File "./tools/test_any_images.py", line 47, in <module>
model = MetricDepthModel()
File "../VNL_Monocular_Depth_Prediction/lib/models/metric_depth_model.py", line 16, in __init__
self.depth_model = DepthModel()
File "../VNL_Monocular_Depth_Prediction/lib/models/metric_depth_model.py", line 121, in __init__
self.decoder_modules = lateral_net.fcn_topdown(cfg.MODEL.ENCODER)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 190, in __init__
self._init_modules(self.init_type)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 193, in _init_modules
self._init_weights(init_type)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 211, in _init_weights
child_m.apply(init_func)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 445, in apply
module.apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 446, in apply
fn(self)
File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 207, in init_func
nn.init.normal_(m.weight.data, 1.0, 0.0)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/init.py", line 140, in normal_
return _no_grad_normal_(tensor, mean, std)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/init.py", line 19, in _no_grad_normal_
return tensor.normal_(mean, std)
RuntimeError: normal_ expects std > 0.0, but found std=0
Do you have any ideas?
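Note: the traceback ends at nn.init.normal_(m.weight.data, 1.0, 0.0); newer PyTorch versions reject std=0, since a normal draw with zero std is just a constant. A minimal local patch (my own workaround, not the author's fix) replaces that call in lib/models/lateral_net.py:
# In init_func in lib/models/lateral_net.py (around line 207, per the traceback):
# a normal distribution with std=0 is a constant, so fill the weights directly.
# was: nn.init.normal_(m.weight.data, 1.0, 0.0)
m.weight.data.fill_(1.0)
Also note the log line "test_data_size: 0" above: the loader found no images under --dataroot, which would be a second problem even after the init error is fixed.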
Thanks for sharing. I have a question on processing the NYU dataset: in your nyudv2_dataset.py, the depth image is divided by 10. What type of depth image should be input, and what is its value range? 0-65535?
Thanks
Hi, thanks for the good work.
How do you deal with raw depth images?
The scaling doesn't seem to fit my NYU depth data.
Hi.
In the process of point cloud reconstruction, you use camera intrinsic parameters, such as the focal length and the 2D coordinates of the optical center. Are all these parameters contained in the NYUD-v2 and KITTI datasets? I mean, does each image in both datasets have its corresponding camera intrinsics?
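For context: the pinhole back-projection needs only fx, fy and the principal point (u0, v0), and both NYUD-v2 and KITTI publish these calibration parameters (KITTI per recording, in its calibration files). A minimal sketch of the reconstruction, assuming a metric depth map:
import numpy as np

def depth_to_point_cloud(depth, fx, fy, u0, v0):
    # depth: (H, W) array in meters; fx, fy: focal lengths in pixels;
    # (u0, v0): principal point. Returns an (H*W, 3) point cloud.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - u0) * depth / fx
    y = (v - v0) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)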
Thanks YvanYin for the very nice work! I was trying to train on KITTI data with the provided loss function, but noticed that in kitti_dataset.py the depth_to_class() module is missing, so we cannot get data['b_classes']. In addition, I wonder whether the input to the cross-entropy loss should be (pred_depth, data['B_classes'], etc.)?
Thanks in advance for the help!
I would like to ask where to find the file named "nyu_rawdata.pth"?
Hi, can you please share the ImageNet pretrained model that you used? It seems that the ResNeXt101-32x4d model is not available in torchvision. Thanks.
Hi, thank you for your amazing work.
One thing confuses me: I found that you use the center of the randomly cropped image as the u0 and v0 to reconstruct point clouds in VNL_loss. Won't that give a wrong result?
Traceback (most recent call last):
File "tools/test_nyu_metric.py", line 5, in <module>
from lib.core.config import cfg
ModuleNotFoundError: No module named 'lib'
Traceback (most recent call last):
File "./tools/train_nyu_metric.py", line 1, in <module>
from data.load_dataset import CustomerDataLoader
ModuleNotFoundError: No module named 'data'
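Both tracebacks have the same cause: the scripts under tools/ import lib.* and data.* relative to the repository root, so the root must be on Python's module search path. A minimal workaround (my own, not from the repo) is to prepend it at the top of the failing script:
# At the top of tools/test_nyu_metric.py, before the failing import:
import os, sys
# tools/ sits one level below the repo root, so step up one directory.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from lib.core.config import cfg  # now resolves
Running the scripts from the repo root with the root added to PYTHONPATH achieves the same thing.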
Hello, I used the NYU pretrained model,
but the value of rmse I get is 0.488.
This is quite different from the 0.416 in the paper.
I want to know the reason, and whether any parameters were changed.
Thanks for sharing. I have two questions on how to test the error metrics on KITTI. As said in the paper, you use the Eigen split, but KITTI provides an official train/validation split.
The data augmentation involves random cropping, which would make the focal_x and focal_y values from the NYUD-v2 and KITTI datasets invalid, especially when cropping with an aspect ratio other than 1:1.
Is there a reason why the method is robust to this incorrect fx/fy ratio?
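For reference, cropping shifts only the principal point while resizing scales the focal lengths, so a non-square resize does change the fx/fy ratio. A minimal sketch of the bookkeeping needed to keep the intrinsics consistent (my own illustration, not code from this repo):
def update_intrinsics(fx, fy, u0, v0, crop_x, crop_y, scale_x, scale_y):
    # Crop at offset (crop_x, crop_y), then resize by (scale_x, scale_y).
    # Cropping shifts the principal point; resizing scales everything.
    u0 = (u0 - crop_x) * scale_x
    v0 = (v0 - crop_y) * scale_y
    return fx * scale_x, fy * scale_y, u0, v0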
Dear YvanYin,
Great work!
I was wondering whether you used pretrained weights for resnext50, resnext101, and mobilenet from torchvision, because I am not able to load torchvision weights with your code.
I would be happy if you could help me here.
Thank you!
Sorry, I don't know if you uploaded the training file and the function to calculate the loss; I just didn't find them.
Which part of the program do I need to change to skip the validation process and go directly to testing?
Where can I change opt.phase?
Thank you
Hi! I wonder if you may share the dataset you used when training the SOTA NYUDv2 model, which you referred to as "20k unlabeled images" (but it has to be labeled, since you need depth supervision during training, right?).
Hello~
Are the demonstration results computed in real time?
And if so, what would be the latency on a modern PC?
Hi,
I think there is a small typo in the metrics code.
It is wonderful work, and I want to train the network on my own dataset. When will you release the training code? Or would you mind sending the code to me by email?
File "E:\vSLAM\VNL_Monocular_Depth_Prediction-master\data\nyudv2_dataset.py", line 154, in depth_to_bins
bins = ((torch.log10(depth) - cfg.DATASET.DEPTH_MIN_LOG) / cfg.DATASET.DEPTH_BIN_INTERVAL).to(torch.int)
TypeError: div(): argument 'other' (position 1) must be Tensor, not NoneType
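The NoneType here means cfg.DATASET.DEPTH_BIN_INTERVAL was never set, which typically happens when the dataset-specific --cfg_file has not been merged into the config before the dataset is built. For reference, the failing line is a uniform quantization of depth in log10 space; a minimal restatement with the config values passed in explicitly:
import torch

def depth_to_bins(depth, depth_min_log, depth_bin_interval):
    # Uniform quantization in log10 space:
    # bin = floor((log10(depth) - log10(depth_min)) / bin_interval)
    return ((torch.log10(depth) - depth_min_log) / depth_bin_interval).to(torch.int)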
Hi, YvanYin:
Thanks for your great work!
I want to train this model on my own dataset, which contains depth images and RGB images. How can I generate the .mat files and .json files?
How do I convert my image data into the format of your dataset?
Thank you so much for your help!
Hi, in the paper you mention there are 464 different indoor scenes, 249 of which are used for training. Could you let me know where I can get the training list of scenes? Also, could you clarify how you sampled the images from the training set? What is the sampling rate, or how many frames did you sample in each scene?
Hi!
Could you please clarify how to convert the network's depth output, bounded between 0 and 1, to real-world values in meters?
_, pred_depth_softmax= model.module.depth_model(img_torch)
pred_depth = bins_to_depth(pred_depth_softmax)
pred_depth = pred_depth.cpu().numpy().squeeze()
I get a depth map from this code in test_any_images.py,
and finally I want to compute the real depth, given that I know max_depth and min_depth in the real world.
Do you have related code for that?
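For reference, bins_to_depth returns depth normalized to [0, 1], and the metric value comes from multiplying by the dataset's depth cap (a comment further below mentions multiplying by 80 for the KITTI model; a 10 m cap for NYU is my assumption). A minimal sketch:
# pred_depth from bins_to_depth(...) is normalized to [0, 1].
DEPTH_MAX = 10.0  # assumption: ~10 m cap for the NYU model; 80 m for KITTI
pred_depth_meters = pred_depth * DEPTH_MAX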
Hello, I tried to use weighted cross entropy to train a baseline model that formulates depth prediction as a classification problem instead of a regression, but the training loss would not converge to a low value and the results are very bad. Could you please give me some advice? I would appreciate it if you could provide your loss function and training code.
This is my email: [email protected]
In your paper, you mention that the predicted VN are calculated from the reconstructed point cloud. How about the GT VN? How do you get them?
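For context, a standard way to obtain ground-truth normals is the same route as the predictions: back-project the GT depth with the camera intrinsics (see the pinhole sketch earlier) and cross the local tangent vectors, or fit local planes by least squares. A minimal sketch of the cross-product variant, not necessarily the authors' exact pipeline:
import numpy as np

def normals_from_points(points):
    # points: (H, W, 3) back-projected point cloud.
    # Tangent vectors along image columns/rows, then their cross product;
    # the sign convention depends on the camera coordinate frame.
    dx = np.gradient(points, axis=1)
    dy = np.gradient(points, axis=0)
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)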
Hello, thank you so much for sharing this code. I am working on a project where we need to get pedestrian velocities, and to estimate the velocities I need depth information. We are using a depth camera for the real-world implementation, but I need to run the same code on a dataset of monocular images to evaluate our algorithm. I am using your code to get the depth frames, but I am not sure about the unit of depth in the output frames. Is there a way to get the depth in meters?
Thanks for the good work.
Which GPU did you use, and how long did training take on KITTI?
I noticed that the holes in the ground truth of the NYU dataset have been filled. I'd like to know which method you used for inpainting the dataset, and whether it helps the network's performance. Also, is the same method used to inpaint the KITTI dataset? Thank you!
Hello, when I use your training code to train on NYU, I can only train 1 epoch.
When the second epoch starts, the process gets stuck at for i, data in enumerate(train_dataloader).
How can I solve this problem? Thank you so much.
Hello, when I use the training method in your article (training the network with NYUD and KITTI), the loss does not converge. Have you trained on NYUD or KITTI alone?
Thanks for sharing. For NYU v2, I downloaded the data and the trained model you provide. After running the test code, I got the following result:
rel=0.10590, log10=0.04602, rms=0.48770, delta1=0.88267, delta2=0.97619, delta3=0.99389.
But the metrics in your paper are:
rel=0.108, log10=0.048, rms=0.416, delta1=0.875, delta2=0.976, delta3=0.994.
There seems to be a large difference in the rms metric. Did I misunderstand some details?
Thanks for any help.
It is good to know your wonderful work.
I used the provided image and script, and logged the max/min values of the predicted depth before the scaling by 60000 at https://github.com/YvanYin/VNL_Monocular_Depth_Prediction/blob/master/tools/test_any_images.py#L66.
For image 107_r.png:
max: 0.6463524
min: 0.110629596
For image 26_r.png:
max: 0.5862275
min: 0.10630702
May I know the unit of these predictions? Meters, or something else?
Line 193 in VNL_loss.py has loss = vnl_loss.cal_VNL_loss(pred_depth, gt_depth), but there is no cal_VNL_loss() in VNL_loss.
I guess it should be loss = vnl_loss(gt_depth, pred_depth) to use forward()?
Hi @YvanYin, thanks for your wonderful work. But as mentioned, is there an __init__.py file missing in the lib folder?
Hi, thanks for sharing the code!
Right now I want to use your code (test_any_images.py) to generate depth maps for my own dataset, but the shape of your model's output (using the pretrained kitti_official.pth) is [1, 100, 300, 400]. Can I get a single-channel output while still using your pretrained model?
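For reference, that [1, 100, 300, 400] tensor is the per-pixel distribution over the 100 depth bins, not a depth map; it is collapsed to one channel by bins_to_depth, as in the snippet quoted from test_any_images.py earlier in this thread:
# pred_depth_softmax: (1, 100, H, W) scores over the 100 depth bins
_, pred_depth_softmax = model.module.depth_model(img_torch)
pred_depth = bins_to_depth(pred_depth_softmax)  # -> single-channel depth
depth_map = pred_depth.cpu().numpy().squeeze()  # (H, W) array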
Hi,
Thank you for the amazing work on this project.
Could you also share the code for running inference on a live camera feed, as shown in the README?
Thanks :)
In your paper's ablation studies, you discussed the impact of point group size.
The results show that 80K is the best (though it is only slightly better than 20K).
But in your code, I find the point group size is 385 * 385 * 0.15 = 22233, about 22K.
Why didn't you choose a larger point group size? Is it for computational performance? Did you find that 22K is the best trade-off between model performance and computation?
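For context, the point groups in the loss are triplets of randomly sampled pixels whose back-projected 3D points define a virtual normal. A minimal sketch of the sampling count (my own reading of the paper, not the repo's code), assuming a 385x385 input and a 0.15 ratio:
import numpy as np

H = W = 385
num_groups = int(H * W * 0.15)  # = 22233, the ~22K figure cited above
# Three pixel indices per group; near-colinear or too-close triplets
# still have to be filtered before computing virtual normals.
triplets = np.random.randint(0, H * W, size=(num_groups, 3))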
Do you plan to kindly release your training code? Thanks.
Hi YvanYin,
It is really terrific what you and your team achieved with the new model. My respect to you.
I am currently working on a project in which I could use precise depth results, e.g. generated by your work, but I am facing a problem with the output. As you addressed above, when using the KITTI-pretrained model to estimate depth, we need to multiply the results by 80. But even when I do so, the results are still not right. I used your KITTI pretrained model to predict the depth of images from KITTI Object Detection. When I transform this depth into xyz coordinates, the pixel point cloud doesn't look right at all. As a comparison, when I use PSMNet to estimate the depth for those images, the results look acceptable. I was hoping you may know the reason and can help me out. I used the "test_any_image.py" script and adjusted the paths for '--dataroot', '--cfg_file' and '--load_ckpt' in "parse_arg_base". For your understanding, I attached the results generated by PSMNet (1st image) and your model (2nd image).
(PSMNet)
Thanks in advance.
Hi,
Thanks for releasing this code! I think the idea you present is very neat. I have a few questions:
I could not properly grasp how these hyperparameters relate to the ones you discuss in the paper. I would be very grateful if you can educate me on this.
Thanks in advance!
Cheers,
Erik