
vnl_monocular_depth_prediction's People

Contributors

yvanyin


vnl_monocular_depth_prediction's Issues

how to get the surface normal image

Dear YvanYin:
I ran into a problem when testing your work: I can't get the surface normal image from the depth image and the RGB image. Would you please tell me how to get it?
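For reference, a minimal sketch (not the author's code) of estimating surface normals from a depth image, assuming a pinhole camera with intrinsics fx, fy, u0, v0: back-project each pixel to a 3D point and take the cross product of the horizontal and vertical point differences.

```python
# Sketch only: estimate surface normals from a depth map with assumed
# pinhole intrinsics (fx, fy, u0, v0). Not the repository's implementation.
import numpy as np

def normals_from_depth(depth, fx, fy, u0, v0):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    # Back-project every pixel to a 3D point in the camera frame.
    pts = np.stack([(u - u0) * depth / fx, (v - v0) * depth / fy, depth], axis=-1)
    du = np.gradient(pts, axis=1)   # point difference along image columns
    dv = np.gradient(pts, axis=0)   # point difference along image rows
    n = np.cross(du, dv)            # plane normal at each pixel
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)
```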

Error when running test_any_images.py

I tried to run this command to get a sample inference on Google Colab:

!cd /content/VNL_Monocular_Depth_Prediction && python ./tools/test_any_images.py \
		--dataroot    /content/VNL_Monocular_Depth_Prediction \
		--dataset     any \
		--cfg_file     /content/VNL_Monocular_Depth_Prediction/lib/configs/resnext101_32x4d_nyudv2_class \
		--load_ckpt   /content/VNL_Monocular_Depth_Prediction/nyu_rawdata.pth

But I got this error:

----------------- Options ---------------
                batchsize: 2                             
                 cfg_file: /content/VNL_Monocular_Depth_Prediction/lib/configs/resnext101_32x4d_nyudv2_class	[default: lib/configs/resnext_32x4d_nyudv2_c1]
                 dataroot: /content/VNL_Monocular_Depth_Prediction	[default: None]
                  dataset: any                           	[default: nyudv2]
                    epoch: 30                            
                load_ckpt: /content/VNL_Monocular_Depth_Prediction/nyu_rawdata.pth	[default: None]
                    phase: test                          
               phase_anno: test                          
              results_dir: ./evaluation                  
                   resume: False                         
              start_epoch: 0                             
               start_step: 0                             
                   thread: 4                             
              use_tfboard: False                         
----------------- End -------------------
INFO load_dataset.py:  31: any is created.
INFO test_any_images.py:  45:  test_data_size: 0                             
Traceback (most recent call last):
  File "./tools/test_any_images.py", line 47, in <module>
    model = MetricDepthModel()
  File "../VNL_Monocular_Depth_Prediction/lib/models/metric_depth_model.py", line 16, in __init__
    self.depth_model = DepthModel()
  File "../VNL_Monocular_Depth_Prediction/lib/models/metric_depth_model.py", line 121, in __init__
    self.decoder_modules = lateral_net.fcn_topdown(cfg.MODEL.ENCODER)
  File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 190, in __init__
    self._init_modules(self.init_type)
  File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 193, in _init_modules
    self._init_weights(init_type)
  File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 211, in _init_weights
    child_m.apply(init_func)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 445, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 446, in apply
    fn(self)
  File "../VNL_Monocular_Depth_Prediction/lib/models/lateral_net.py", line 207, in init_func
    nn.init.normal_(m.weight.data, 1.0, 0.0)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/init.py", line 140, in normal_
    return _no_grad_normal_(tensor, mean, std)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/init.py", line 19, in _no_grad_normal_
    return tensor.normal_(mean, std)
RuntimeError: normal_ expects std > 0.0, but found std=0

Do you have any ideas?
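Not an official fix, but a sketch of a workaround, assuming the failing call is the nn.init.normal_(m.weight.data, 1.0, 0.0) line shown in the traceback: newer PyTorch versions reject std = 0, and an equivalent constant fill avoids the error.

```python
# Hypothetical one-line patch inside init_func in lib/models/lateral_net.py,
# where `m` is the module being initialised and nn is already imported:
# filling with the mean is equivalent to normal_(mean=1.0, std=0.0) and is
# accepted by newer PyTorch versions.
nn.init.constant_(m.weight.data, 1.0)
```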

How to process the NYU dataset

Thanks for sharing. I have a question about processing the NYU dataset: in your nyudv2_dataset.py, the depth image is divided by 10. What type of depth image should be the input? What is the value range of the depth image, 0-65535?
Thanks
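To see what is actually being fed in, a small sketch (with a placeholder path) that prints the dtype and value range of a depth PNG; 16-bit PNGs can hold 0-65535, and if they store millimeters, dividing by 1000 gives meters. Whether that matches what nyudv2_dataset.py expects should be verified against the repo's data.

```python
# Sketch with a placeholder path: inspect the dtype and value range of a
# depth image before deciding how to scale it.
import cv2
import numpy as np

depth = cv2.imread("nyu_depth/0001.png", cv2.IMREAD_UNCHANGED)  # keep 16-bit values
print(depth.dtype, depth.min(), depth.max())
# If the PNG stores millimeters, this would give meters:
depth_m = depth.astype(np.float32) / 1000.0
```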

Depth image format

Hi, thanks for the good work.
How do you deal with raw depth images?
The scale doesn't seem to fit my NYU depth data.

How to get the camera intrinsic parameters?

Hi.
In the process of point cloud reconstruction, you use the camera intrinsic parameters, such as the focal length and the 2D coordinates of the optical center. Are all of these parameters contained in the NYUD-v2 and KITTI datasets? I mean, does each image in both datasets have its own corresponding camera intrinsics?
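For context, a minimal back-projection sketch showing exactly which intrinsics are needed (focal lengths fx, fy and principal point u0, v0); the variable names are illustrative, not the repo's.

```python
# Sketch: back-project a depth map into a point cloud with a pinhole model.
# fx, fy, u0, v0 are the per-image intrinsics the question asks about.
import numpy as np

def depth_to_pointcloud(depth, fx, fy, u0, v0):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - u0) * depth / fx
    y = (v - v0) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # (H, W, 3) points in the camera frame
```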

depth_to_class function is missing & loss function issue

Thanks YvanYin for the very nice work! I was trying to train on KITTI data with the provided loss function, but noticed that in kitti_dataset.py the depth_to_class() function is missing, so we cannot get data['B_classes']. In addition, I wonder whether the input to the cross-entropy loss should be (pred_depth, data['B_classes'], etc.)?

Thanks in advance for the help!

nyu_rawdata.pth

I would like to ask where the file named "nyu_rawdata.pth" can be found?

ImageNet pre-trained model

Hi, can you please share the ImageNet pretrained model that you used? It seems that the ResNeXt101-32x4d model is not available in torchvision. Thanks

About the VNL_loss

Hi, thank you for your amazing work.
One question confuses me: I found that you use the center of the randomly cropped image as u0 and v0 to reconstruct point clouds in the VNL loss. Won't that give a wrong result?

ModuleNotFoundError when running Python files in the tools directory

Traceback (most recent call last):
File "tools/test_nyu_metric.py", line 5, in
from lib.core.config import cfg
ModuleNotFoundError: No module named 'lib'

Traceback (most recent call last):
File "./tools/train_nyu_metric.py", line 1, in
from data.load_dataset import CustomerDataLoader
ModuleNotFoundError: No module named 'data'
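A common workaround (not repo-specific): either run the scripts from the repository root with the root on PYTHONPATH, or add the root to sys.path at the top of the script so that lib and data become importable. A sketch of the latter:

```python
# Hypothetical snippet at the top of tools/test_nyu_metric.py (before the
# failing imports): put the repository root on the module search path.
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from lib.core.config import cfg            # now resolvable
from data.load_dataset import CustomerDataLoader
```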

Performance issue

Hello, I used the NYU pretrained model,
but the RMSE value I get is 0.488,
which is quite different from the 0.416 reported in the paper.
I would like to know the reason, and whether any parameters were changed.

KITTI test metric?

Thanks for your sharing. I have two questions on how to test the error metrics on KITTI. As said in the paper, you use the Eigen split, but KITTI provides an official train/validation split:

  1. What's the difference between these two splits? It seems that Eigen used the raw depth maps, which are sparser than the official train/validation depth maps?
  2. The depth ground truth that KITTI provides is very sparse; do you test the error metrics on this sparse GT or on inpainted dense GT?
    Thanks for any help.

Data augmentation should change intrinsics

The data augmentation involves random cropping. This would make the focal_x and focal_y values from the NYUv2 and KITTI datasets invalid, especially when cropping with an aspect ratio different from 1:1.
Is there a reason why the method is robust to this incorrect fx/fy ratio?
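For concreteness, a sketch (not the repo's code) of how a crop followed by a resize would, in principle, have to adjust the pinhole intrinsics; the question is essentially why skipping this adjustment does not hurt.

```python
# Sketch: how cropping and resizing change pinhole intrinsics in principle.
# crop_x/crop_y are the top-left offsets of the crop; scale_x/scale_y are the
# resize factors applied afterwards. All names are illustrative.
def adjust_intrinsics(fx, fy, u0, v0, crop_x, crop_y, scale_x, scale_y):
    u0, v0 = u0 - crop_x, v0 - crop_y          # cropping shifts the principal point
    return fx * scale_x, fy * scale_y, u0 * scale_x, v0 * scale_y  # resizing scales everything
```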

Pretrained Weights

Dear YvanYin,

Great work!
I was wondering whether you used pretrained weights for ResNeXt50, ResNeXt101, and MobileNet from torchvision,
because I am not able to load torchvision weights with your code.

I would be happy if you could help me here.

Thank you!

Where to start to train?

Sorry, I can't tell whether you uploaded the training file and the function to calculate the loss; I just didn't find them.

val_annotations.json is missing.

Hi, thank you for sharing the code. I tried to run the training code and apparently there are only train and test annotations; val_annotations.json is missing. Can you share the file, tell me where I can find it, or do I need to create it myself from the available data?

Which part of the program do I need to change to skip the validation process and go directly to testing?
Where can I change opt.phase?

Thank you

About NYUDv2 training dataset

Hi! I wonder if you could share the dataset you used to train the SOTA NYUDv2 model, which you referred to as "20k unlabeled images" (but it has to be labeled, since you need depth supervision during training, right?).

When will you release the training code?

It is wonderful work, and I want to train the network on my own dataset. When will you release the training code? Or would you mind sending the code to me by email?

Not sure which part of the configuration is wrong?

File "E:\vSLAM\VNL_Monocular_Depth_Prediction-master\data\nyudv2_dataset.py", line 154, in depth_to_bins
bins = ((torch.log10(depth) - cfg.DATASET.DEPTH_MIN_LOG) / cfg.DATASET.DEPTH_BIN_INTERVAL).to(torch.int)
TypeError: div(): argument 'other' (position 1) must be Tensor, not NoneType

Data set format

Hi, YvanYin:
Thanks for your great work!
I want to use my own dataset to train this model. The dataset contains depth images and RGB images. How can I generate the .mat and .json files?
How do I convert the image data into the format of your dataset?
Thank you so much for your help!
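As an illustration only: pairing RGB and depth files into a JSON list is straightforward, but the exact key names and structure expected by the loader should be checked against data/nyudv2_dataset.py and the provided annotation files; the keys below are assumptions.

```python
# Illustrative sketch for building an annotation JSON from paired files.
# The key names ("rgb_path", "depth_path") are assumptions, not the repo's
# verified schema.
import json
import os

def build_annotations(rgb_dir, depth_dir, out_file):
    entries = []
    for name in sorted(os.listdir(rgb_dir)):
        entries.append({
            "rgb_path": os.path.join(rgb_dir, name),
            "depth_path": os.path.join(depth_dir, os.path.splitext(name)[0] + ".png"),
        })
    with open(out_file, "w") as f:
        json.dump(entries, f, indent=2)
```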

nyu depth v2 preprocess

Hi, in the paper you mention that there are 464 different indoor scenes and 249 of them are used for training. Could you let me know where I can get the list of training scenes? Also, could you clarify how you sampled images from the training set? What was the sampling rate, or how many frames did you sample per scene?

Convert network output to meters.

Hi!
Could you please clarify how to convert the network's depth output, bounded between 0 and 1, to real-world values in meters?
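For reference, if the output really is depth normalized by the dataset's maximum depth, recovering meters is a single multiplication; the 80 m figure for KITTI is mentioned in another issue below, while the NYU value should be checked in the config.

```python
# Sketch: rescale a [0, 1] network output to metric depth.
# DEPTH_MAX is an assumption (80.0 for KITTI per another issue; check
# cfg.DATASET for the NYU value).
DEPTH_MAX = 80.0
pred_depth_m = pred_depth * DEPTH_MAX  # depth in meters
```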

How can I get the real depth from the depth map?

_, pred_depth_softmax= model.module.depth_model(img_torch)
pred_depth = bins_to_depth(pred_depth_softmax)
pred_depth = pred_depth.cpu().numpy().squeeze()

I get a depth map from this code in test_any_images.py,
and I want to calculate the real depth, given that I know max_depth and min_depth in the real world.

Do you have any related code for that?

training with weighted cross entropy loss

Hello, I tried to use weighted cross-entropy to train a baseline model that formulates depth prediction as a classification problem instead of regression, but the training loss wouldn't converge to a low value and the results are very bad. Could you please give me some advice? I would appreciate it if you could provide your loss function and training code.
This is my email: [email protected]

How do you calculate the GT VN?

In your paper, you mention that the predicted VN are calculated from the reconstructed point cloud. How about the GT VN? How do you get them?
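For context, a virtual normal (whether from the ground-truth or the predicted point cloud) is just the normal of the plane through three sampled, non-collinear points; a minimal sketch:

```python
# Sketch: normal of the plane through three non-collinear 3D points, which is
# how a virtual normal is defined for a sampled point group.
import numpy as np

def virtual_normal(p0, p1, p2):
    n = np.cross(p1 - p0, p2 - p0)
    return n / (np.linalg.norm(n) + 1e-8)
```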

Depth prediction and surface normal

When getting the output as a depth prediction, it is a 3-channel (greyscale) image, but the surface normal code wants a 4-channel image, and I get this error when feeding the depth image into the surface normal file:
(error screenshot attached in the original issue)

Depth in meters

Hello, Thank you so much for sharing this code. I am working on a project where we need to get the pedestrian velocities. To estimate the velocities, I need the depth information. We are using a depth camera for real-world implementation, but I need to run the same code on a dataset of monocular images to evaluate our algorithm. I am using your code to get the depth frames, but I am not sure about the unit of depth in the output depth frames. Is there a way to get the depth in meters?

GPU

Thanks for the good work.
Which GPU did you use, and how long did training take on KITTI?

Inpainting method used on dataset.

I notice that the holes in the ground truth of the NYU dataset have been filled in. I'd like to know which inpainting method you used and whether it helps the performance of the network. Also, is the same method used to inpaint the KITTI dataset? Thank you!

Can only train 1 epoch?

Hello, when I use your training code to train on NYU, I can only train for 1 epoch.
When starting the second epoch, the process gets stuck at for i, data in enumerate(train_dataloader).
How can I solve this problem? Thank you so much.

About network training

Hello, when I use the training method in your article (training the network with NYUD and KITTI), the loss does not converge. Have you trained on NYUD or KITTI alone?

The test metrics on NYU v2 are not the same as the results in the paper?

Thanks for sharing. For NYU v2, I downloaded the data and the trained model you provide. After running the test code, I got the following result:
rel = 0.10590, log10 = 0.04602, rms = 0.48770, delta1 = 0.88267, delta2 = 0.97619, delta3 = 0.99389.
But the metrics in your paper are:
rel= 0.108, log10=0.048, rms=0.416, delta1=0.875, delta2=0.976, delta3=0.994.
There seems to be a large difference in the rms metric. Did I misunderstand some details?
Thanks for any help.

'VNL_Loss' object has no attribute 'cal_VNL_loss'

Line 193 in VNL_loss.py has loss = vnl_loss.cal_VNL_loss(pred_depth, gt_depth), but there is no cal_VNL_loss() in VNL_Loss.
I guess it should be loss = vnl_loss(gt_depth, pred_depth) so that forward() is used?
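For reference, calling an nn.Module instance directly dispatches to forward(), so the suggested fix would look like the line below; the argument order follows the question and should be checked against VNL_Loss.forward's signature.

```python
# Sketch of the suggested fix: invoke the module so forward() runs.
# Argument order (gt first, then prediction) is taken from the question above
# and should be verified against VNL_Loss.forward.
loss = vnl_loss(gt_depth, pred_depth)
```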

test on any images

Hi, thanks for sharing the code!

Right now I want to use your code (test_any_images.py) to generate depth maps for my own dataset, but the shape of your model's output (using the pretrained kitti_official.pth) is [1, 100, 300, 400]. Can I get a single-channel depth output while still using your pretrained model?
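Based on the snippet quoted in the "How can I get the real depth" issue above, the 100-channel tensor looks like the per-bin softmax output, which bins_to_depth collapses into a single-channel depth map; a sketch reusing those calls:

```python
# Sketch reusing the calls quoted from test_any_images.py (see the issue
# above): the [1, 100, H, W] tensor is the per-bin softmax; bins_to_depth
# turns it into a single-channel depth map.
_, pred_depth_softmax = model.module.depth_model(img_torch)   # [1, 100, H, W]
pred_depth = bins_to_depth(pred_depth_softmax)                # single-channel depth
depth_map = pred_depth.cpu().numpy().squeeze()                # (H, W) array
```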


Code For Live Camera Feed Demo

Hi,

Thank you for the amazing work on this project.

Could you also share the code for running inference on live camera feed as shown in the README ?

Thanks :)

About point group size

In your paper's ablation studies, you discussed the impact of point group size.
The results show that 80K is the best (though only slightly better than 20K).
But in your code I find that the point group size is 385 * 385 * 0.15 ≈ 22233, about 22K.
Why didn't you choose a larger point group size? Is it for computational performance? Did you find that 22K is the best trade-off between model performance and computation?

Training code

Do you plan to kindly release your training code? Thanks.

Cannot get the right depth on kitti

Hi YvanYin,

It is really terrific what you and your team have achieved with the new model. My respect to you.

I am currently working on a project in which I could use precise depth results generated, e.g., by your work, but I am facing a problem with the output. As you addressed above, when using the model pretrained on KITTI to estimate depth, we need to multiply the results by 80. But even when I do so, the results are still not right. I used your KITTI pretrained model to predict the depth of images from KITTI Object Detection. When I transform this depth into xyz coordinates, the resulting point cloud doesn't look right at all. As a comparison, when I use PSMNet to estimate the depth for those images, the results look acceptable. I was hoping you might know the reason and could help me out. I used the test_any_images.py script and adjusted the paths for '--dataroot', '--cfg_file' and '--load_ckpt' in parse_arg_base. For your understanding, I attached the results generated by PSMNet (1st image) and your model (2nd image).
(Attached images: 000000 predicted by PSMNet, and 000000 predicted by VNL.)

Thanks in advance.

Hyperparameters in paper vs code

Hi,
Thanks for releasing this code! I think the idea you present is very neat. I have a few questions:

  1. I understand that the network outputs a depth between 0-1 which needs to be scaled properly to get the metric depth, but do you apply the loss function on the scaled ground truth depth images (i.e. between 0-1) or do you apply the loss function in the metric space? I ask this because I need to know if the hyperparameters are tuned in the metric space or in the scaled space between 0-1. From the paper, it appears that you apply the loss in the metric space e.g. because you have one hyperparameter theta = 0.6 m in the paper. I would appreciate if you can confirm this.
  2. My second question concerns trying to understand how the names of the hyperparameters in the code are related to the names in the paper. Specifically what are the following hyperparameters called in the code?:
    theta = 0.6 m
    alpha = 120 degrees
    beta = 30 degrees
    Please let me know if I missed any hyperparameter from the paper.
    Furthermore, I do not understand the following hyperparameters in the code (see the class VNL_Loss and function filter_mask):
    delta_cos = 0.867
    delta_diff_x = 0.005
    delta_diff_y = 0.005
    delta_diff_z = 0.005
    delta_z = 0.0001

I could not properly grasp how these hyperparameters relate to the ones you discuss in the paper. I would be very grateful if you can educate me on this.

Thanks in advance!
Cheers,
Erik
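One observation that might help with the mapping, offered as a guess rather than the author's answer: 0.867 is essentially cos(30°), so delta_cos plausibly encodes an angular threshold expressed as a cosine.

```python
# Quick check of the observation above (an inference, not confirmed by the author).
import math
print(math.cos(math.radians(30)))  # ~0.866
```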
