
weiyithu / surrounddepth

Stars: 244 · Watchers: 10 · Forks: 36 · Size: 4.01 MB

[CoRL 2022] SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

License: MIT License

Python 100.00%
depth-estimation multi-camera self-supervised-learning transformer

surrounddepth's People

Contributors

lqzhao, weiyithu


surrounddepth's Issues

A question about visualize_depth

Hi,

I ran the command:

python -m torch.distributed.launch --nproc_per_node 4 run.py --model_name test --config configs/nusc.txt --models_to_load depth encoder --load_weights_folder=/log/nusc/model/weights/ --save_pred_disps --eval_out_dir=/log/nusc/eval/ --eval_only

But no images are produced: the eval folder in the log directory is empty. Could you advise on how to visualize my evaluation results?

Thanks,
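For reference, a minimal sketch of how saved predictions could be visualized, assuming --save_pred_disps wrote the predicted disparities to an .npy file (the path, filename, and array shape below are assumptions, not the repo's documented output):

import os
import numpy as np
import matplotlib.pyplot as plt

# Assumed: an [N, H, W] array of predicted disparities saved by --save_pred_disps.
disps = np.load("log/nusc/eval/disps.npy")  # hypothetical filename

os.makedirs("vis", exist_ok=True)
for i, disp in enumerate(disps):
    # Normalize each disparity map to [0, 1] and save it with a perceptual colormap.
    d = (disp - disp.min()) / (disp.max() - disp.min() + 1e-8)
    plt.imsave(os.path.join("vis", f"disp_{i:04d}.png"), d, cmap="magma")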

About disp to depth

Hi, thanks a lot for the nice work!

I have a basic question about the disp-to-depth conversion below:

def disp_to_depth(disp, min_depth, max_depth):
    """Convert the network's sigmoid output into a depth prediction.
    The formula for this conversion is given in the 'additional considerations'
    section of the paper.
    """
    min_disp = 1 / max_depth
    max_disp = 1 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp  # disparity in [1/max_depth, 1/min_depth]
    depth = 1 / scaled_disp                                # depth in [min_depth, max_depth]
    return scaled_disp, depth

May I ask why the depth is computed this way rather than simply as disp * max_depth? Thanks a lot for your help!
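For context, a quick numeric check (assuming the common defaults min_depth=0.1 and max_depth=100, which may differ from the configs here) shows what this parameterization buys: the sigmoid output is mapped to a bounded, strictly positive depth in [min_depth, max_depth], whereas depth = disp * max_depth would produce zero or near-zero depth for small network outputs:

min_depth, max_depth = 0.1, 100.0                 # assumed values for illustration
min_disp, max_disp = 1.0 / max_depth, 1.0 / min_depth

for disp in (0.0, 0.5, 1.0):
    depth = 1.0 / (min_disp + (max_disp - min_disp) * disp)
    print(disp, depth)                            # 0.0 -> 100.0, 0.5 -> ~0.2, 1.0 -> 0.1
    print(disp, disp * max_depth)                 # the alternative: 0.0 -> 0.0, an invalid depth

In other words, the network effectively predicts scaled inverse depth, which keeps nearby depths finely resolved and guarantees the output never reaches zero.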

Depth Metrics

Hi, I have a question about the unit of the metrics. Is the RMSE for nuScenes in meters?

Cannot reproduce the paper's results

Hello author,
I tried to reproduce the paper's results on the DDAD dataset, but within 20 epochs even my best result does not reach the results reported in the paper.
Under scale-ambiguous evaluation, my best result is as follows:
all
abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
& 0.208 & 3.474 & 12.536 & 0.309 & 0.720 & 0.886 & 0.944 \\
Below are the results from the paper:
[image: results table from the paper]

A question about distributed training on DDAD dataset

Hello! I am following your work and trying to reproduce it, but I ran into the errors below while using the command python -m torch.distributed.launch --nproc_per_node 8 run.py --model_name ddad --config configs/ddad.txt for distributed training on the DDAD dataset.

[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1806986 milliseconds before timing out.

[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.

After training for a while, the process is automatically shut down for running over the time limit.
Are there any details or training settings that I have overlooked? Or does the torch version matter?
Thanks!
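Not an official fix, but one common workaround for NCCL watchdog timeouts (an assumption on my part, not something from the repo) is to raise the collective timeout when the process group is created, e.g. from the default 30 minutes to a couple of hours:

import datetime
import torch.distributed as dist

# Sketch: raise the NCCL collective timeout so a slow step on one rank
# does not trigger the watchdog and kill the whole job.
dist.init_process_group(backend="nccl", timeout=datetime.timedelta(hours=2))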

Reproduction problem on the nuScenes dataset

Hello, your results are great and I'm following your work.
I have a problem reproducing the results on the nuScenes dataset: the abs_rel from Monodepth2 on nuScenes is as high as 0.35, and the training curve is very strange.
I use the Monodepth2 code and your code to preprocess the dataset.
How can I reproduce the Monodepth2 results in your paper? Are there any details that need special attention?
Thanks!
[image: training-curve screenshot]

DDAD training sample

Hello author, I tried training with your model, but there seem to be only 12319 training samples for the DDAD dataset rather than 12650. Is this normal? I counted the images in the DDAD dataset I downloaded and confirmed that it is complete (99600 images). Did you remove static frames as you did for the nuScenes dataset? The index.pkl you provide contains only 16421 images; were the other 79 removed manually, and why?
Also, regarding model testing: does SurroundDepth not set aside a separate test set?
Looking forward to your reply!
Best wishes!

How to get the metric depth?

Dear author:
Thank you very much for your contributions in this paper!
I am trying to get the depth during evaluation. In theory, we should obtain metric depth between min_depth and max_depth (in meters) after the 'disp_to_depth' function. However, when I inspect several depth maps generated this way, they do not look like correct metric depths. See below:
(Pdb) pred_depth
array([[0.7076706 , 0.7073359 , 0.70599675, ..., 0.60197115, 0.59946537, 0.5988389 ],
[0.70776784, 0.7074405 , 0.7061312 , ..., 0.60220945, 0.599669 , 0.59903383],
[0.7082062 , 0.70791256, 0.706738 , ..., 0.603285 , 0.60058796, 0.5999137 ],
...,
[0.11618532, 0.1162222 , 0.11636975, ..., 0.11287601, 0.11278087, 0.11275709],
[0.11617843, 0.11621682, 0.11637037, ..., 0.11286871, 0.11277094, 0.11274651],
[0.1161769 , 0.11621563, 0.11637051, ..., 0.11286709, 0.11276875, 0.11274416]], dtype=float32)
The depth values are very small compared to the real depth; many of them are less than 1.0. How can I get the metric depth?
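One explanation consistent with these numbers (an assumption, not an official answer): a self-supervised, scale-ambiguous model only predicts depth up to an unknown scale, so the usual evaluation first rescales predictions by the median ground-truth-to-prediction ratio. A minimal, Monodepth2-style sketch of that step, with pred_depth and gt_depth as hypothetical per-image arrays:

import numpy as np

def median_scale(pred_depth, gt_depth, min_depth=0.1, max_depth=80.0):
    # gt_depth is sparse LiDAR depth, 0 where there is no return.
    mask = gt_depth > 0
    ratio = np.median(gt_depth[mask]) / np.median(pred_depth[mask])
    scaled = pred_depth * ratio
    return np.clip(scaled, min_depth, max_depth)

If the model is the scale-aware variant, this rescaling should not be needed; otherwise the raw outputs are only relative depths, which would explain values well below 1.0.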

Is the intrinsics matrix normalized?

Hi Weiyi,
Nice job. I have some questions.
When you use the intrinsics matrix for the nuScenes dataset, I found that you do not normalize the intrinsics matrix K.
In Monodepth2, the intrinsics matrix is normalized by the original image size, so K contains small values:

K = np.array([[0.58, 0,    0.5, 0],
              [0,    1.92, 0.5, 0],
              [0,    0,    1,   0],
              [0,    0,    0,   1]], dtype=np.float32)

But in your work, K is used directly without any change.
I wonder whether this is OK?
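For reference, Monodepth2 uses those small values only because it rescales K by the actual image size before building the projection, so both conventions end up with pixel-unit intrinsics. A minimal sketch of that Monodepth2-style rescaling (the resolution below is just an example):

import numpy as np

# Normalized intrinsics: focal lengths and principal point stored as
# fractions of the image width/height.
K = np.array([[0.58, 0,    0.5, 0],
              [0,    1.92, 0.5, 0],
              [0,    0,    1,   0],
              [0,    0,    0,   1]], dtype=np.float32)

width, height = 640, 192          # example input resolution
K_pix = K.copy()
K_pix[0, :] *= width              # fx and cx back to pixel units
K_pix[1, :] *= height             # fy and cy back to pixel units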

About the dataset fetching for nuScenes

Hi, thanks for sharing your great work. While reading the code, I noticed that you get the adjacent frames with the lines index_temporal_i = cam_sample['prev'] and index_temporal_i = cam_sample['next'].
I'm curious whether the images obtained this way are keyframes or sweep frames.
And how do you remove the static scenes?

Thanks a lot and looking forward to your reply.

A question about depth gt storage usage on nusc dataset

python export_gt_depth_nusc.py val

The depth maps occupy more than 200 GB. Can the ground-truth depth maps take up less space?
I want to use depth-map supervision.

python export_gt_depth_nusc.py train

This will make my storage explode.
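Not something the repo provides, but since LiDAR depth maps are mostly zeros, one way to shrink them (a sketch under that assumption; the actual export format here may differ) is to store only the non-zero pixels in a compressed .npz and rebuild the dense map at load time:

import numpy as np

def save_sparse_depth(path, depth):
    # depth: [H, W] float array with 0 where there is no LiDAR return.
    ys, xs = np.nonzero(depth)
    np.savez_compressed(path, shape=np.array(depth.shape), ys=ys, xs=xs,
                        vals=depth[ys, xs].astype(np.float32))

def load_sparse_depth(path):
    d = np.load(path)
    depth = np.zeros(tuple(d["shape"]), dtype=np.float32)
    depth[d["ys"], d["xs"]] = d["vals"]
    return depth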

About Deployment

Hi, it's very nice work. Can it be deployed on an embedded system such as the NVIDIA Jetson AGX?

About focal_scale

Hi, thank you for the nice work!
I'm confused about the 'focal_scale'. Why do we need to do this:

SurroundDepth/runer.py

Lines 382 to 383 in 22dfecf

if self.opt.focal:
    pred_depth = pred_depth * data[("K", 0, 0)][i, 0, 0].item() / self.opt.focal_scale

About training time

Hello, thank you for your great work. I was wondering how long it took to train this network, and what GPU model and how many GPUs were used. I would greatly appreciate a reply!

Question about scale-aware model type nuScenes evaluation

Hi, I tried to evaluate nuScenes validation-set performance with your released nusc_scale model. Since this model should be scale-aware, I expected the scale-aware evaluation results to be similar to those in README.md, i.e.

type        | dataset  | Abs Rel | Sq Rel | delta < 1.25
scale-aware | nuScenes | 0.280   | 4.401  | 0.661

However, the results turned out worse than expected: for this scale-aware model, the scale-ambiguous evaluation scores are much better than the scale-aware ones. The relevant output is shown below.

Loading depth weights...
Loading encoder weights...
Training model named: nusc_scale
There are 20096 training items and 6019 validation items

median: 0.33512431383132935
-> Evaluating 1
scale-ambiguous evaluation:
front
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.262  &   2.961  &  10.989  &   0.398  &   0.527  &   0.791  &   0.889  \\
front_left
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.301  &   2.457  &   7.887  &   0.398  &   0.532  &   0.788  &   0.893  \\
back_left
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.289  &   2.230  &   7.050  &   0.386  &   0.578  &   0.799  &   0.895  \\
back
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.329  &   3.483  &  11.492  &   0.477  &   0.405  &   0.712  &   0.850  \\
back_right
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.298  &   2.418  &   7.371  &   0.405  &   0.556  &   0.789  &   0.887  \\
front_right
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.305  &   2.714  &   8.292  &   0.417  &   0.530  &   0.778  &   0.882  \\
all
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.297  &   2.711  &   8.847  &   0.413  &   0.522  &   0.776  &   0.883  \\
scale-aware evaluation:
front
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   1.976  &  42.443  &  22.277  &   1.090  &   0.048  &   0.107  &   0.194  \\
front_left
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   2.300  &  49.173  &  21.861  &   1.169  &   0.036  &   0.081  &   0.152  \\
back_left
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   2.392  &  51.237  &  21.708  &   1.186  &   0.029  &   0.070  &   0.135  \\
back
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   1.679  &  26.246  &  15.413  &   0.992  &   0.092  &   0.190  &   0.295  \\
back_right
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   2.569  &  61.392  &  22.064  &   1.214  &   0.031  &   0.075  &   0.143  \\
front_right
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   2.476  &  57.699  &  22.465  &   1.206  &   0.037  &   0.087  &   0.154  \\
all
 abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   2.232  &  48.032  &  20.965  &   1.143  &   0.046  &   0.102  &   0.179  \\

I exported the GT using tools/export_gt_depth_nusc.py with val and used configs/nusc_scale_pretrain.txt for evaluation (most of the config stayed the same, except that I changed min_depth to 0.5).

Is this reasonable, or am I using something incorrectly? Thank you.

Some problems encountered in preparing data

Hi!
I followed the README to prepare the DDAD data. But after running the SIFT and matching steps, I find that the contents of the sift and match folders are incomplete. There should be folders from 000 to 199, but my sift folder only contains 000 to 106 and my match folder only contains 001 to 144. A similar problem exists in the depth folder: the numbered folders are complete (000 to 199), but some files are empty. Why do these situations occur?

A question about self-supervised and supervised learning

Basically, we use self-supervised methods to train depth-prediction models. Have you tried combining self-supervised and supervised training? The nuScenes and DDAD datasets provide sparse point clouds. I tried it but got poor performance and could not figure out what led to that result.
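For what it's worth, the usual way to add such supervision (a sketch under common assumptions, not the authors' recipe) is an L1 term masked to the valid LiDAR pixels, added to the photometric loss with a small weight; getting that weight and the depth scale consistent is often what makes or breaks it:

import torch

def sparse_depth_loss(pred_depth, lidar_depth):
    # pred_depth, lidar_depth: [B, 1, H, W]; lidar_depth is 0 at invalid pixels.
    mask = lidar_depth > 0
    return torch.abs(pred_depth[mask] - lidar_depth[mask]).mean()

# total_loss = photometric_loss + 0.1 * sparse_depth_loss(pred, lidar)  # weight is a guess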
