Contact Me:
✉ Email: [email protected]
✧ Google Scholar: https://scholar.google.com/citations?user=DHgNKnAAAAAJ&hl=en
Code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction
License: MIT License
Thank you for sharing this wonderful work.
As I understand it, the AGD module selects low-level features from the encoder to enhance the encoder output; applying spatial and channel attention seems like it would be enough for that. Why design the latent kernel mechanism? What is the motivation, and what actual improvement does the latent kernel bring?
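For reference, the spatial- and channel-attention alternative the question has in mind could be sketched roughly as below. This is a CBAM-style toy module of my own, not the repository's actual code, and the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Minimal sketch of combined channel + spatial attention (illustrative only).
class SimpleAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Channel attention: squeeze spatial dims, gate each channel.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())
        # Spatial attention: gate each location from mean/max channel stats.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        x = x * w                                             # channel gating
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))        # spatial gating

x = torch.randn(2, 32, 8, 8)
out = SimpleAttention(32)(x)
assert out.shape == x.shape  # attention reweights features, shape unchanged
```

The latent-kernel question stands: compared with this kind of per-channel/per-location gating, a learned kernel can mix information across neighbourhoods rather than only reweighting it.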
The trainable parameters are only those of the encoder and decoder.
Thanks for this great work!
Could you provide a README for the "normal estimation" folder?
Currently, it is difficult to run the normal estimation model.
Hello, your dataset cannot be downloaded. I opened the dataset link and found it is not accessible at all; it seems to have expired. Could you please update the link?
Thank you for your excellent work.
I encountered a CUDA out-of-memory error while running your code, which I suspect is caused by a lack of GPU memory.
Because of this, I increased num_threads in the multi-GPU part of your code and reduced the batch size, but the error still does not go away. Do you happen to know how to handle this?
Below is the full text of the error.
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/cv1/TransDepth/pytorch/bts_main.py", line 347, in main_worker
model = BtsModel(args)
File "/home/cv1/TransDepth/pytorch/bts.py", line 345, in __init__
self.encoder = ViT_seg(config_vit, img_size=[params.input_height,params.input_width], num_classes=config_vit.n_classes).cuda()
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
module._apply(fn)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
module._apply(fn)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
module._apply(fn)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223, in _apply
param_applied = fn(param)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 10.76 GiB total capacity; 400.86 MiB already allocated; 66.69 MiB free; 452.00 MiB reserved in total by PyTorch)
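For what it is worth, the numbers in the traceback itself point at the cause: only about 66.69 MiB is free on a 10.76 GiB card while PyTorch has reserved just ~452 MiB, so something outside this training run is already holding most of the GPU (batch size and num_threads would not change that). A small sanity-check sketch, using the figures from the error message plus an optional live query (`torch.cuda.mem_get_info` assumes PyTorch ≥ 1.10):

```python
import torch

# Figures reported in the traceback above.
requested_mib = 72.00   # "Tried to allocate 72.00 MiB"
free_mib = 66.69        # "66.69 MiB free" on a 10.76 GiB GPU
assert requested_mib > free_mib  # hence the OOM during model.cuda()

# Live view of the same numbers, if a GPU is present.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
```

Checking `nvidia-smi` for other processes occupying the card would be the natural next step here.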
I want to know how you determined the values of the arguments "--min_depth_eval" and "--max_depth_eval".
Hi, thanks for your work.
I notice that you refer to https://github.com/MARSLab-UMN/TiltedImageSurfaceNormal in issue #11 for the NYUv2 dataset. However, that project only provides the images and surface-normal ground truth for the test set. I wonder how I can get the normal ground truth for the NYUv2 training data so that the dataloader https://github.com/ygjwd12345/TransDepth/blob/main/surface_normal/dataset_loader/dataset_loader_nyud_2.py#L35 can work.
Thanks for the great work!
I have a question about the NYU surface-normal dataset: I downloaded and unzipped it following the description, but I cannot find the surface normals. Can you help me?
Thanks.
mkdir -p pytorch/dataset/nyu_depth_v2
python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP pytorch/dataset/nyu_depth_v2/sync.zip
cd pytorch/dataset/nyu_depth_v2
unzip sync.zip
Hello, I only have one GPU. When I try to train on the NYU dataset with the following command:
CUDA_VISIBLE_DEVICES=0 python bts_main.py arguments_train_nyu.txt
I run into the following problem:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/ace/PycharmProjects/TransDepth-main/pytorch/bts_main.py", line 439, in main_worker
var_sum = np.sum(var_sum)
File "<__array_function__ internals>", line 6, in sum
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2248, in sum
initial=initial, where=where)
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/tensor.py", line 621, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
I found some solutions, but none of them worked. How can I solve this? I sincerely look forward to your reply.
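The TypeError in this traceback means `np.sum` is being handed CUDA tensors, which NumPy cannot read directly. A sketch of the usual fix, with a stand-in model (the variable names mirror bts_main.py, but this is not the repository's exact code):

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4, 2).to(device)  # stand-in for BtsModel

# As in bts_main.py: one scalar tensor per parameter, possibly on the GPU.
var_sum = [p.abs().sum() for p in model.parameters()]

# np.sum cannot consume CUDA tensors; .item() copies each scalar to host first.
total = np.sum([v.item() for v in var_sum])
print(total)
```

Equivalently, `v.cpu().numpy()` works for non-scalar tensors; the key point is moving the data off the GPU before NumPy touches it.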
Hi,
Thanks for the great work. In Fig. 2 of the paper it is written that "*" stands for convolution. For example, I_{r-->r}^{i}*f_{r} in Eq. (8) means these two maps get convolved together. However, in the code you use an element-wise multiplication between these two feature maps.
My second question is about unfolding. It seems that after unfolding the input variable, we get an output with the same spatial size but 9 times the channels, in addition to the channels we are already provided. I was just wondering whether the spatial content is preserved by this type of unfolding, i.e., if we sample the top-right corner of the spatial maps, whether all the channels come from the same spatial location in the original map. Thanks,
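The unfolding behaviour can be checked with a small standalone snippet (illustrative, not the repository's code): `F.unfold` with a 3x3 kernel and padding 1 keeps the HxW grid and stacks the 9 neighbourhood samples into the channel dimension, with kernel index 4 being the centre pixel of each neighbourhood:

```python
import torch
import torch.nn.functional as F

# A tiny (B, C, H, W) map with distinct values per pixel.
x = torch.arange(16.0).reshape(1, 1, 4, 4)

# 3x3 unfold with padding 1 and stride 1: output is (B, C*9, H*W).
patches = F.unfold(x, kernel_size=3, padding=1)
patches = patches.reshape(1, 1, 9, 4, 4)  # (B, C, kernel_pos, H, W)

# Kernel position 4 (row-major centre of the 3x3 window) is the original
# pixel, so at any (h, w) all 9 "channels" are fixed offsets around that
# same spatial location -- the spatial content is preserved.
assert torch.equal(patches[0, 0, 4], x[0, 0])
```

So yes: sampling one corner of the unfolded map gives values drawn from the fixed 3x3 neighbourhood of that same corner in the original map.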
Hi, thanks for the great work. When I read your paper, I found: "We choose the ResNet-50 with the same prediction head as our baseline", but there are no details about the prediction-head design, so I came to GitHub to figure it out. I find that your method is based on BTS and uses its decoder:
Line 347 in 3ae116f
So "We choose the ResNet-50 with the same prediction head as our baseline" means you replaced the BTS encoder with ResNet-50 and kept the other settings the same. I recently reproduced BTS with its official code, so I am somewhat familiar with its quantitative results. Although your baseline result on the NYU dataset is similar to the one reported in BTS, on KITTI your baseline result is much lower than the one reported in BTS. As follows:
NYU: (Abs rel, RMSE, a1, a2, a3)
Your report: 0.118 0.414 0.866 0.979 0.995 (TransDepth, Table.2, Baseline)
BTS report: 0.119 0.419 0.865 0.975 0.993 (BTS, Table. 5, ResNet-50)
KITTI: (Abs rel, RMSE, a1, a2, a3)
Your report: 0.106 3.981 0.888 0.967 0.986 (TransDepth, Table.1, Baseline)
BTS report: 0.061 2.803 0.954 0.992 0.998 (BTS, Table. 6, ResNet-50)
May I ask if I misunderstood, or did you use a different setting from the BTS?
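For readers comparing the numbers above, the columns follow the standard monocular-depth evaluation conventions (thresholds at 1.25^k). A minimal reference implementation of these metrics, with names of my own choosing (this mirrors the usual BTS-style evaluation, not code taken from either repository):

```python
import numpy as np

def depth_metrics(gt, pred):
    """Abs rel, RMSE, and delta accuracies a1-a3 for valid depth arrays."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    return abs_rel, rmse, a1, a2, a3

# Toy example: near-perfect predictions give low errors and a1 close to 1.
gt = np.array([1.0, 2.0, 4.0])
pred = np.array([1.1, 1.9, 4.2])
print(depth_metrics(gt, pred))
```

In papers, these are computed only over pixels inside the evaluation depth range (hence the --min_depth_eval / --max_depth_eval arguments mentioned in an earlier issue).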
Hello, how would one go about training on a dataset that is different from kitti or nyu? Thanks!
When the argument "baseline = false", image.shape = [1, 224, 224].
Why adopt a grayscale image instead of an RGB image?
Thank you for your excellent work.
Currently, when I try to download a checkpoint using the .sh file in your scripts, it fails with a 404 error.
Could you check the pretrained-model links?
The following is the full text of the error.
Note: available models are kitti_depth, nyu_depth, and nyu_surfacenormal
Specified [kitti_dpeth]
WARNING: timestamping does nothing in combination with -O. See the manual
for details.
--2022-09-27 08:03:41-- http://disi.unitn.it/~hao.tang/uploads/models/TransDepth/kitti_dpeth_pretrained.tar.gz
Resolving disi.unitn.it (disi.unitn.it)... 193.205.194.4
Connecting to disi.unitn.it (disi.unitn.it)|193.205.194.4|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-09-27 08:03:42 ERROR 404: Not Found.
In the download_from_gdrive.py file, the link "https://docs.google.com/uc?export=download1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP" seems to be wrong; I can't access the website.
The error code is 400, so the link is probably malformed.
Thanks for your help!
In the README, the pretrained encoder cmd should be:
wget https://storage.googleapis.com/vit_models/imagenet21k/R50%2BViT-B_16.npz
not the current:
wget https://storage.googleapis.com/vit_models/imagenet21k/R50-ViT-B_16.npz
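The difference between the two URLs is just percent-encoding: %2B decodes to "+", so "R50-ViT-B_16.npz" and "R50%2BViT-B_16.npz" name different objects on the bucket. A quick check with the standard library:

```python
from urllib.parse import quote, unquote

# %2B is the percent-encoded "+", so the dash variant requests a file
# ("R50-ViT-B_16.npz") that does not exist on the bucket.
print(unquote("R50%2BViT-B_16.npz"))  # R50+ViT-B_16.npz
print(quote("R50+ViT-B_16.npz"))      # R50%2BViT-B_16.npz
```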
Hi @ygjwd12345
Thank you for sharing this amazing work,
Can you share your training time cost and GPU-Memory usage?
I tried to run 'python bts_test.py arguments_test_nyu.txt', but it returned "No module named 'bts_nyu_v2_pytorch_att'".
Are the trained weights publicly available that generated the metrics in the paper? If not, can they be? Thanks!
Excuse me, the current code does not seem to use AGD, because it is commented out in bts.py:
def forward(self, x, focal,rank=0):
skip_feat = self.encoder(x)
# for i in range(len(skip_feat)):
# print(skip_feat[i].shape)
# skip_feat[5] = self.AttentionGraphCondKernel(skip_feat[2],skip_feat[3],skip_feat[4],skip_feat[5],rank)
return self.decoder(skip_feat, focal)
But I found that it works as well as using AGD during training. Why is that?