
transdepth's People

Contributors

ha0tang, ygjwd12345


transdepth's Issues

Why design the latent kernel structure?

Thank you for sharing this wonderful work.
In my opinion, the AGD module is used to select low-level features in the encoder to enhance the encoder output, and applying spatial and channel attention would be enough for that. Why design the latent kernel mechanism? What is the motivation, and what actual improvement does the latent kernel bring?

README for normal estimation

Thanks for this great work!
Could you provide a README for the "normal estimation" folder?
Currently, it is difficult to run the normal estimation model.

Unable to download dataset

Hello, your dataset cannot be downloaded. I opened the link to the dataset and found it is not accessible at all; it seems to have expired. Could you please update the link?

[GPU Memory Question] How to control GPU memory usage in this code?

Thank you for your excellent work.
I encountered a CUDA out-of-memory error while running your code, which I assume is caused by a lack of GPU memory.
Because of this, I increased num_threads in the multi-GPU part of your code and reduced the batch size, but the error still does not disappear. Do you happen to know how to control this?

Below is the full text of the error.

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/cv1/TransDepth/pytorch/bts_main.py", line 347, in main_worker
    model = BtsModel(args)
  File "/home/cv1/TransDepth/pytorch/bts.py", line 345, in __init__
    self.encoder = ViT_seg(config_vit, img_size=[params.input_height, params.input_width], num_classes=config_vit.n_classes).cuda()
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 10.76 GiB total capacity; 400.86 MiB already allocated; 66.69 MiB free; 452.00 MiB reserved in total by PyTorch)
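
Reading the last line of the log: PyTorch itself has reserved only about 452 MiB, yet only about 67 MiB of the 10.76 GiB card is free. That pattern usually means something outside this process is occupying the GPU, rather than the model being too large. A quick sanity check on the logged numbers (plain arithmetic, my own sketch):

total_mib = 10.76 * 1024   # "10.76 GiB total capacity"
reserved_mib = 452.00      # "452.00 MiB reserved in total by PyTorch"
free_mib = 66.69           # "66.69 MiB free"
# Memory that is neither reserved by this process nor free must be held
# elsewhere (another training run, a stale process, the desktop session):
other_mib = total_mib - reserved_mib - free_mib
print(f"~{other_mib:.0f} MiB (~{other_mib / 1024:.1f} GiB) held outside this process")

If nvidia-smi confirms another process is holding that memory, freeing the card (or pointing CUDA_VISIBLE_DEVICES at an idle GPU) should matter far more than num_threads, which in BTS-style code sets the data-loader worker count, not GPU memory.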

How do I train the model on a Windows machine?

I have already solved several errors arising from the code not being directly compatible with Windows, but in the train command I am getting an error which I can't seem to fix. Can you help me out with this?
[screenshot of the error]

I am using Torch version: 1.8.0 and CUDA: 11.4

NYU Surface normal dataset

Thanks for the great work!

I have a question about the NYU surface normal dataset. I downloaded and unzipped it following the description, but I cannot find the surface normals. Can you help me?

Thanks.

mkdir -p pytorch/dataset/nyu_depth_v2
python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP pytorch/dataset/nyu_depth_v2/sync.zip
cd pytorch/dataset/nyu_depth_v2
unzip sync.zip

Problems during training

Hello, I only have one GPU. When I try to train on the NYU dataset, I enter the following command

CUDA_VISIBLE_DEVICES=0 python bts_main.py arguments_train_nyu.txt

and I get the following error:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/ace/PycharmProjects/TransDepth-main/pytorch/bts_main.py", line 439, in main_worker
    var_sum = np.sum(var_sum)
  File "<__array_function__ internals>", line 6, in sum
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2248, in sum
    initial=initial, where=where)
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
  File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/tensor.py", line 621, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I found some possible solutions, but none of them worked. How can I solve this? I sincerely look forward to your reply.
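
The traceback itself points at the fix: np.sum is being asked to reduce a list of CUDA tensors, and NumPy can only see host memory. A minimal patch sketch for the line the traceback names (bts_main.py line 439; this assumes var_sum is a list of scalar CUDA tensors, as in the BTS code this repo derives from, and that np and torch are already imported there):

# Option 1: copy each tensor to the host before handing the list to NumPy
var_sum = np.sum([v.detach().cpu().numpy() for v in var_sum])

# Option 2: do the reduction in torch and only transfer the final scalar
var_sum = torch.stack(var_sum).sum().item()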

Inconsistency between the text and code

Hi,

Thanks for the great work. In Fig. 2 of the paper it is stated that "*" stands for convolution, so I_{r-->r}^{i} * f_{r} in Eq. (8) means these two maps get convolved together. However, in the code you use an element-wise multiplication between these two feature maps.
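
For concreteness, the two operations are not interchangeable; even the output shapes differ. A toy illustration (shapes and names are my own, not the repo's tensors):

import torch
import torch.nn.functional as F

i_map = torch.randn(1, 1, 8, 8)  # stand-in for I_{r-->r}^{i}
f_map = torch.randn(1, 1, 8, 8)  # stand-in for f_{r}

# What the code computes: Hadamard (element-wise) product, shape (1, 1, 8, 8)
hadamard = i_map * f_map

# What "*" in Eq. (8) would literally mean: one map slides over the other
# as a kernel (cross-correlation, as conv layers actually compute); with
# this padding the result is (1, 1, 9, 9)
conv = F.conv2d(f_map, i_map, padding=i_map.shape[-1] // 2)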

My second question is about unfolding. It seems that after unfolding the input variable,

inputs_se_1 = unfold(inputs_se, kernel_size=3, dilation=1, padding=1).view(f_se[0], f_se[1], self.ks ** 2, ...)

we get an output with the same spatial size but nine entries per original channel, in addition to the channels we already had. I was just wondering whether the spatial content is preserved by this type of unfolding; I mean, if we sample the top-right corner of the spatial maps, whether all the channels come from the same spatial location in the original map.

Thanks,
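
A small self-contained check of this behaviour (toy sizes of my own, not the repo's tensors) suggests the answer is yes: with stride 1 and padding 1, unfold keeps the spatial layout, and the k*k axis holds, at each location, the values of that location's 3x3 neighbourhood, so the centre slice reproduces the input exactly:

import torch
import torch.nn.functional as F

x = torch.arange(2 * 4 * 4, dtype=torch.float32).view(1, 2, 4, 4)  # (B, C, H, W)

# Same call pattern as in the question: 3x3 windows, stride 1, padding 1,
# so the number of window positions equals H*W
cols = F.unfold(x, kernel_size=3, dilation=1, padding=1)  # (1, 2*9, 16)
cols = cols.view(1, 2, 9, 4, 4)                           # (B, C, k*k, H, W)

# Index 4 along the k*k axis is the centre of each 3x3 window, so that
# slice is exactly the original map: spatial correspondence is preserved
assert torch.equal(cols[:, :, 4], x)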

Performance gap with the baseline method (BTS)

Hi, thanks for the great work. When reading your paper I found: "We choose the ResNet-50 with the same prediction head as our baseline", but there are no details about the decoder-head design, so I came to GitHub to figure it out. I found that your method is based on BTS and uses its decoder:

self.decoder = bts(params, [64, 256, 512, 1024, 2048], params.bts_size)

So, "We choose the ResNet-50 with the same prediction head as our baseline" means you replace the BTS encoder with ResNet-50, and preserve other setting the same. I recently reproduced the BTS with their official code, so I am a little bit familiar with its quantitative results. Although the result of the baseline on the NYU dataset is similar to the one reported in BTS, when it comes to the KITTI, I find that your baseline result is much lower than the one reported in BTS. As follows:

NYU (Abs rel, RMSE, a1, a2, a3)
  Your report: 0.118  0.414  0.866  0.979  0.995  (TransDepth, Table 2, baseline)
  BTS report:  0.119  0.419  0.865  0.975  0.993  (BTS, Table 5, ResNet-50)

KITTI (Abs rel, RMSE, a1, a2, a3)
  Your report: 0.106  3.981  0.888  0.967  0.986  (TransDepth, Table 1, baseline)
  BTS report:  0.061  2.803  0.954  0.992  0.998  (BTS, Table 6, ResNet-50)

May I ask if I misunderstood something, or did you use a different setting from BTS?

About the Image

When the argument baseline = false, image.shape = [1, 224, 224].
Why is a grayscale image used instead of an RGB image?

KITTI pretrained download error

Thank you for your excellent work.

Currently, when I try to download the checkpoint using the .sh file in your scripts, it fails with a 404 error.
Could you check the pretrained file link?

The following is the full text of the error.

Note: available models are kitti_depth, nyu_depth, and nyu_surfacenormal
Specified [kitti_dpeth]
WARNING: timestamping does nothing in combination with -O. See the manual
for details.

--2022-09-27 08:03:41-- http://disi.unitn.it/~hao.tang/uploads/models/TransDepth/kitti_dpeth_pretrained.tar.gz
Resolving disi.unitn.it (disi.unitn.it)... 193.205.194.4
Connecting to disi.unitn.it (disi.unitn.it)|193.205.194.4|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-09-27 08:03:42 ERROR 404: Not Found.
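
(One observation from the log itself: the requested name is kitti_dpeth rather than kitti_depth, so the 404 may simply come from a misspelled model name in the invocation or the script, independent of whether the hosting link still works.)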

Pretrained Encoder link

In the README, the pretrained encoder command should be:

wget https://storage.googleapis.com/vit_models/imagenet21k/R50%2BViT-B_16.npz

not the current:

wget https://storage.googleapis.com/vit_models/imagenet21k/R50-ViT-B_16.npz

Error running bts_test.py

I tried to run python bts_test.py arguments_test_nyu.txt, but it returned 'No module named bts_nyu_v2_pytorch_att'.

Weights

Are the trained weights that generated the metrics in the paper publicly available? If not, can they be? Thanks!

About AGD

Excuse me, the current code does not seem to use AGD, because the call is commented out in bts.py:
def forward(self, x, focal, rank=0):
    skip_feat = self.encoder(x)
    # for i in range(len(skip_feat)):
    #     print(skip_feat[i].shape)
    # skip_feat[5] = self.AttentionGraphCondKernel(skip_feat[2], skip_feat[3], skip_feat[4], skip_feat[5], rank)
    return self.decoder(skip_feat, focal)
But I found that it works as well as using AGD during training. Why is that?
[screenshot of training metrics]
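
For anyone who wants to actually exercise AGD, restoring the commented call appears to be all that is needed (a sketch based on the snippet above; whether the released results were trained this way is exactly what this issue asks):

def forward(self, x, focal, rank=0):
    skip_feat = self.encoder(x)
    # Route the deepest skip feature through the AGD module before decoding
    skip_feat[5] = self.AttentionGraphCondKernel(skip_feat[2], skip_feat[3],
                                                 skip_feat[4], skip_feat[5], rank)
    return self.decoder(skip_feat, focal)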
