Contact Me:
✉ Email: [email protected]
✧ Google Scholar: https://scholar.google.com/citations?user=DHgNKnAAAAAJ&hl=en
Code for Transformers Solve Limited Receptive Field for Monocular Depth Prediction
License: MIT License
Thank you for sharing this wonderful work.
As I understand it, the AGD module selects low-level features from the encoder to enhance the encoder output; applying spatial and channel attention seems like it would be enough for that. Why design the latent kernel mechanism? What is the motivation, and what actual improvement does the latent kernel bring?
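For reference, the spatial- and channel-attention alternative the question has in mind could be sketched roughly as below. This is a CBAM-style toy module of my own, not the repository's actual code, and the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Minimal sketch of combined channel + spatial attention (illustrative only).
class SimpleAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Channel attention: squeeze spatial dims, gate each channel.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())
        # Spatial attention: gate each location from mean/max channel stats.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        x = x * w                                             # channel gating
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))        # spatial gating

x = torch.randn(2, 32, 8, 8)
out = SimpleAttention(32)(x)
assert out.shape == x.shape  # attention reweights features, shape unchanged
```

The latent-kernel question stands: compared with this kind of per-channel/per-location gating, a learned kernel can mix information across neighbourhoods rather than only reweighting it.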
The trainable parameters are only those of the encoder and decoder.
Thanks for this great work!
Could you provide a README for the "normal estimation" folder?
Currently, it is difficult to run the normal estimation model.
Hello, your dataset cannot be downloaded. I opened the dataset link and found it is not accessible at all; it seems to have expired. Could you please update the link?
Thank you for your excellent work.
I encountered a CUDA out-of-memory error while running your code, which I suspect is caused by a lack of GPU memory.
Because of this, I increased num_threads in the multi-GPU part of your code and reduced the batch size, but the error still does not go away. Do you happen to know how to handle this?
Below is the full text of the error.
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/cv1/TransDepth/pytorch/bts_main.py", line 347, in main_worker
model = BtsModel(args)
File "/home/cv1/TransDepth/pytorch/bts.py", line 345, in __init__
self.encoder = ViT_seg(config_vit, img_size=[params.input_height,params.input_width], num_classes=config_vit.n_classes).cuda()
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
module._apply(fn)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
module._apply(fn)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
module._apply(fn)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223, in _apply
param_applied = fn(param)
File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 10.76 GiB total capacity; 400.86 MiB already allocated; 66.69 MiB free; 452.00 MiB reserved in total by PyTorch)
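For what it is worth, the numbers in the traceback itself point at the cause: only about 66.69 MiB is free on a 10.76 GiB card while PyTorch has reserved just ~452 MiB, so something outside this training run is already holding most of the GPU (batch size and num_threads would not change that). A small sanity-check sketch, using the figures from the error message plus an optional live query (`torch.cuda.mem_get_info` assumes PyTorch ≥ 1.10):

```python
import torch

# Figures reported in the traceback above.
requested_mib = 72.00   # "Tried to allocate 72.00 MiB"
free_mib = 66.69        # "66.69 MiB free" on a 10.76 GiB GPU
assert requested_mib > free_mib  # hence the OOM during model.cuda()

# Live view of the same numbers, if a GPU is present.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
```

Checking `nvidia-smi` for other processes occupying the card would be the natural next step here.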
I want to know how you determined the values of the arguments "--min_depth_eval" and "--max_depth_eval".
Hi, thanks for your work.
I notice that you refer to https://github.com/MARSLab-UMN/TiltedImageSurfaceNormal in issue #11 for the NYUv2 dataset. However, that project only provides the images and surface-normal ground truth for the test set. I wonder how I can get the normal ground truth for the NYUv2 training data so that the dataloader https://github.com/ygjwd12345/TransDepth/blob/main/surface_normal/dataset_loader/dataset_loader_nyud_2.py#L35 can work.
Thanks for the great work!
I have a question about the NYU surface-normal dataset: I downloaded and unzipped it following the description, but I cannot find the surface normals. Can you help me?
Thanks.
mkdir -p pytorch/dataset/nyu_depth_v2
python utils/download_from_gdrive.py 1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP pytorch/dataset/nyu_depth_v2/sync.zip
cd pytorch/dataset/nyu_depth_v2
unzip sync.zip
Hello, I only have one GPU. When I try to train on the NYU dataset with the following command:
CUDA_VISIBLE_DEVICES=0 python bts_main.py arguments_train_nyu.txt
I run into the following problem:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/ace/PycharmProjects/TransDepth-main/pytorch/bts_main.py", line 439, in main_worker
var_sum = np.sum(var_sum)
File "<__array_function__ internals>", line 6, in sum
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2248, in sum
initial=initial, where=where)
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
File "/home/ace/Anaconda3/envs/TransDepth/lib/python3.7/site-packages/torch/tensor.py", line 621, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
I found some solutions, but none of them worked. How can I solve this? I sincerely look forward to your reply.
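The TypeError in this traceback means `np.sum` is being handed CUDA tensors, which NumPy cannot read directly. A sketch of the usual fix, with a stand-in model (the variable names mirror bts_main.py, but this is not the repository's exact code):

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4, 2).to(device)  # stand-in for BtsModel

# As in bts_main.py: one scalar tensor per parameter, possibly on the GPU.
var_sum = [p.abs().sum() for p in model.parameters()]

# np.sum cannot consume CUDA tensors; .item() copies each scalar to host first.
total = np.sum([v.item() for v in var_sum])
print(total)
```

Equivalently, `v.cpu().numpy()` works for non-scalar tensors; the key point is moving the data off the GPU before NumPy touches it.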
Hi,
Thanks for the great work. In Fig. 2 of the paper it is written that "*" stands for convolution. For example, I_{r-->r}^{i}*f_{r} in Eq. (8) means these two maps get convolved together. However, in the code you use an element-wise multiplication between these two feature maps.
My second question is about unfolding. It seems that after unfolding the input variable, we get an output with the same spatial size but 9 times the channels, in addition to the channels we are already provided. I was just wondering whether the spatial content is preserved by this type of unfolding, i.e., if we sample the top-right corner of the spatial maps, whether all the channels come from the same spatial location in the original map. Thanks,
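The unfolding behaviour can be checked with a small standalone snippet (illustrative, not the repository's code): `F.unfold` with a 3x3 kernel and padding 1 keeps the HxW grid and stacks the 9 neighbourhood samples into the channel dimension, with kernel index 4 being the centre pixel of each neighbourhood:

```python
import torch
import torch.nn.functional as F

# A tiny (B, C, H, W) map with distinct values per pixel.
x = torch.arange(16.0).reshape(1, 1, 4, 4)

# 3x3 unfold with padding 1 and stride 1: output is (B, C*9, H*W).
patches = F.unfold(x, kernel_size=3, padding=1)
patches = patches.reshape(1, 1, 9, 4, 4)  # (B, C, kernel_pos, H, W)

# Kernel position 4 (row-major centre of the 3x3 window) is the original
# pixel, so at any (h, w) all 9 "channels" are fixed offsets around that
# same spatial location -- the spatial content is preserved.
assert torch.equal(patches[0, 0, 4], x[0, 0])
```

So yes: sampling one corner of the unfolded map gives values drawn from the fixed 3x3 neighbourhood of that same corner in the original map.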
Hi, thanks for the great work. When I read your paper, I found: "We choose the ResNet-50 with the same prediction head as our baseline", but there are no details about the prediction-head design, so I came to GitHub to figure it out. I find that your method is based on BTS and uses its decoder:
Line 347 in 3ae116f
So "We choose the ResNet-50 with the same prediction head as our baseline" means you replaced the BTS encoder with ResNet-50 and kept the other settings the same. I recently reproduced BTS with its official code, so I am somewhat familiar with its quantitative results. Although your baseline result on the NYU dataset is similar to the one reported in BTS, on KITTI your baseline result is much lower than the one reported in BTS. As follows:
NYU: (Abs rel, RMSE, a1, a2, a3)
Your report: 0.118 0.414 0.866 0.979 0.995 (TransDepth, Table.2, Baseline)
BTS report: 0.119 0.419 0.865 0.975 0.993 (BTS, Table. 5, ResNet-50)
KITTI: (Abs rel, RMSE, a1, a2, a3)
Your report: 0.106 3.981 0.888 0.967 0.986 (TransDepth, Table.1, Baseline)
BTS report: 0.061 2.803 0.954 0.992 0.998 (BTS, Table. 6, ResNet-50)
May I ask if I misunderstood, or did you use a different setting from the BTS?
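For readers comparing the numbers above, the columns follow the standard monocular-depth evaluation conventions (thresholds at 1.25^k). A minimal reference implementation of these metrics, with names of my own choosing (this mirrors the usual BTS-style evaluation, not code taken from either repository):

```python
import numpy as np

def depth_metrics(gt, pred):
    """Abs rel, RMSE, and delta accuracies a1-a3 for valid depth arrays."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    return abs_rel, rmse, a1, a2, a3

# Toy example: near-perfect predictions give low errors and a1 close to 1.
gt = np.array([1.0, 2.0, 4.0])
pred = np.array([1.1, 1.9, 4.2])
print(depth_metrics(gt, pred))
```

In papers, these are computed only over pixels inside the evaluation depth range (hence the --min_depth_eval / --max_depth_eval arguments mentioned in an earlier issue).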
Hello, how would one go about training on a dataset that is different from kitti or nyu? Thanks!
When the argument "baseline = false", image.shape = [1, 224, 224].
Why adopt a grayscale image instead of an RGB image?
Thank you for your excellent work.
Currently, when I try to download a checkpoint using the .sh file in your scripts, it fails with a 404 error.
Could you check the pretrained-model links?
The following is the full text of the error.
Note: available models are kitti_depth, nyu_depth, and nyu_surfacenormal
Specified [kitti_dpeth]
WARNING: timestamping does nothing in combination with -O. See the manual
for details.
--2022-09-27 08:03:41-- http://disi.unitn.it/~hao.tang/uploads/models/TransDepth/kitti_dpeth_pretrained.tar.gz
Resolving disi.unitn.it (disi.unitn.it)... 193.205.194.4
Connecting to disi.unitn.it (disi.unitn.it)|193.205.194.4|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-09-27 08:03:42 ERROR 404: Not Found.
In the download_from_gdrive.py file, the link "https://docs.google.com/uc?export=download1AysroWpfISmm-yRFGBgFTrLy6FjQwvwP" seems to be wrong; I can't access the website.
The error code is 400, so the link is probably malformed.
Thanks for your help!
In the README, the pretrained encoder cmd should be:
wget https://storage.googleapis.com/vit_models/imagenet21k/R50%2BViT-B_16.npz
not the current:
wget https://storage.googleapis.com/vit_models/imagenet21k/R50-ViT-B_16.npz
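The difference between the two URLs is just percent-encoding: %2B decodes to "+", so "R50-ViT-B_16.npz" and "R50%2BViT-B_16.npz" name different objects on the bucket. A quick check with the standard library:

```python
from urllib.parse import quote, unquote

# %2B is the percent-encoded "+", so the dash variant requests a file
# ("R50-ViT-B_16.npz") that does not exist on the bucket.
print(unquote("R50%2BViT-B_16.npz"))  # R50+ViT-B_16.npz
print(quote("R50+ViT-B_16.npz"))      # R50%2BViT-B_16.npz
```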
Hi @ygjwd12345
Thank you for sharing this amazing work,
Can you share your training time cost and GPU-Memory usage?
I tried to run 'python bts_test.py arguments_test_nyu.txt', but it returned "No module named 'bts_nyu_v2_pytorch_att'".
Are the trained weights publicly available that generated the metrics in the paper? If not, can they be? Thanks!
Excuse me, the current code does not seem to use AGD, because it is commented out in bts.py:
def forward(self, x, focal,rank=0):
skip_feat = self.encoder(x)
# for i in range(len(skip_feat)):
# print(skip_feat[i].shape)
# skip_feat[5] = self.AttentionGraphCondKernel(skip_feat[2],skip_feat[3],skip_feat[4],skip_feat[5],rank)
return self.decoder(skip_feat, focal)
But I found that it works as well as using AGD during training. Why is that?