
swintransformer / mim-depth-estimation

This is an official implementation of our CVPR 2023 paper "Revealing the Dark Secrets of Masked Image Modeling" on Depth Estimation.

License: MIT License

Python 100.00%

mim-depth-estimation's People

Contributors

gengzigang

mim-depth-estimation's Issues

How to run inference on a single image for depth estimation?

I want to run inference on some images instead of evaluating on NYUv2 to perform depth estimation. How should I do this? I have a few images and would just like to see how your model or method performs on them.

I gave it a try, but I got confused when handling my own dataset (testing on only a few dozen images) and ran into some errors.

So I wonder whether there is a convenient way, with minimal code changes, to run inference on a single image? Thanks a lot.
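
For arbitrary-sized images, the input usually needs padding to a size the Swin backbone accepts before the forward pass, with the predicted depth map cropped back afterwards. The helper below only does the padding arithmetic; the function name and the multiple of 32 are assumptions for illustration, not this repo's API:

```python
def pad_to_multiple(height, width, multiple=32):
    """Extra rows/cols needed so each image side is divisible by `multiple`.

    Swin-style backbones require input sides divisible by the total
    downsampling factor; pad the image by this amount before inference,
    then crop the prediction back to (height, width).
    """
    pad_h = (multiple - height % multiple) % multiple
    pad_w = (multiple - width % multiple) % multiple
    return pad_h, pad_w
```

For example, a 500x650 image would need 12 extra rows and 22 extra columns to reach 512x672.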

About pretrained model

When I train with the training script provided in the README, I get the following errors:

size mismatch for layers.0.blocks.0.attn.relative_coords_table: copying a param with shape torch.Size([1, 23, 23, 2]) from checkpoint, the shape in current model is torch.Size([1, 43, 43, 2]).
size mismatch for layers.0.blocks.0.attn.relative_position_index: copying a param with shape torch.Size([144, 144]) from checkpoint, the shape in current model is torch.Size([484, 484]).
size mismatch for layers.0.blocks.1.attn.relative_coords_table: copying a param with shape torch.Size([1, 23, 23, 2]) from checkpoint, the shape in current model is torch.Size([1, 43, 43, 2]).
size mismatch for layers.0.blocks.1.attn.relative_position_index: copying a param with shape torch.Size([144, 144]) from checkpoint, the shape in current model is torch.Size([484, 484]).
size mismatch for layers.1.blocks.0.attn.relative_coords_table: copying a param with shape torch.Size([1, 23, 23, 2]) from checkpoint, the shape in current model is torch.Size([1, 43, 43, 2]).
...

The pretrained model does not match the current model's shapes.
How can I solve this problem? I ran exactly according to your script and did not modify any code.
Thanks!
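
A likely cause is that the `--window_size` / `--pretrain_window_size` flags do not match the checkpoint, since Swin-v2's relative-position tables depend on those sizes. One workaround is to drop the shape-mismatched keys before loading and pass `strict=False` to `load_state_dict`; the helper below is a sketch under that assumption, not the repo's own code:

```python
def filter_mismatched_keys(checkpoint_state, model_state):
    """Split a checkpoint into shape-compatible and mismatched entries.

    Entries such as `relative_coords_table` / `relative_position_index`
    are resolution dependent; dropping them (they get re-initialized for
    the new window size) avoids the size-mismatch error.
    """
    kept, dropped = {}, []
    for key, value in checkpoint_state.items():
        if key in model_state and tuple(model_state[key].shape) == tuple(value.shape):
            kept[key] = value
        else:
            dropped.append(key)
    return kept, dropped

# Usage sketch (checkpoint key "model" is an assumption about the file layout):
#   state = torch.load("swin_v2_base_simmim.pth", map_location="cpu")["model"]
#   kept, dropped = filter_mismatched_keys(state, model.state_dict())
#   model.load_state_dict(kept, strict=False)
```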

README.md mistake

Please check this section in README.md:

Evaluate with model (KITTI Swin-Large)
...

The command is actually for KITTI Swin-Base.

Hardware setup for default training parameters

Hey, thanks for putting this on GitHub. I'm trying to reproduce the KITTI results.

I was wondering what the original training hardware setup was (number and type of GPUs)? I couldn't find any information in the paper. I'm running it on a 4090 with 24 GB of VRAM and batch size 3, while the default batch size seems to be 24.
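
If the effective batch size of 24 matters for the learning-rate schedule, gradient accumulation can emulate it on a single 24 GB card. The arithmetic is below; note the repo may not expose an accumulation flag, so this is just the calculation, not a supported option:

```python
def accumulation_steps(target_batch, per_gpu_batch, num_gpus=1):
    """Gradient-accumulation steps needed so that
    per_gpu_batch * num_gpus * steps >= target_batch."""
    effective = per_gpu_batch * num_gpus
    return -(-target_batch // effective)  # ceiling division
```

With the numbers above, batch size 3 on one GPU needs 8 accumulation steps to match an effective batch of 24.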

load model error

When I try to load the model with model = DepthAnything.from_pretrained("depth_anything_{:}14".format(args.encoder)), I get:

Traceback (most recent call last):
  File "./models/depth_anything/dpt.py", line 183, in <module>
    model = DepthAnything.from_pretrained("depth_anything_{:}14".format(args.encoder))
  File "/home/aokiji/anaconda3/envs/go-slam/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/aokiji/anaconda3/envs/go-slam/lib/python3.8/site-packages/huggingface_hub/hub_mixin.py", line 277, in from_pretrained
    instance = cls._from_pretrained(
  File "/home/aokiji/anaconda3/envs/go-slam/lib/python3.8/site-packages/huggingface_hub/hub_mixin.py", line 485, in _from_pretrained
    model = cls(**model_kwargs)
TypeError: __init__() missing 1 required positional argument: 'config'

I'm sure that the .pth file is in the right path.

How to pretrain models

Hi, thank you very much, this repository looks great!
Can you provide some more information (and perhaps a training script) to generate your pre-trained models using SimMIM?

About NYUv2 Results

[screenshot of the reported NYUv2 results table]

For NYUv2, we obtain an abs_rel of about 0.09XX.
I think the "0.044" reported for NYUv2 corresponds to sq_rel, not abs_rel.
Please check these metrics.

I think the KITTI results are correct.

About pretrained models

Hello,

I have a question regarding the pretrained models.

What's the difference between swin_v2_base_simmim.pth and kitti_swin_base.pth?
I want to use the SimMIM-pretrained model on the KITTI dataset. In this case, which one do I have to use?

Thank you.

about metric

I trained on my own dataset, but I got the message below.

[screenshot of the error message]

So I checked the image size and the ground-truth values: the image size is 3011x4008, the ground-truth min is 0.0, and the ground-truth max is about 730.0.

My command line is as follows:

python3 train.py --dataset my_data --max_depth 730.0 --max_depth_eval 730.0 --data_path ../data/ --backbone swin_large_v2 --depths 2 2 18 2 --num_filters 32 32 32 --deconv_kernels 2 2 2 --window_size 30 30 30 15 --pretrain_window_size 12 12 12 6 --use_shift True True False False --flip_test --shift_window_test --shift_size 2 --pretrained weights/swin_v2_large_simmim.pth --save_model --crop_h 480 --crop_w 480 --layer_decay 0.85 --drop_path_rate 0.5 --log_dir logs/ --save_result
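
Without the screenshot it is hard to tell, but a common cause with metric-depth training is invalid (zero) ground-truth pixels: abs_rel divides by the ground truth, so zeros must be masked out before computing metrics. A quick pure-Python sanity check (the 1e-3 threshold is an assumption; use whatever minimum depth your evaluation code uses):

```python
def depth_stats(depth_values, min_valid=1e-3):
    """Summarize ground-truth depth, ignoring invalid (<= min_valid) pixels.

    Zero usually marks missing depth; metrics like abs_rel divide by the
    ground truth, so these pixels must be excluded from evaluation.
    """
    valid = [d for d in depth_values if d > min_valid]
    if not valid:
        return {"valid": 0, "min": None, "max": None}
    return {"valid": len(valid), "min": min(valid), "max": max(valid)}
```

If the valid count is 0 for some samples, the evaluation mask is empty and many metric implementations produce NaN or crash.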
