
swintransformer / mim-depth-estimation

This is an official implementation of our CVPR 2023 paper "Revealing the Dark Secrets of Masked Image Modeling" on Depth Estimation.

License: MIT License

Python 100.00%

mim-depth-estimation's People

Contributors

gengzigang

mim-depth-estimation's Issues

How to run inference on a single image for depth estimation?

I want to run inference on some images instead of evaluating on NYUv2 to perform depth estimation. How should I do this? I have a few images and would just like to see how your model or method performs on them.

I gave it a try, but I got confused when handling my own dataset (testing on only a few dozen images) and ran into some errors.

So I wonder whether there is a convenient way, with minimal code changes, to run inference on a single image? Thanks a lot.
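
For arbitrary-sized images, the input usually needs padding to a size the Swin backbone accepts before the forward pass, with the predicted depth map cropped back afterwards. The helper below only does the padding arithmetic; the function name and the multiple of 32 are assumptions for illustration, not this repo's API:

```python
def pad_to_multiple(height, width, multiple=32):
    """Extra rows/cols needed so each image side is divisible by `multiple`.

    Swin-style backbones require input sides divisible by the total
    downsampling factor; pad the image by this amount before inference,
    then crop the prediction back to (height, width).
    """
    pad_h = (multiple - height % multiple) % multiple
    pad_w = (multiple - width % multiple) % multiple
    return pad_h, pad_w
```

For example, a 500x650 image would need 12 extra rows and 22 extra columns to reach 512x672.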

About pretrained model

When I train with the training script provided in the README, I get the following errors:

size mismatch for layers.0.blocks.0.attn.relative_coords_table: copying a param with shape torch.Size([1, 23, 23, 2]) from checkpoint, the shape in current model is torch.Size([1, 43, 43, 2]).
size mismatch for layers.0.blocks.0.attn.relative_position_index: copying a param with shape torch.Size([144, 144]) from checkpoint, the shape in current model is torch.Size([484, 484]).
size mismatch for layers.0.blocks.1.attn.relative_coords_table: copying a param with shape torch.Size([1, 23, 23, 2]) from checkpoint, the shape in current model is torch.Size([1, 43, 43, 2]).
size mismatch for layers.0.blocks.1.attn.relative_position_index: copying a param with shape torch.Size([144, 144]) from checkpoint, the shape in current model is torch.Size([484, 484]).
size mismatch for layers.1.blocks.0.attn.relative_coords_table: copying a param with shape torch.Size([1, 23, 23, 2]) from checkpoint, the shape in current model is torch.Size([1, 43, 43, 2]).
...

The pretrained model does not match the current model's shapes.
How can I solve this problem? I ran exactly according to your script and did not modify any code.
Thanks!
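
A likely cause is that the `--window_size` / `--pretrain_window_size` flags do not match the checkpoint, since Swin-v2's relative-position tables depend on those sizes. One workaround is to drop the shape-mismatched keys before loading and pass `strict=False` to `load_state_dict`; the helper below is a sketch under that assumption, not the repo's own code:

```python
def filter_mismatched_keys(checkpoint_state, model_state):
    """Split a checkpoint into shape-compatible and mismatched entries.

    Entries such as `relative_coords_table` / `relative_position_index`
    are resolution dependent; dropping them (they get re-initialized for
    the new window size) avoids the size-mismatch error.
    """
    kept, dropped = {}, []
    for key, value in checkpoint_state.items():
        if key in model_state and tuple(model_state[key].shape) == tuple(value.shape):
            kept[key] = value
        else:
            dropped.append(key)
    return kept, dropped

# Usage sketch (checkpoint key "model" is an assumption about the file layout):
#   state = torch.load("swin_v2_base_simmim.pth", map_location="cpu")["model"]
#   kept, dropped = filter_mismatched_keys(state, model.state_dict())
#   model.load_state_dict(kept, strict=False)
```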

README.md mistake

Please check this section in README.md:

Evaluate with model (KITTI Swin-Large)
...

The command is actually for KITTI Swin-Base.

Hardware setup for default training parameters

Hey, thanks for putting this on GitHub. I'm trying to reproduce the KITTI results.

I was wondering what the original training hardware setup was (number and type of GPUs)? I couldn't find any information in the paper. I'm running it on a 4090 with 24 GB of VRAM and batch size 3, while the default batch size seems to be 24.
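
If the effective batch size of 24 matters for the learning-rate schedule, gradient accumulation can emulate it on a single 24 GB card. The arithmetic is below; note the repo may not expose an accumulation flag, so this is just the calculation, not a supported option:

```python
def accumulation_steps(target_batch, per_gpu_batch, num_gpus=1):
    """Gradient-accumulation steps needed so that
    per_gpu_batch * num_gpus * steps >= target_batch."""
    effective = per_gpu_batch * num_gpus
    return -(-target_batch // effective)  # ceiling division
```

With the numbers above, batch size 3 on one GPU needs 8 accumulation steps to match an effective batch of 24.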

load model error

When I try to load the model with model = DepthAnything.from_pretrained("depth_anything_{:}14".format(args.encoder)), I get:

Traceback (most recent call last):
  File "./models/depth_anything/dpt.py", line 183, in <module>
    model = DepthAnything.from_pretrained("depth_anything_{:}14".format(args.encoder))
  File "/home/aokiji/anaconda3/envs/go-slam/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/aokiji/anaconda3/envs/go-slam/lib/python3.8/site-packages/huggingface_hub/hub_mixin.py", line 277, in from_pretrained
    instance = cls._from_pretrained(
  File "/home/aokiji/anaconda3/envs/go-slam/lib/python3.8/site-packages/huggingface_hub/hub_mixin.py", line 485, in _from_pretrained
    model = cls(**model_kwargs)
TypeError: __init__() missing 1 required positional argument: 'config'

I'm sure that the .pth file is in the right path.

How to pretrain models

Hi, thank you very much, this repository looks great!
Can you provide some more information (and perhaps a training script) to generate your pre-trained models using SimMIM?

About NYUv2 Results

[screenshot of the reported NYUv2 results table]

For NYUv2, we obtain an abs_rel of about 0.09XX.
I think the "0.044" reported for NYUv2 corresponds to sq_rel, not abs_rel.
Please check these metrics.

I think the KITTI results are correct.

About pretrained models

Hello,

I have a question regarding the pretrained models.

What's the difference between swin_v2_base_simmim.pth and kitti_swin_base.pth?
I want to use the SimMIM-pretrained model on the KITTI dataset. In this case, which one do I have to use?

Thank you.

about metric

I trained on my own dataset, but I got the message below.

[screenshot of the error message]

So I checked the image size and the ground-truth values: the image size is 3011x4008, the ground-truth min is 0.0, and the ground-truth max is about 730.0.

My command line is as follows:

python3 train.py --dataset my_data --max_depth 730.0 --max_depth_eval 730.0 --data_path ../data/ --backbone swin_large_v2 --depths 2 2 18 2 --num_filters 32 32 32 --deconv_kernels 2 2 2 --window_size 30 30 30 15 --pretrain_window_size 12 12 12 6 --use_shift True True False False --flip_test --shift_window_test --shift_size 2 --pretrained weights/swin_v2_large_simmim.pth --save_model --crop_h 480 --crop_w 480 --layer_decay 0.85 --drop_path_rate 0.5 --log_dir logs/ --save_result
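
Without the screenshot it is hard to tell, but a common cause with metric-depth training is invalid (zero) ground-truth pixels: abs_rel divides by the ground truth, so zeros must be masked out before computing metrics. A quick pure-Python sanity check (the 1e-3 threshold is an assumption; use whatever minimum depth your evaluation code uses):

```python
def depth_stats(depth_values, min_valid=1e-3):
    """Summarize ground-truth depth, ignoring invalid (<= min_valid) pixels.

    Zero usually marks missing depth; metrics like abs_rel divide by the
    ground truth, so these pixels must be excluded from evaluation.
    """
    valid = [d for d in depth_values if d > min_valid]
    if not valid:
        return {"valid": 0, "min": None, "max": None}
    return {"valid": len(valid), "min": min(valid), "max": max(valid)}
```

If the valid count is 0 for some samples, the evaluation mask is empty and many metric implementations produce NaN or crash.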
