AiT's Issues
Training time
Hi, interesting work! Can you share the approximate time to train the VQVAE and the task solver on both tasks? Thanks!
Swin-S and Swin-Ti weights
Thank you for releasing your code! I am wondering if you happen to have any pre-trained checkpoints for Swin-S and Swin-Ti, or even just the ImageNet-1k weights? The ImageNet-1k pre-trained weights would be preferable, as I can't seem to find any released anywhere with matching sizes.
Thanks!
Possible bug in the dataset that might cause over-fitting
Thanks for sharing your work.
transform = [
A.Crop(x_min=41, y_min=0, x_max=601, y_max=480),
A.HorizontalFlip(),
A.RandomCrop(crop_size[0], crop_size[1]),
]
In dataset/nyudepthv2.py, I found that you crop the image to a fixed (480, 480) region, after which a random crop is applied.
Could the order of the albumentations transforms be changed?
I am not sure.
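To make the concern concrete, here is a minimal numpy sketch of the pipeline (standalone code, not the repo's, and assuming crop_size = (480, 480)): after the fixed A.Crop, the image is 480x560, so A.RandomCrop can only jitter horizontally by at most 80 pixels, which leaves very little augmentation diversity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fixed_crop(img, x_min=41, y_min=0, x_max=601, y_max=480):
    # Mimics A.Crop: keep the fixed region [y_min:y_max, x_min:x_max]
    return img[y_min:y_max, x_min:x_max]

def random_crop(img, h=480, w=480):
    # Mimics A.RandomCrop: sample an h x w window uniformly from its input
    top = int(rng.integers(0, img.shape[0] - h + 1))
    left = int(rng.integers(0, img.shape[1] - w + 1))
    return img[top:top + h, left:left + w]

img = np.zeros((480, 640), dtype=np.uint8)  # NYU Depth V2 frames are 480 x 640
cropped = fixed_crop(img)                   # 480 x 560: only 80 px of horizontal jitter remains
out = random_crop(cropped)                  # 480 x 480
print(cropped.shape, out.shape)             # (480, 560) (480, 480)
```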
Small typo
Hi, great work! I just noticed a small typo :
In the inference section of the readme, the supposedly <model_checkpoint> is written <model_checkpiont>
Unable to evaluate the results
Hello,
I am trying to run these models to evaluate the results, but I am unable to because of runtime errors.
The best "result" I could get is with this Dockerfile (at the root of the project):
FROM nvidia/cuda:11.4.3-cudnn8-devel-ubuntu18.04
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
# Install system dependencies
RUN apt-get update && \
apt-get install -y \
git \
wget \
python3-pip \
python3-dev \
python3-opencv \
python3-six
RUN python3 -m pip install --upgrade pip
RUN pip3 install setuptools openmim
# Install PyTorch and torchvision
RUN pip3 install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
RUN python3 -m pip install h5py albumentations tensorboardX gdown scipy
RUN python3 -m mim install mmcv
# Download the NYU Depth V2 labeled dataset and the GLPDepth helper scripts
WORKDIR /
RUN wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat -O nyu_depth_v2_labeled.mat
RUN git clone https://github.com/vinvino02/GLPDepth.git --depth 1
RUN mv GLPDepth/code/utils/logging.py GLPDepth/code/utils/glp_depth_logging.py
# Set the working directory
WORKDIR /app
RUN python3 ../GLPDepth/code/utils/extract_official_train_test_set_from_mat.py ../nyu_depth_v2_labeled.mat ../GLPDepth/datasets/splits.mat ./data/nyu_depth_v2/official_splits/
# RUN ln -s data ait/data
COPY requirements.txt requirements.txt
RUN python3 -m pip install -r requirements.txt
COPY . .
RUN rm -rf .git
Built the Dockerfile with:
sudo docker build -t mde . -f Dockerfile
And run with:
sudo docker run --name mde-test --gpus all --ipc=host -it --rm -v $(pwd):/app mde
Finally running the evaluation command. For example:
cd ait
python3 -m torch.distributed.launch --nproc_per_node=1 code/train.py configs/swinv2b_480reso_parallel_depthonly.py --cfg-options model.task_heads.depth.vae_cfg.pretrained=../models/vqvae_depth_2bp.pt --eval ../models/ait_depth_swinv2b_parallel.pth
This launches the inference process, which eventually fails with an uninformative error:
eval task depth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 654/654, 2.5 task/s, elapsed: 262s, ETA: 0sERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 34) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
===================================================
code/train.py FAILED
---------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
---------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-08-26_03:01:18
host : f50427e7ad50
rank : 0 (local_rank: 0)
exitcode : -9 (pid: 34)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 34
===================================================
Are the authors able to provide the versions of all the software they are using? In particular:
- Linux version and distribution
- CUDA version
- Python version
- Package versions (some versions are missing from the requirements)
- Any other relevant environment information
Thanks.
denorm twice in eval_coco.py
Hello! I found that vae/utils/eval_coco.py denormalizes the reconstructed image twice at line 45.
if hasattr(vae, 'get_codebook_indices'):
code = vae.get_codebook_indices(mask)
remask = vae.decode(code)[0, 0, :, :].cpu().numpy() * 0.5 + 0.5 # why denorm here?
Because the attribute use_norm is True in the decode method, decode already denormalizes the image, yet the code denormalizes again after decoding.
I will try to investigate the effect when evaluating.
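As a quick sanity check (a standalone sketch, not the repo code): if decode() already maps values back to [0, 1], the extra "* 0.5 + 0.5" compresses them into [0.5, 1], which would wash out the dark half of the reconstruction.

```python
import numpy as np

# decode() already denormalizes (use_norm=True), so its output lies in [0, 1]:
decoded = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# The extra "* 0.5 + 0.5" in eval_coco.py then compresses [0, 1] into [0.5, 1]:
double_denorm = decoded * 0.5 + 0.5
print(double_denorm)  # values now lie in [0.5, 1]
```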
Single Image Inference
How can I perform inference on my own set of images? What changes do I need to make for data pre-processing? Do I need to change the val dict under data in AiT/ait/configs/swinv2b_480reso_depthonly.py?
Ask for models and data
The model weights and data you released are inaccessible. Can you please make these weights and data publicly available again? Very much looking forward to your response!
train/visualize on single GPU
Hello!
I am trying to evaluate on a single GPU, but I ran into a lot of errors.
I am new to this; do you have code for single-GPU use?
Best wishes
RuntimeError: The size of tensor a (256) must match the size of tensor b (225) at non-singleton dimension 1
Dear author:
Thanks for your meaningful work. During inference, I met 'RuntimeError: The size of tensor a (256) must match the size of tensor b (225) at non-singleton dimension 1'.
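I can't confirm the exact cause from the message alone, but both sizes are perfect squares of a token-grid side, which suggests the checkpoint and the input resolution imply different token grids (the stride-32 examples below are assumptions, not values from the repo):

```python
# Both tensor sizes are perfect squares of a token-grid side length:
side_a = int(round(256 ** 0.5))  # 16 -> a 16 x 16 token grid (e.g. 512 px input / stride 32)
side_b = int(round(225 ** 0.5))  # 15 -> a 15 x 15 token grid (e.g. 480 px input / stride 32)
assert side_a * side_a == 256 and side_b * side_b == 225
print(side_a, side_b)  # 16 15
```

If that is the case, checking that the input resolution matches the one the checkpoint was trained at would be the first thing to try.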
Some problem with visualizing the depth of pred and gt.
Thanks for your work. I met some problems with visualizing the depth of pred and gt. Here is the location where they are visualized in
AiT/ait/code/model/depth/depth.py
Lines 157 to 159 in ca2c2d1
for pred_d, depth_gt in results:
    # visualize 'pred_d' here
    pred_crop, gt_crop = cropping_img(pred_d, depth_gt)
    # after reshaping, visualize 'pred_crop' and 'gt_crop' here
    computed_result = eval_depth(pred_crop, gt_crop)
this is cmd:
CUDA_VISIBLE_DEVICES=5,6,7 python -m torch.distributed.launch --nproc_per_node=3 code/train.py configs/swinv2b_480reso_depthonly.py --cfg-options model.task_heads.depth.vae_cfg.pretrained=vqvae_depth.pt --eval ait_joint_swinv2b.pth
However, pred_d, pred_crop, and gt_crop all look very similar, and the rendered images come out almost entirely white.
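For what it's worth, an all-white result is often what you get when metric depths (values well above 1) are written directly by an image saver that clips to [0, 1] or [0, 255]. A standalone sketch of min-max rescaling before saving (depth_to_uint8 is a hypothetical helper, not repo code):

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max rescale a float depth map to [0, 255] so it is not clipped to white."""
    d = np.asarray(depth, dtype=np.float64)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)
    return (d * 255).astype(np.uint8)

# Metric depths in meters; saved as-is, every pixel >= 1 clips to white:
pred = np.array([[0.5, 2.0], [4.0, 10.0]])
img = depth_to_uint8(pred)
print(img.min(), img.max())  # 0 255
```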
'PublicAccessNotPermitted' when download the checkpoints
Hi, thank you for the excellent work!
I ran into some trouble when downloading the checkpoints with wget: it raises a 'PublicAccessNotPermitted' error. I would like to know how to download them properly, especially the pre-trained backbone models.
Thank you in advance!
Error(s) in loading state_dict for VQVAE
Thank you for your nice work!
However, after training the VQ-VAE on depth estimation, I tried to train the task solver on depth estimation, and the following error came out:
Error(s) in loading state_dict for VQVAE:
Missing key(s) in state_dict: "encoder.0.weight", "encoder.0.bias", "encoder.2.weight", "encoder.2.bias", "encoder.4.weight", "encoder.4.bias", "encoder.6.weight", "encoder.6.bias", "encoder.8.weight", "encoder.8.bias", "encoder.10.net.0.weight", "encoder.10.net.0.bias", "encoder.10.net.2.weight", "encoder.10.net.2.bias", "encoder.10.net.4.weight", "encoder.10.net.4.bias", "encoder.11.net.0.weight", "encoder.11.net.0.bias", "encoder.11.net.2.weight", "encoder.11.net.2.bias", "encoder.11.net.4.weight", "encoder.11.net.4.bias", "encoder.12.weight", "encoder.12.bias", "decoder.0.weight", "decoder.0.bias", "decoder.2.net.0.weight", "decoder.2.net.0.bias", "decoder.2.net.2.weight", "decoder.2.net.2.bias", "decoder.2.net.4.weight", "decoder.2.net.4.bias", "decoder.3.net.0.weight", "decoder.3.net.0.bias", "decoder.3.net.2.weight", "decoder.3.net.2.bias", "decoder.3.net.4.weight", "decoder.3.net.4.bias", "decoder.4.weight", "decoder.4.bias", "decoder.6.weight", "decoder.6.bias", "decoder.8.weight", "decoder.8.bias", "decoder.10.weight", "decoder.10.bias", "decoder.12.weight", "decoder.12.bias", "decoder.14.weight", "decoder.14.bias", "_vq_vae._embedding", "_vq_vae._ema_cluster_size", "_vq_vae._ema_w".
How can I solve it? Thank you.
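In case it helps: when every key is reported missing, the checkpoint keys often just carry a 'module.' prefix added by DataParallel/DistributedDataParallel at save time (or the weights are nested under a key such as 'model'). A hypothetical standalone sketch of stripping the prefix before load_state_dict; whether this matches your checkpoint is an assumption:

```python
def strip_prefix(state_dict, prefix="module."):
    """Drop the prefix that (Distributed)DataParallel prepends to parameter names."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

# Hypothetical checkpoint saved from a DDP-wrapped model:
ckpt = {"module.encoder.0.weight": 1, "module.encoder.0.bias": 2}
cleaned = strip_prefix(ckpt)
print(sorted(cleaned))  # ['encoder.0.bias', 'encoder.0.weight']
```

Printing the checkpoint's top-level keys first (e.g. `list(torch.load(path).keys())[:5]`) usually makes the mismatch obvious.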