swintransformer / swin-transformer-semantic-segmentation

This project forked from open-mmlab/mmsegmentation

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Home Page: https://arxiv.org/abs/2103.14030

License: Apache License 2.0

Languages: Python 99.74%, Dockerfile 0.08%, Shell 0.18%

Topics: ade20k, semantic-segmentation, swin-transformer, upernet

swin-transformer-semantic-segmentation's Issues

The value of "embed_dim" in Linear Embedding

Excuse me, does "embed_dim" in the Linear Embedding layer have to be 96, 128, or 192? Can it be changed to another value, such as 32?
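
As a rough sanity check for this question, the sketch below derives the per-stage channel widths from embed_dim, assuming the usual Swin layout in which the width doubles at each of the four stages, and checks that each width is divisible by the number of attention heads at that stage. The helper names stage_channels and heads_compatible are hypothetical and not part of this repository. Under that assumption, embed_dim=32 does not fit the default num_heads=[3, 6, 12, 24], and pretrained weights trained with a different embed_dim would not load into such a model.

```python
def stage_channels(embed_dim, num_stages=4):
    """Channel width of each stage, assuming the width doubles per stage."""
    return [embed_dim * 2 ** i for i in range(num_stages)]


def heads_compatible(embed_dim, num_heads):
    """True if every stage width is divisible by its number of heads."""
    widths = stage_channels(embed_dim, len(num_heads))
    return all(c % h == 0 for c, h in zip(widths, num_heads))


print(stage_channels(96))                      # widths for embed_dim=96
print(heads_compatible(32, [3, 6, 12, 24]))    # default head counts
print(heads_compatible(32, [1, 2, 4, 8]))      # hypothetical alternative
```

If embed_dim is changed, the decode head's in_channels would presumably need to match the new stage widths as well.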

KeyError: 'SwinTransformer is not in the models registry'

I get this error:
KeyError: 'SwinTransformer is not in the models registry'

when running the command
python tools/test.py configs/swin/upernet_swin_base_patch4_window7_512x512_160k_ade20k.py models/upernet_swin_base_patch4_window7_512x512.pth --eval mIoU

Can someone help?
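
One possible cause, offered as an assumption rather than a confirmed diagnosis: a different mmseg package (for example an upstream mmsegmentation installed into site-packages) is imported instead of this fork, and that copy has no SwinTransformer backbone registered. A minimal sketch for checking which file Python would import; module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints None when mmseg is not installed in the current environment.
print("mmseg resolves to:", module_origin("mmseg"))
```

If the printed path lies outside the cloned repository, installing the fork in development mode from the repository root (pip install -e .) is a commonly suggested remedy.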

AssertionError: Default process group is not initialized

When I train on a single GPU, I get this error: AssertionError: Default process group is not initialized
Command: python3 ./tools/train.py configs/swin/upernet_swin_base_patch4_window7_512x512_106k_ade20k.py

Cityscapes Swin-S mIoU only 56

How did you test on Cityscapes? The mIoU does not seem very good. I am training from scratch.

Runtime Error

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
A clear and concise description of what the bug is.

Reproduction

What command or script did you run?

python -u ./tools/train.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py

Did you make any modifications on the code or config? Did you understand what you have modified?
use mmcv==1.3.9
What dataset did you use?
ADEChallengeData2016
Environment
Please run python mmseg/utils/collect_env.py to collect the necessary environment information and paste it here.
sys.platform: linux
Python: 3.8.2 (default, Mar 26 2020, 15:53:00) [GCC 7.3.0]
CUDA available: True
GPU 0: GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.0.5
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0
OpenCV: 4.5.3
MMCV: 1.3.9
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMSegmentation: 0.11.0+

2021-08-13 16:43:11,382 - mmseg - INFO - Distributed training: False
2021-08-13 16:43:11,711 - mmseg - INFO - Config:
norm_cfg = dict(type='SyncBN', requires_grad=True)

You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

Traceback (most recent call last):
  File "/home/wyh/Codes/SwinTransformer/tools/train.py", line 163, in <module>
    main()
  File "/home/wyh/Codes/SwinTransformer/tools/train.py", line 152, in main
    train_segmentor(
  File "/home/wyh/Codes/SwinTransformer/mmseg/apis/train.py", line 116, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 133, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/wyh/Codes/SwinTransformer/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/home/wyh/Codes/SwinTransformer/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/wyh/Codes/SwinTransformer/mmseg/models/segmentors/encoder_decoder.py", line 157, in forward_train
    loss_decode = self._decode_head_forward_train(x, img_metas,
  File "/home/wyh/Codes/SwinTransformer/mmseg/models/segmentors/encoder_decoder.py", line 100, in _decode_head_forward_train
    loss_decode = self.decode_head.forward_train(x, img_metas,
  File "/home/wyh/Codes/SwinTransformer/mmseg/models/decode_heads/decode_head.py", line 186, in forward_train
    seg_logits = self.forward(inputs)
  File "/home/wyh/Codes/SwinTransformer/mmseg/models/decode_heads/uper_head.py", line 92, in forward
    laterals = [
  File "/home/wyh/Codes/SwinTransformer/mmseg/models/decode_heads/uper_head.py", line 93, in <listcomp>
    lateral_conv(inputs[i])
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/mmcv/cnn/bricks/conv_module.py", line 200, in forward
    x = self.norm(x)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 731, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 748, in get_world_size
    return _get_group_size(group)
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 274, in _get_group_size
    default_pg = _get_default_group()
  File "/home/wyh/anaconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 358, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
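
The log in this report shows Distributed training: False together with norm_cfg = dict(type='SyncBN', requires_grad=True), and the traceback ends in torch's batchnorm.py at get_world_size, which needs an initialized process group. A common workaround for single-GPU, non-distributed runs is to use plain 'BN' instead. The sketch below shows that substitution on a plain nested dict; replace_syncbn is a hypothetical helper, and the toy model dict only mimics the shape of the real config.

```python
import copy


def replace_syncbn(cfg):
    """Return a deep copy of a nested dict/list config in which every
    dict with type 'SyncBN' has its type changed to 'BN'."""
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for value in node:
                _walk(value)

    _walk(cfg)
    return cfg


# Toy stand-in for the model section of the config (not the full config).
model = dict(
    decode_head=dict(type="UPerHead",
                     norm_cfg=dict(type="SyncBN", requires_grad=True)),
    auxiliary_head=dict(type="FCNHead",
                        norm_cfg=dict(type="SyncBN", requires_grad=True)),
)
patched = replace_syncbn(model)
print(patched["decode_head"]["norm_cfg"])
```

Alternatively, starting training through a distributed launcher with a single GPU should initialize the process group, so that SyncBN can stay in the config; this has not been verified against this report.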

The gradient is not updated after each epoch

The gradient is not updated after each epoch.

Can the Swin-Transformer-Semantic-Segmentation support mixed precision training?

I tried adding the following to the config file:
optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.)
fp16 = dict()
but I got this error message:
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in call to _th_bmm_out.

Does the SwinTransformer for Semantic Segmentation support mixed precision training, and if so, how do I use it correctly?
This problem is important to me. Looking forward to your reply! Thank you~

How can I load a pretrained model to train on my dataset with num_cls=2?

After I loaded the pretrained model file ('upernet_swin_small_patch4_window7_512x512.pth'), the error below was raised. My custom dataset differs from ADE20K in that it has 2 classes (num_cls=2) instead of 150. Is the pretrained model only usable for ADE20K? How can I load a pretrained model to train on my dataset with num_cls=2?
Thank you for your help!
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 179, in build_from_cfg
    return obj_cls(**args)
  File "/home/xxx/paper_code2/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/encoder_decoder.py", line 39, in __init__
    self.init_weights(pretrained=pretrained)
  File "/home/xxx/paper_code2/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/encoder_decoder.py", line 68, in init_weights
    self.backbone.init_weights(pretrained=pretrained)
  File "/home/xxx/paper_code2/Swin-Transformer-Semantic-Segmentation/mmseg/models/backbones/swin_transformer.py", line 594, in init_weights
    load_checkpoint(self, pretrained, strict=False, logger=logger)
  File "/home/xxx/paper_code2/Swin-Transformer-Semantic-Segmentation/mmcv_custom/checkpoint.py", line 340, in load_checkpoint
    table_current = model.state_dict()[table_key]
KeyError: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'

During handling of the above exception, another exception occurred:
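
The failing key starts with 'backbone.', which suggests (an assumption, not confirmed in this thread) that a full segmentor checkpoint was passed as the backbone's pretrained file, so its keys carry a prefix that the backbone's own state_dict lacks. A minimal sketch of keeping only the backbone entries and stripping that prefix, which also drops the 150-class head weights. It uses a toy dict in place of real tensors; extract_backbone_weights is a hypothetical helper.

```python
def extract_backbone_weights(state_dict, prefix="backbone."):
    """Keep only entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a segmentor checkpoint; values would be tensors in practice.
toy_checkpoint = {
    "backbone.layers.0.blocks.0.attn.relative_position_bias_table": "tensor-a",
    "decode_head.conv_seg.weight": "tensor-b",
}
backbone_only = extract_backbone_weights(toy_checkpoint)
print(sorted(backbone_only))
```

With a real file, the input dict would typically come from the 'state_dict' entry of the loaded checkpoint, and the filtered result would be saved and used as the pretrained path.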

FLOPs on ADE20K dataset seems to be too large?

The FLOPs reported in the paper seem too large. What input size did you use when calculating them?

When I calculate the FLOPs for Swin-Tiny model with the command:

python tools/get_flops.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py --shape 512 512

I got the following output:

==============================
Input shape: (3, 512, 512)
Flops: 236.08 GFLOPs
Params: 59.94 M
==============================

But in the paper, Swin-Tiny has 945G FLOPs.

Could you share more details on how you calculated the FLOPs in your paper? Thanks!
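
As a back-of-the-envelope check, the sketch below scales the measured figure by the number of input pixels, assuming FLOPs grow roughly linearly with pixel count for this model. The 512 x 2048 target resolution is an assumption used for illustration and is not stated in this thread; scale_flops is a hypothetical helper.

```python
def scale_flops(gflops, from_shape, to_shape):
    """Scale a FLOPs count linearly with the number of input pixels."""
    (h0, w0), (h1, w1) = from_shape, to_shape
    return gflops * (h1 * w1) / (h0 * w0)


estimate = scale_flops(236.08, (512, 512), (512, 2048))
print(round(estimate, 2))  # 944.32
```

Four times the pixels gives about 944 GFLOPs, which is close to the 945G quoted from the paper, so a larger input size in the paper's measurement would be one consistent explanation.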

Speed is extremely slow.

Inference is very slow: one Cityscapes image takes 500 ms on a 1080 Ti.

And this is only the model forward time:

    # forward the model
    with torch.no_grad():
        import time
        tic = time.time()
        result = model(return_loss=False, rescale=True, **data)
        print(time.time() - tic)
    return result
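
A single time.time() pair around one forward pass can be misleading on a GPU, because CUDA kernels run asynchronously and the first calls include one-time setup costs. The sketch below is one way to get a steadier number: it runs a few warm-up calls, optionally synchronizes before and after the timed loop, and averages over several iterations. time_forward and its parameters are hypothetical names, and the lambda is only a CPU stand-in for the model call.

```python
import time


def time_forward(fn, sync=None, warmup=2, iters=5):
    """Average wall-clock seconds per call of fn().

    `sync` is an optional zero-argument callable that blocks until
    pending device work has finished (for example a CUDA synchronize).
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    tic = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - tic) / iters


# CPU stand-in workload; replace with the model forward call in practice.
avg = time_forward(lambda: sum(range(10000)))
print(f"{avg * 1e3:.3f} ms per call")
```

For the snippet above, fn would wrap model(return_loss=False, rescale=True, **data) inside torch.no_grad(), and sync would be torch.cuda.synchronize.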

Swin-L Pre-trained model

Dear author,
Is the Swin-L pre-trained model available?
As I understand it, the best mIoU on the ADE20K dataset is 53.5 with the Swin-L model. Can I download it anywhere?

Thanks.

Does the Swin-Transformer-Semantic-Segmentation support mixed precision training?

Does the Swin-Transformer-Semantic-Segmentation support mixed precision training? For example, fp16 or apex?

The Swin-Base segmentation result is 49.72% mIoU. Was it obtained with an ImageNet-1K or ImageNet-22K pretrained model?

The Swin-Base segmentation result is 49.72% mIoU, and I see pretrain=None in the config. So this result was obtained without a pretrained model, right?

Is there a config for Swin-L?

I can only find the configs for Swin-S, Swin-B, and Swin-T. Can you provide the config for Swin-L and its pretrained model?

CUDA out of memory with a different crop_size

I changed crop_size in config\_base_\dataset\cityscapes.py from crop_size = (512, 1024) to crop_size = (1024, 2048) and got a CUDA out of memory error on Colab (free tier).

This is the traceback:

2021-06-21 13:33:03,482 - mmseg - INFO - Loaded 500 images
2021-06-21 13:33:03,483 - mmseg - INFO - Start running, host: root@c8eb715da416, work_dir: /content/drive/MyDrive/Image-Segmentation/Swin-Transformer-Semantic-Segmentation/checkpoints/swint_upernet_cityscapes_20k_1024x2048
2021-06-21 13:33:03,484 - mmseg - INFO - workflow: [('train', 1)], max: 20000 iters
Traceback (most recent call last):
  File "/content/drive/MyDrive/Image-Segmentation/Swin-Transformer-Semantic-Segmentation/tools/train.py", line 163, in <module>
    main()
  File "/content/drive/MyDrive/Image-Segmentation/Swin-Transformer-Semantic-Segmentation/tools/train.py", line 159, in main
    meta=meta)
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/apis/train.py", line 116, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/models/segmentors/encoder_decoder.py", line 158, in forward_train
    gt_semantic_seg)
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/models/segmentors/encoder_decoder.py", line 102, in _decode_head_forward_train
    self.train_cfg)
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/models/decode_heads/decode_head.py", line 186, in forward_train
    seg_logits = self.forward(inputs)
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/models/decode_heads/uper_head.py", line 94, in forward
    for i, lateral_conv in enumerate(self.lateral_convs)
  File "/usr/local/lib/python3.7/dist-packages/mmsegmentation-0.11.0-py3.7.egg/mmseg/models/decode_heads/uper_head.py", line 94, in <listcomp>
    for i, lateral_conv in enumerate(self.lateral_convs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/conv_module.py", line 200, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2016, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 11.17 GiB total capacity; 10.37 GiB already allocated; 182.81 MiB free; 10.63 GiB reserved in total by PyTorch)

To avoid running out of memory I need to reduce the batch size, but I cannot find where the batch size parameter is set. Is it possible to change the batch size? If not, which parameter should I change?

This is my config:

2021-06-21 13:32:59,272 - mmseg - INFO - Distributed training: False
2021-06-21 13:32:59,624 - mmseg - INFO - Config:
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=
    '/content/drive/MyDrive/Image-Segmentation/Swin-Transformer-Semantic-Segmentation/checkpoints/swin_tiny_patch4_window7_224.pth',
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.0,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.3,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    decode_head=dict(
        type='UPerHead',
        in_channels=[96, 192, 384, 768],
        in_index=[0, 1, 2, 3],
        pool_scales=(1, 2, 3, 6),
        channels=512,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='BN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=384,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=dict(type='BN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
dataset_type = 'CityscapesDataset'
data_root = 'data/cityscapes/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (1024, 2048)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2049, 1025), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(1024, 2048), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(1024, 2048), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2049, 1025),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CityscapesDataset',
        data_root='data/cityscapes/',
        img_dir='leftImg8bit/train',
        ann_dir='gtFine/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(
                type='Resize', img_scale=(2049, 1025), ratio_range=(0.5, 2.0)),
            dict(
                type='RandomCrop', crop_size=(1024, 2048), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(1024, 2048), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='CityscapesDataset',
        data_root='data/cityscapes/',
        img_dir='leftImg8bit/val',
        ann_dir='gtFine/val',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2049, 1025),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CityscapesDataset',
        data_root='data/cityscapes/',
        img_dir='leftImg8bit/val',
        ann_dir='gtFine/val',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2049, 1025),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(
    type='AdamW',
    lr=6e-05,
    betas=(0.9, 0.999),
    weight_decay=0.01,
    paramwise_cfg=dict(
        custom_keys=dict(
            absolute_pos_embed=dict(decay_mult=0.0),
            relative_position_bias_table=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = dict()
lr_config = dict(
    policy='poly',
    warmup='linear',
    warmup_iters=1500,
    warmup_ratio=1e-06,
    power=1.0,
    min_lr=0.0,
    by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=20000)
checkpoint_config = dict(by_epoch=False, interval=2000)
evaluation = dict(interval=2000, metric='mIoU')
work_dir = '/content/drive/MyDrive/Image-Segmentation/Swin-Transformer-Semantic-Segmentation/checkpoints/swint_upernet_cityscapes_20k_1024x2048'
gpu_ids = range(0, 1)
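
In the config above, the per-GPU batch size appears as samples_per_gpu=2 inside data = dict(...), so that is the value to lower. Note also that a (1024, 2048) crop has four times the pixels of a (512, 1024) crop. The sketch below uses plain dicts standing in for the config and shows the override; effective_batch_size is a hypothetical helper, not a function from this repository.

```python
# Plain-dict stand-in for the relevant parts of the config above.
base = dict(
    data=dict(samples_per_gpu=2, workers_per_gpu=2),
    gpu_ids=range(0, 1),
)


def effective_batch_size(cfg):
    """Total images per iteration: per-GPU batch size times number of GPUs."""
    return cfg["data"]["samples_per_gpu"] * len(cfg["gpu_ids"])


# Override only samples_per_gpu, leaving the other data settings unchanged.
smaller = dict(base, data=dict(base["data"], samples_per_gpu=1))

print(effective_batch_size(base), effective_batch_size(smaller))
```

The backbone config also exposes use_checkpoint (False above); enabling it is a common way to trade compute for memory, though whether that is enough for this crop size on an 11 GiB GPU is not established here.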

2021-06-21 13:33:01,119 - mmseg - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: norm.weight, norm.bias, head.weight, head.bias, layers.0.blocks.1.attn_mask, layers.1.blocks.1.attn_mask, layers.2.blocks.1.attn_mask, layers.2.blocks.3.attn_mask, layers.2.blocks.5.attn_mask

missing keys in source state_dict: norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

2021-06-21 13:33:01,126 - mmseg - INFO - EncoderDecoder(
  (backbone): SwinTransformer(
    (patch_embed): PatchEmbed(
      (proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
      (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    )
    (pos_drop): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0): BasicLayer(
        (blocks): ModuleList(
          (0): SwinTransformerBlock(
            (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=96, out_features=288, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=96, out_features=96, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): Identity()
            (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=96, out_features=384, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=384, out_features=96, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (1): SwinTransformerBlock(
            (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=96, out_features=288, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=96, out_features=96, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=96, out_features=384, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=384, out_features=96, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
        )
        (downsample): PatchMerging(
          (reduction): Linear(in_features=384, out_features=192, bias=False)
          (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
        )
      )
      (1): BasicLayer(
        (blocks): ModuleList(
          (0): SwinTransformerBlock(
            (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=192, out_features=576, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=192, out_features=192, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=192, out_features=768, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=768, out_features=192, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (1): SwinTransformerBlock(
            (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=192, out_features=576, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=192, out_features=192, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=192, out_features=768, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=768, out_features=192, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
        )
        (downsample): PatchMerging(
          (reduction): Linear(in_features=768, out_features=384, bias=False)
          (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        )
      )
      (2): BasicLayer(
        (blocks): ModuleList(
          (0): SwinTransformerBlock(
            (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=384, out_features=1152, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=384, out_features=384, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=384, out_features=1536, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=1536, out_features=384, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (1): SwinTransformerBlock(
            (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=384, out_features=1152, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=384, out_features=384, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=384, out_features=1536, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=1536, out_features=384, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (2): SwinTransformerBlock(
            (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=384, out_features=1152, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=384, out_features=384, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=384, out_features=1536, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=1536, out_features=384, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (3): SwinTransformerBlock(
            (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=384, out_features=1152, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=384, out_features=384, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=384, out_features=1536, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=1536, out_features=384, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (4): SwinTransformerBlock(
            (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=384, out_features=1152, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=384, out_features=384, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=384, out_features=1536, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=1536, out_features=384, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (5): SwinTransformerBlock(
            (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=384, out_features=1152, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=384, out_features=384, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=384, out_features=1536, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=1536, out_features=384, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
        )
        (downsample): PatchMerging(
          (reduction): Linear(in_features=1536, out_features=768, bias=False)
          (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
        )
      )
      (3): BasicLayer(
        (blocks): ModuleList(
          (0): SwinTransformerBlock(
            (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=768, out_features=2304, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=768, out_features=768, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=768, out_features=3072, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=3072, out_features=768, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
          (1): SwinTransformerBlock(
            (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): WindowAttention(
              (qkv): Linear(in_features=768, out_features=2304, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=768, out_features=768, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (softmax): Softmax(dim=-1)
            )
            (drop_path): DropPath()
            (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=768, out_features=3072, bias=True)
              (act): GELU()
              (fc2): Linear(in_features=3072, out_features=768, bias=True)
              (drop): Dropout(p=0.0, inplace=False)
            )
          )
        )
      )
    )
    (norm0): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
    (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
    (norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (decode_head): UPerHead(
    input_transform=multiple_select, ignore_index=255, align_corners=False
    (loss_decode): CrossEntropyLoss()
    (conv_seg): Conv2d(512, 150, kernel_size=(1, 1), stride=(1, 1))
    (dropout): Dropout2d(p=0.1, inplace=False)
    (psp_modules): PPM(
      (0): Sequential(
        (0): AdaptiveAvgPool2d(output_size=1)
        (1): ConvModule(
          (conv): Conv2d(768, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
      )
      (1): Sequential(
        (0): AdaptiveAvgPool2d(output_size=2)
        (1): ConvModule(
          (conv): Conv2d(768, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
      )
      (2): Sequential(
        (0): AdaptiveAvgPool2d(output_size=3)
        (1): ConvModule(
          (conv): Conv2d(768, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
      )
      (3): Sequential(
        (0): AdaptiveAvgPool2d(output_size=6)
        (1): ConvModule(
          (conv): Conv2d(768, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
      )
    )
    (bottleneck): ConvModule(
      (conv): Conv2d(2816, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (lateral_convs): ModuleList(
      (0): ConvModule(
        (conv): Conv2d(96, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU()
      )
      (1): ConvModule(
        (conv): Conv2d(192, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU()
      )
      (2): ConvModule(
        (conv): Conv2d(384, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU()
      )
    )
    (fpn_convs): ModuleList(
      (0): ConvModule(
        (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU()
      )
      (1): ConvModule(
        (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU()
      )
      (2): ConvModule(
        (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU()
      )
    )
    (fpn_bottleneck): ConvModule(
      (conv): Conv2d(2048, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
  )
  (auxiliary_head): FCNHead(
    input_transform=None, ignore_index=255, align_corners=False
    (loss_decode): CrossEntropyLoss()
    (conv_seg): Conv2d(256, 150, kernel_size=(1, 1), stride=(1, 1))
    (dropout): Dropout2d(p=0.1, inplace=False)
    (convs): Sequential(
      (0): ConvModule(
        (conv): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
  )
)

Questions in using pretrained model

The state_dict keys in the pretrained model do not seem to match the keys in the segmentation model.
For example, in the pretrained model the keys look like "patch_embed.proj.weight" and "layers.0.blocks.0.attn.proj.weight", but in the segmentation model they look like "backbone.patch_embed.proj.weight" and "backbone.layers.0.blocks.0.attn.proj.weight".
Moreover, even if I add "backbone." as a prefix to the keys in the pretrained model, some layer parameters still cannot be loaded from it, such as "backbone.norm0.weights".
Can anybody share the correct way to use the pretrained model?
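To make the mismatch concrete, here is a minimal sketch that adds the prefix and then lists which keys still disagree. The helper names (add_backbone_prefix, diff_keys) and the toy dictionaries are hypothetical; a real checkpoint would hold tensors and would be loaded with the framework's own utilities.

```python
def add_backbone_prefix(state_dict, prefix="backbone."):
    """Return a copy of state_dict with the prefix added to keys that lack it."""
    return {
        (key if key.startswith(prefix) else prefix + key): value
        for key, value in state_dict.items()
    }


def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected): keys only the model has, keys only the checkpoint has."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


# Toy stand-ins: integers instead of tensors, three keys instead of hundreds.
pretrained = {
    "patch_embed.proj.weight": 0,
    "layers.0.blocks.0.attn.proj.weight": 1,
    "norm.weight": 2,
}
model_keys = [
    "backbone.patch_embed.proj.weight",
    "backbone.layers.0.blocks.0.attn.proj.weight",
    "backbone.norm0.weight",
]

remapped = add_backbone_prefix(pretrained)
missing, unexpected = diff_keys(model_keys, remapped)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

Keys that remain in the "missing" list after prefixing are layers the checkpoint simply does not contain, so they would keep their freshly initialized values.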

A question about attn mask

Thanks for your work. I'm confused about the purpose of the attention mask mechanism and how it works.
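As a rough illustration of the idea: after a cyclic shift, some windows contain pixels that were not neighbours in the original image, and the mask adds a large negative value to attention scores between such pixels so that softmax drives them to nearly zero. The sketch below is plain Python, not the repository's code; build_shift_mask is a hypothetical name, and it assumes H and W are multiples of window_size and 0 < shift_size < window_size.

```python
def build_shift_mask(H, W, window_size, shift_size, neg=-100.0):
    """Build one (N x N) additive attention mask per window, N = window_size**2."""

    def bands(n):
        # Three bands along one axis, as induced by the cyclic shift.
        return [(0, n - window_size),
                (n - window_size, n - shift_size),
                (n - shift_size, n)]

    # Label every pixel with the id of the region it belongs to (3 x 3 regions).
    region = [[0] * W for _ in range(H)]
    region_id = 0
    for h0, h1 in bands(H):
        for w0, w1 in bands(W):
            for i in range(h0, h1):
                for j in range(w0, w1):
                    region[i][j] = region_id
            region_id += 1

    # Partition into windows; pixels from different regions must not attend to each other.
    masks = []
    for top in range(0, H, window_size):
        for left in range(0, W, window_size):
            ids = [region[i][j]
                   for i in range(top, top + window_size)
                   for j in range(left, left + window_size)]
            masks.append([[0.0 if a == b else neg for b in ids] for a in ids])
    return masks


masks = build_shift_mask(H=4, W=4, window_size=2, shift_size=1)
for row in masks[1]:
    print(row)
```

The first window lies entirely inside one region, so its mask is all zeros; windows that straddle the wrapped-around border get the negative entries.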

AttributeError: 'super' object has no attribute '_specify_ddp_gpu_num'

Thank you very much. I have a question to ask you.
Configuration of my environment:
mmcv-full 1.2.4
mmsegmentation 0.11.0
My problem:
AttributeError: 'super' object has no attribute '_specify_ddp_gpu_num'

KeyError: "EncoderDecoder: 'SwinTransformer is not in the backbone registry'"

When I run the inference step, I get the error below.

About classifying pixels.

Hi, I have a few questions about your nice model.

This model appears to use UPerNet, but it classifies pixels with a conv2d (in the decode head, self.cls_seg).
The original UPerNet uses softmax. Is this a problem?
If I want to build a binary-class model, how should I choose num_class for ('back_ground', 'car')? Is it 1 or 2?
I tried both, but I am not confident in either.

Thanks!
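One way to see why a conv2d output and softmax are not in conflict: a segmentation head commonly outputs one raw score (logit) per class for every pixel, and softmax is applied afterwards, typically inside the cross-entropy loss during training and before the argmax at inference. Whether this repository does exactly that should be checked in its code; the sketch below only shows the arithmetic for a single pixel with two output channels, and predict_pixel is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pixel(logits):
    """Return (predicted class index, class probabilities) for one pixel."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Two channels for ('back_ground', 'car'): index 0 is background, index 1 is car.
label, probs = predict_pixel([0.2, 1.7])
print(label, probs)
```

With softmax cross-entropy, two named classes usually mean two output channels; a single channel would instead require a sigmoid-style binary loss.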

Pretrained Model

When I use the pretrained model upernet_swin_tiny_patch4_window7_512x512.pth, I get this error: KeyError: "EncoderDecoder: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"
What should I do?

python -u /home/wyh/Codes/SwinTransformer/tools/train.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py --options model.pretrained=upernet_swin_tiny_patch4_window7_512x512.pth

Confusion of implementation details

class SwinTransformerBlock(nn.Module):
    """ 
        Swin Transformer Block.
    ...

    def __init__(self, dim, num_heads, window_size=7, shift_size=0,
                 mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0.,
                 act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()
        ...

        self.H = None
        self.W = None

    def forward(self, x, mask_matrix):
        ...
        H, W = self.H, self.W
        assert L == H * W, "input feature has wrong size"

In the implementation of SwinTransformerBlock, you assign self.H = None and self.W = None, but when forward() runs, it looks like the program would stop at assert L == H * W, "input feature has wrong size". Is this a bug, or did I miss something?
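A pattern that would explain this is that the enclosing layer assigns H and W on each block just before calling it, so the None values are only placeholders. The toy classes below (Block and Layer are hypothetical names, and lists stand in for tensors) sketch that pattern; they are an assumption about the design, not a copy of the repository's code.

```python
class Block:
    def __init__(self):
        # Placeholders; the caller is expected to fill these in before forward().
        self.H = None
        self.W = None

    def forward(self, tokens):
        assert self.H is not None and self.W is not None, "H and W must be set by the caller"
        assert len(tokens) == self.H * self.W, "input feature has wrong size"
        return tokens


class Layer:
    def __init__(self, depth):
        self.blocks = [Block() for _ in range(depth)]

    def forward(self, tokens, H, W):
        for blk in self.blocks:
            # The spatial size is only known at call time, so it is set here.
            blk.H, blk.W = H, W
            tokens = blk.forward(tokens)
        return tokens


out = Layer(depth=2).forward(list(range(12)), H=3, W=4)
print(len(out))
```

Calling a block directly, without going through the layer, would fail in this sketch, which matches the behaviour the question anticipates.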

Learning rate scaling rule for different batch sizes

Hi there! I was trying to train your model on 4 cards, by increasing the number of samples per card to 4. The total batch size = 4x4 = 8x2 = 16, which is the same as your original setting. Everything else is kept the same. Do you think that would be equivalent to your 8-card experiment?

More generally, when the total batch size is changed, do you expect the linear LR scaling rule presented in this report to hold for Swin experiments, which use AdamW instead of SGD? If not, can you recommend an alternative? Many thanks in advance!
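For reference, the linear scaling rule itself is simple arithmetic. The sketch below uses a base learning rate of 6e-5 for a total batch size of 16 purely as an assumed example value; scale_lr is a hypothetical helper, and whether the rule transfers well to AdamW is exactly the open question above.

```python
def scale_lr(base_lr, base_batch, gpus, samples_per_gpu):
    """Linear scaling rule: learning rate grows in proportion to total batch size."""
    total_batch = gpus * samples_per_gpu
    return total_batch, base_lr * total_batch / base_batch


# 4 GPUs x 4 samples gives the same total batch as 8 GPUs x 2 samples.
total, lr = scale_lr(base_lr=6e-5, base_batch=16, gpus=4, samples_per_gpu=4)
print(total, lr)

# Doubling the total batch size doubles the learning rate under this rule.
total_big, lr_big = scale_lr(base_lr=6e-5, base_batch=16, gpus=8, samples_per_gpu=4)
print(total_big, lr_big)
```

When the total batch size is unchanged, the rule leaves the learning rate unchanged; remaining differences between the two setups would come from elsewhere, for example per-GPU normalization statistics.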


Tried to run demo, got unexpected keyword argument 'pretrain_style'

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
A clear and concise description of what the bug is.

Reproduction

What command or script did you run?

(swin_transformer) [user@node001 mmsegmentation-master]$ python ./demo/image_demo.py /datasets/ADE20K/ADEChallengeData2016/images/validation/ADE_val_00000001.jpg ./configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k_pretrain_224x224_1K.py ./pretrained/upernet_swin_tiny_patch4_window7_512x512.pth

Did you make any modifications on the code or config? Did you understand what you have modified?
No modifications.
What dataset did you use?
ADE20K

Environment

Please run python mmseg/utils/collect_env.py to collect necessary environment information and paste it here.

python collect_env.py 
fatal: Not a git repository (or any parent up to mount point /data2)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-32GB
CUDA_HOME: /cm/shared/apps/cuda10.1/toolkit/10.0.130
GCC: gcc (GCC) 6.3.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.7.0
OpenCV: 4.5.3
MMCV: 1.3.9
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMSegmentation: 0.16.0+

You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

conda create -n swin_transformer python=3.7 -y
conda activate swin_transformer
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Error traceback

If applicable, paste the error traceback here.

python ./demo/image_demo.py /datasets/ADE20K/ADEChallengeData2016/images/validation/ADE_val_00000001.jpg ./configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k_pretrain_224x224_1K.py ./pretrained/upernet_swin_tiny_patch4_window7_512x512.pth
Traceback (most recent call last):
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
TypeError: __init__() got an unexpected keyword argument 'pretrain_style'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/data2/userdata/user/workSpace/Swin-Transformer-Seg/mmsegmentation-master/mmseg/models/segmentors/encoder_decoder.py", line 36, in __init__
    self.backbone = builder.build_backbone(backbone)
  File "/data2/userdata/user/workSpace/Swin-Transformer-Seg/mmsegmentation-master/mmseg/models/builder.py", line 20, in build_backbone
    return BACKBONES.build(cfg)
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: SwinTransformer: __init__() got an unexpected keyword argument 'pretrain_style'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./demo/image_demo.py", line 40, in <module>
    main()
  File "./demo/image_demo.py", line 27, in main
    model = init_segmentor(args.config, args.checkpoint, device=args.device)
  File "/data2/userdata/user/workSpace/Swin-Transformer-Seg/mmsegmentation-master/mmseg/apis/inference.py", line 32, in init_segmentor
    model = build_segmentor(config.model, test_cfg=config.get('test_cfg'))
  File "/data2/userdata/user/workSpace/Swin-Transformer-Seg/mmsegmentation-master/mmseg/models/builder.py", line 49, in build_segmentor
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/work/anaconda3/envs/swin_transformer/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: EncoderDecoder: SwinTransformer: __init__() got an unexpected keyword argument 'pretrain_style'

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

Questions about computing flops.

How can I get the FLOPs reported in the paper? I used the script tools/get_flops.py, but I only get 236.1 GFLOPs for Swin-Tiny with an input shape of (3,512,512), while the paper reports 945 GFLOPs.
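One possible explanation is a different input resolution. If the paper's figure was measured at 512x2048 rather than 512x512 (an assumption to verify against the paper), and if cost grows roughly linearly with the number of pixels for fixed-size window attention, the two numbers line up. The helper scale_flops is hypothetical and only does the proportional arithmetic.

```python
def scale_flops(gflops, from_hw, to_hw):
    """Scale a FLOPs figure in proportion to the number of input pixels."""
    from_pixels = from_hw[0] * from_hw[1]
    to_pixels = to_hw[0] * to_hw[1]
    return gflops * to_pixels / from_pixels


estimate = scale_flops(236.1, from_hw=(512, 512), to_hw=(512, 2048))
print(round(estimate, 1))
```

The estimate is about 944.4 GFLOPs, close to the 945 GFLOPs quoted above.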

Problem about reproducing the results of Swin tiny model segmentation

Hi, dear authors! Swin is a great work and I would like to try to reproduce some of its results.

I used the Swin-Tiny config provided in this code and the pretrained model from Swin-Transformer Classification to reproduce the Swin-Tiny segmentation results on ADE20K. However, my result (single scale/multi scale) is 43.94/45.59, which differs from the 44.51/45.81 reported in this code and also from the 46.1 in the paper.

Do you have any idea how I can reproduce your results? Or have you seen similar results in your experiments?

I set up the environment as described in Swin-Transformer Classification get_started.md. The environment detail is as below:

sys.platform: linux
Python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-PCIE-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.2
Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.2
OpenCV: 4.4.0
MMCV: 1.2.7
MMCV Compiler: GCC 7.4
MMCV CUDA Compiler: 10.1
MMSegmentation: 0.11.0+c914b2c

"EncoderDecoder:'SwinTransformer' is not in the backbone registry

Hello,
when I run train.py, this error occurs:
KeyError: "EncoderDecoder: 'SwinTransformer' is not in the backbone registry"

Unexpected keyword argument 'ape'

When I ran a demo using a pretrained model, I got an error.
The command I ran is
python demo/image_demo.py test.png configs/swin/upernet_swin_tiny_patch4_window7_512_512_160k_ade20k.py moby_upernet_swin_tiny_patch4_window7_512*512.pth

Error is

Question about config files (ade20k.py and schedule_160k.py)

If I'm not mistaken, a couple of config files are yet to be committed.

datasets/ade20k.py
schedules/schedule_160k.py

Please do consider committing the above files.

What is the launch args in VSCode?

I am trying to run and debug train.py in VSCode, but I have failed to set the right arguments. What should they look like?

My args are like this:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "args": [
                "/xxx/SwinT-Segment/configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_cityscapes.py",
                "--gpus",
                "8",
                "--options",
                "model.pretrained=/xxx/SwinT-Segment/pretrain/swin_tiny_patch4_window7_224.pth"
            ]
        }
    ]
}

MMCV= 1.1.15 is not compatible while the rest versions cannot be installed

Hi,
I followed get_started.md to install all the dependencies same as is mentioned in the GitHub.

When I run the command "python tools/test.py configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --eval mIoU", I get this error:

AssertionError: MMCV==1.1.5 is used but incompatible. Please install mmcv>=(1, 3, 13, 0, 0, 0), <=(1, 4, 0, 0, 0, 0).
The problem is that I cannot install any mmcv version between 1.3.13 and 1.4.0.

Can you please help me solve it?
Thanks
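The assertion above is a version-range check. A minimal sketch of such a check, useful for confirming which installed version would pass, is shown below; parse_version and in_range are hypothetical helpers that only handle plain "major.minor.patch" strings, not the library's own parser.

```python
def parse_version(version):
    """Turn 'major.minor.patch' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".")[:3])


def in_range(version, minimum, maximum):
    """True if minimum <= version <= maximum, compared component by component."""
    return parse_version(minimum) <= parse_version(version) <= parse_version(maximum)


for candidate in ["1.1.5", "1.3.13", "1.3.17", "1.4.0", "1.4.1"]:
    print(candidate, in_range(candidate, "1.3.13", "1.4.0"))
```

Note that the required range belongs to whichever mmsegmentation package is being imported, so a mismatch can also mean that a different mmsegmentation version than intended is installed.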

Train on Custom Dataset

Hi, I really appreciate this wonderful work, but I ran into a problem when training on my custom dataset. My dataset has only 2 classes (0: background; 1: target), and class 0 accounts for 95%+. I just modified the config file as follows:

but the mIoU stays unchanged after a few epochs of training. The info is as follows:

Is there anything I missed or any config I overlooked? I hope someone can give me some suggestions. Thanks in advance!
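A constant mIoU on a heavily imbalanced two-class dataset is what one would see if the model predicted background everywhere. The sketch below computes mIoU on a toy 100-pixel example with the 95/5 split mentioned above; mean_iou is a hypothetical helper written in plain Python, not the repository's evaluation code.

```python
def mean_iou(pred, gt, num_classes):
    """Return (mIoU, per-class IoU list) for flat lists of predicted and true labels."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
        union = sum(1 for p, g in zip(pred, gt) if p == cls or g == cls)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    return sum(ious) / len(ious), ious


gt = [0] * 95 + [1] * 5      # 95% background, 5% target
pred = [0] * 100             # a model that always predicts background
miou, ious = mean_iou(pred, gt, num_classes=2)
print(miou, ious)
```

Here the background IoU is 0.95 and the target IoU is 0, giving an mIoU of 0.475. Checking the per-class IoU in the training log against this pattern would show whether the model has collapsed to the majority class.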

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR what(): CUDA error: an illegal memory access was encountered

When I ran single-GPU inference as described in the README, the process stopped at 10/2000 with the following error:

Use load_from_local loader
[                              ] 10/2000, 0.9 task/s, elapsed: 11s, ETA:  2161spython-BaseException
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Traceback (most recent call last):
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/apis/test.py", line 60, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/home/peter/Applications/anaconda3/envs/swin-seg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/peter/Applications/anaconda3/envs/swin-seg/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/home/peter/Applications/anaconda3/envs/swin-seg/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/peter/Applications/anaconda3/envs/swin-seg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/peter/Applications/anaconda3/envs/swin-seg/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/base.py", line 124, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/base.py", line 106, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/encoder_decoder.py", line 265, in simple_test
    seg_logit = self.inference(img, img_meta, rescale)
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/encoder_decoder.py", line 250, in inference
    seg_logit = self.whole_inference(img, img_meta, rescale)
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/encoder_decoder.py", line 217, in whole_inference
    seg_logit = self.encode_decode(img, img_meta)
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/encoder_decoder.py", line 88, in encode_decode
    out = self._decode_head_forward_test(x, img_metas)
  File "/home/peter/PycharmProjects/Swin-Transformer-Semantic-Segmentation/mmseg/models/segmentors/encoder_decoder.py", line 110, in _decode_head_forward_test

I googled this error (pytorch/pytorch#21819), and one possible culprit is a batch_size that is too large.

However, samples_per_gpu was set to 1 and my GPU has 48GB of memory. During inference the memory usage grew gradually until all 48GB were used and it overflowed, so something must be wrong with the settings or the code, but I cannot find it.

Can you help me to solve it? Thanks in advance.

Pre-trained weights are not fully compatible with backbone

Hey! Thanks for your brilliant work!

The pre-trained weights do not seem to be fully compatible with the backbone. What is the reason for this difference? Does it affect the training process?

Missing key(s) in state_dict: "absolute_pos_embed", "norm0.weight", "norm0.bias", "norm1.weight", "norm1.bias", "norm2.weight", "norm2.bias", "norm3.weight", "norm3.bias". 
Unexpected key(s) in state_dict: "norm.weight", "norm.bias", "layers.0.blocks.1.attn_mask", "layers.1.blocks.1.attn_mask", "layers.2.blocks.1.attn_mask", "layers.2.blocks.3.attn_mask", "layers.2.blocks.5.attn_mask".

_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceISt7complexIdEEEPKNS_6detail12TypeMetaDataEv

0. Use Xshell to connect to my dev machine.
1. Set up my env following https://github.com/open-mmlab/mmsegmentation/blob/83d312e87a805050bc52d51165e5a332f1ff84fb/docs/get_started.md
2. Check versions: python3.7, mmcv-full 1.3.0, torch 1.6.0
3. Run code: python3 demo/image_demo.py data/infer/images/test/dataset10k_9684.jpg configs/swin/upernet_swin_base_patch4_window7_512x512_160k_ade20k.py ../upernet_swin_base_patch4_window7_512x512.pth

Traceback (most recent call last):
  File "demo/image_demo.py", line 3, in <module>
    from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot
  File "/home/users/xxx/swin-trans/Swin-Transformer-Semantic-Segmentation-main/mmseg/apis/__init__.py", line 1, in <module>
    from .inference import inference_segmentor, init_segmentor, show_result_pyplot
  File "/home/users/xxx/swin-trans/Swin-Transformer-Semantic-Segmentation-main/mmseg/apis/inference.py", line 8, in <module>
    from mmseg.models import build_segmentor
  File "/home/users/xxx/swin-trans/Swin-Transformer-Semantic-Segmentation-main/mmseg/models/__init__.py", line 1, in <module>
    from .backbones import * # noqa: F401,F403
  File "/home/users/xxx/swin-trans/Swin-Transformer-Semantic-Segmentation-main/mmseg/models/backbones/__init__.py", line 2, in <module>
    from .fast_scnn import FastSCNN
  File "/home/users/xxx/swin-trans/Swin-Transformer-Semantic-Segmentation-main/mmseg/models/backbones/fast_scnn.py", line 7, in <module>
    from mmseg.models.decode_heads.psp_head import PPM
  File "/home/users/xxx/swin-trans/Swin-Transformer-Semantic-Segmentation-main/mmseg/models/decode_heads/__init__.py", line 4, in <module>
    from .cc_head import CCHead
  File "/home/users/xxx/swin-trans/Swin-Transformer-Semantic-Segmentation-main/mmseg/models/decode_heads/cc_head.py", line 7, in <module>
    from mmcv.ops import CrissCrossAttention
  File "/home/users/xxx/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/ops/__init__.py", line 1, in <module>
    from .bbox import bbox_overlaps
  File "/home/users/xxx/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/ops/bbox.py", line 3, in <module>
    ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
  File "/home/users/xxx/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/ext_loader.py", line 11, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/users/xxx/anaconda3/envs/open-mmlab/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/users/xxx/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceISt7complexIdEEEPKNS_6detail12TypeMetaDataEv

Something went wrong, but I have no idea...

mIOU is not expected

python tools/train.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py --options model.pretrained=checkpoints/swin_tiny_patch4_window7_224.pth model.backbone.use_checkpoint=True --work-dir=./
I trained the model with the command above on Windows. The resulting mIoU is 30.91, much lower than the expected 44.51. The only change I made was replacing SyncBN with BN in order to train on a single GPU. Is it normal for the mIoU to differ this much?
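
The SyncBN-to-BN change described above can also be applied to a config programmatically. The following is a minimal sketch, assuming the config is a nested structure of dicts and lists in which normalization layers are declared as `dict(type='SyncBN', ...)`. The helper name `replace_syncbn` and the sample config fragment are hypothetical, and the sketch only reproduces the edit; it does not by itself explain the mIoU gap.

```python
import copy


def replace_syncbn(cfg):
    """Return a deep copy of cfg with every type='SyncBN' replaced by type='BN'."""
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        # Recurse through dicts and sequences, rewriting the norm type in place.
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Hypothetical config fragment, for illustration only.
model = dict(
    decode_head=dict(norm_cfg=dict(type="SyncBN", requires_grad=True)),
    auxiliary_head=dict(norm_cfg=dict(type="SyncBN", requires_grad=True)),
)
single_gpu_model = replace_syncbn(model)
```

The original dict is left untouched, so the multi-GPU and single-GPU variants can be compared side by side.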

question about AssertionError: PascalVOCDataset:

Hello, I converted my dataset to VOC format and modified my config file, then trained a segmentation task on my own dataset with "python tools/train.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_voc.py --options model.pretrained=../pretrain_models/swin_tiny_patch4_window7_224.pth". I got the following error:

Traceback (most recent call last):
File "tools/train.py", line 163, in <module>
main()
File "tools/train.py", line 159, in main
meta=meta)
File "/home/ma-user/work/Swin-Transformer-Semantic-Segmentation/mmseg/apis/train.py", line 100, in train_segmentor
val_dataset = build_dataset(cfg.data.val, dict(test_mode=True))
File "/home/ma-user/work/Swin-Transformer-Semantic-Segmentation/mmseg/datasets/builder.py", line 73, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 182, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
AssertionError: PascalVOCDataset:

Does the code not support VOC-format datasets? Which dataset formats does it support?
Thank you!
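
An `AssertionError` with an empty message usually comes from a bare `assert`. One likely but unconfirmed candidate here is a check that the dataset's image directory exists and that a split file is given. The sketch below, using a hypothetical helper `find_missing_paths`, lists which of the expected paths are absent; the directory names in the example call follow the usual VOC layout and are assumptions, not values taken from this issue.

```python
import os


def find_missing_paths(data_root, img_dir, ann_dir, split):
    """Return the dataset paths (joined with data_root) that do not exist."""
    candidates = [
        os.path.join(data_root, img_dir),
        os.path.join(data_root, ann_dir),
        os.path.join(data_root, split),
    ]
    return [path for path in candidates if not os.path.exists(path)]


# Assumed VOC-style layout; adjust to match the paths in your own config.
missing = find_missing_paths(
    "data/VOCdevkit/VOC2012",
    "JPEGImages",
    "SegmentationClass",
    "ImageSets/Segmentation/val.txt",
)
for path in missing:
    print("missing:", path)
```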

is it necessary to use stochastic depth in swin_tiny

Is it really effective for semantic segmentation? Won't it hurt the network's performance? Also, is stochastic depth kept at test time?
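
For reference, here is a minimal sketch of stochastic depth as it is commonly implemented (often called drop path). It is not this repo's code: it works on a plain list of floats instead of a tensor, and the function name and signature are assumptions. During training the residual branch is dropped with probability `drop_prob` and otherwise rescaled by `1 / (1 - drop_prob)`; at inference the branch is always kept, so nothing is dropped at test time.

```python
import random


def drop_path(x, drop_prob, training, rng=random):
    """Apply stochastic depth to the output x of one residual branch."""
    if not training or drop_prob == 0.0:
        # Inference (or disabled): the branch is always kept, no rescaling.
        return list(x)
    keep_prob = 1.0 - drop_prob
    if rng.random() < keep_prob:
        # Kept: rescale so the expected value matches the inference path.
        return [value / keep_prob for value in x]
    # Dropped: the block reduces to its identity shortcut.
    return [0.0 for _ in x]


branch_output = [0.5, -1.0, 2.0]
at_test_time = drop_path(branch_output, drop_prob=0.3, training=False)
at_train_time = drop_path(branch_output, drop_prob=0.3, training=True)
```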

KeyError: "EncoderDecoder: 'SwinTransformer is not in the backbone registry'"

What does this error mean?

What is the difference between mIoU and mIoU (ms+flip)?

Hi, everyone:

Backbone	Method	Crop Size	Lr Schd	mIoU	mIoU (ms+flip)	#params	FLOPs	config	log	model
Swin-T	UPerNet	512x512	160K	44.51	45.81	60M	945G	config	github/baidu	github/baidu
Swin-S	UPerNet	512x512	160K	47.64	49.47	81M	1038G	config	github/baidu	github/baidu
Swin-B	UPerNet	512x512	160K	48.13	49.72	121M	1188G	config	github/baidu	github/baidu

I would like to know the difference between mIoU and mIoU (ms+flip), and the exact settings used for mIoU (ms+flip).

Thanks a lot!
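
For context, mIoU is the intersection-over-union averaged over classes, and "(ms+flip)" conventionally denotes the same metric computed on predictions averaged over several input scales and horizontally flipped copies of each image. The exact scales behind the table are not stated in this thread. The sketch below covers only the metric itself, on flat lists of integer labels; the function name `mean_iou` is an assumption, not the repo's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correct pixel counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)  # IoU is 1/2 for class 0 and 2/3 for class 1
```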

question about run code in training on one gpu

Hello, when I run "python tools/train.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py --options model.pretrained=models/upernet_swin_tiny_patch4_window7_512x512.pth [model.backbone.use_checkpoint=True]" on Windows, I get the following error:
(RTX3090) D:\Swin-Transformer-Semantic-Segmentation>python tools/train.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py --options model.pretrained=models/upernet_swin_tiny_patch4_window7_512x512.pth [model.backbone.use_checkpoint=True]
.\work_dirs\upernet_swin_tiny_patch4_window7_512x512_160k_ade20k
Traceback (most recent call last):
File "D:\Anaconda\envs\RTX3090\lib\site-packages\yapf\yapflib\pytree_utils.py", line 115, in ParseCodeToTree
tree = parser_driver.parse_string(code, debug=False)
File "D:\Anaconda\envs\RTX3090\lib\lib2to3\pgen2\driver.py", line 104, in parse_string
return self.parse_tokens(tokens, debug)
File "D:\Anaconda\envs\RTX3090\lib\lib2to3\pgen2\driver.py", line 72, in parse_tokens
if p.addtoken(type, value, (prefix, start)):
File "D:\Anaconda\envs\RTX3090\lib\lib2to3\pgen2\parse.py", line 159, in addtoken
raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'model'", context=('\n', (4, 0))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Anaconda\envs\RTX3090\lib\site-packages\yapf\yapflib\pytree_utils.py", line 121, in ParseCodeToTree
tree = parser_driver.parse_string(code, debug=False)
File "D:\Anaconda\envs\RTX3090\lib\lib2to3\pgen2\driver.py", line 104, in parse_string
return self.parse_tokens(tokens, debug)
File "D:\Anaconda\envs\RTX3090\lib\lib2to3\pgen2\driver.py", line 72, in parse_tokens
if p.addtoken(type, value, (prefix, start)):
File "D:\Anaconda\envs\RTX3090\lib\lib2to3\pgen2\parse.py", line 159, in addtoken
raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'model'", context=('\n', (4, 0))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tools/train.py", line 164, in <module>
main()
File "tools/train.py", line 101, in main
cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
File "D:\Anaconda\envs\RTX3090\lib\site-packages\mmcv\utils\config.py", line 458, in dump
f.write(self.pretty_text)
File "D:\Anaconda\envs\RTX3090\lib\site-packages\mmcv\utils\config.py", line 413, in pretty_text
text, _ = FormatCode(text, style_config=yapf_style, verify=True)
File "D:\Anaconda\envs\RTX3090\lib\site-packages\yapf\yapflib\yapf_api.py", line 147, in FormatCode
tree = pytree_utils.ParseCodeToTree(unformatted_source)
File "D:\Anaconda\envs\RTX3090\lib\site-packages\yapf\yapflib\pytree_utils.py", line 127, in ParseCodeToTree
raise e
File "D:\Anaconda\envs\RTX3090\lib\site-packages\yapf\yapflib\pytree_utils.py", line 125, in ParseCodeToTree
ast.parse(code)
File "D:\Anaconda\envs\RTX3090\lib\ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "", line 4
'model': dict(
^
SyntaxError: invalid syntax

So I would like to ask: what is an example command for training on one GPU on Windows?
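
One possible cause, offered as an assumption rather than a confirmed diagnosis: square brackets in a usage line conventionally mark an optional argument, and typing them literally makes the bracket part of the option key. The sketch below is a simplified stand-in for dotted `key=value` parsing (it is not mmcv's implementation, and `parse_options` is a hypothetical name). It shows that the bracketed form yields a top-level key `[model`, which is not a valid Python identifier and could plausibly break the step that dumps the config as Python source.

```python
def parse_options(pairs):
    """Turn ['a.b=1', ...] into a nested dict, splitting keys on '.'."""
    cfg = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        parts = key.split(".")
        node = cfg
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value  # values are kept as raw strings in this sketch
    return cfg


pretrained = "model.pretrained=models/upernet_swin_tiny_patch4_window7_512x512.pth"

with_brackets = parse_options([pretrained, "[model.backbone.use_checkpoint=True]"])
without_brackets = parse_options([pretrained, "model.backbone.use_checkpoint=True"])

# Top-level keys that could not be written as Python keyword arguments.
bad_keys = [key for key in with_brackets if not key.isidentifier()]
```

If this is the cause, passing `model.backbone.use_checkpoint=True` without the brackets would avoid the malformed key.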

Question about W-MSA

I see that your code drops the W-MSA module and uses the SW-MSA module everywhere. Why is that?

Stochastic depth ratio

Hi,

Thanks for sharing this code. I have a question about the stochastic depth ratio for the experiments on ADE20K.

The stochastic depth ratio reported in the paper is 0.2, but this repo adopts 0.3 for all models. Could you please let me know which one you used for Table 3?

Thanks.

Different window size for pretrained model

In the config file, only window size 7 is used, which corresponds to the model pretrained on 224x224 images. How do the pretraining image size and window size affect the results?

If I use the swin_base_patch4_window12_384_22k.pth model, what should I change in the config file upernet_swin_base_patch4_window7_512x512_160k_ade20k.py? Is changing the window size from 7 to 12 enough?
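
One concrete place where the window size matters is the relative position bias table: in Swin Transformer, a window of size M gives relative offsets in the range [-(M-1), M-1] along each axis, so the table has (2M-1)^2 rows. The sketch below only works out that arithmetic; the function name is an assumption, and it does not say which other config fields must change.

```python
def relative_position_bias_table_rows(window_size):
    """Number of distinct 2D relative offsets inside a square attention window."""
    offsets_per_axis = 2 * window_size - 1
    return offsets_per_axis * offsets_per_axis


rows_window7 = relative_position_bias_table_rows(7)    # 13 * 13 = 169
rows_window12 = relative_position_bias_table_rows(12)  # 23 * 23 = 529
```

Because the two tables differ in shape, weights pretrained with one window size cannot be copied into a model built with another without some form of resizing.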

embed_dim error

Excuse me, I get the following error when running your model:

TypeError: EncoderDecoder: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

The command I use is: python tools/train.py configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py

I hope you can help.

meet the issue KeyError: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'

When I train on my own dataset, I hit the following error:
KeyError: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'

Environment:
4 GPUs (RTX 3090) with CUDA 11.1
swin-b
upernet_swin_base_patch4_window7_512x512.pth
Could somebody give me any clues?

How to load the pretrained model ?

Dear authors,
First, many thanks for your excellent work.
When I try to train the model on my custom dataset with a single GPU, I run the following command:
python tools/train.py configs/swin/upernet_swin_base_patch4_window7_512x512_160k_messidor.py --options model.pretrained=weights/upernet_swin_base_patch4_window7_512x512.pth

The following error occurs: KeyError: "EncoderDecoder: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"

How can I solve it?
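
The key in the error carries a `backbone.` prefix, which suggests (an assumption, not confirmed in this thread) that the file passed as `model.pretrained` is a full segmentation checkpoint rather than a backbone-only one. If so, one possible workaround is to keep only the backbone entries and strip the prefix before loading. The sketch below works on a plain dict with placeholder values; the helper name and the non-backbone key are hypothetical.

```python
def extract_backbone_state_dict(state_dict, prefix="backbone."):
    """Keep only keys starting with prefix and remove the prefix from them."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Placeholder values stand in for real weight tensors.
full_checkpoint = {
    "backbone.layers.0.blocks.0.attn.relative_position_bias_table": "tensor-a",
    "decode_head.conv_seg.weight": "tensor-b",  # hypothetical head key
}
backbone_only = extract_backbone_state_dict(full_checkpoint)
```

Alternatively, a backbone-only ImageNet checkpoint such as swin_tiny_patch4_window7_224.pth, used in other commands in this listing, already has keys without the prefix.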

MIoU, mAcc, aAcc during single GPU training at 16000/160000 test

What are the values of mIoU, mAcc, and aAcc at the 16000/160000 evaluation when training with a single GPU?

ModuleNotFoundError: No module named 'mmseg'

After installing the requirements with pip install -r requirement.txt:

> python tools/test.py -h
Traceback (most recent call last):
  File "tools/test.py", line 10, in <module>
    from mmseg.apis import multi_gpu_test, single_gpu_test
ModuleNotFoundError: No module named 'mmseg'
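
Installing the requirements file installs the dependencies, but not necessarily the `mmseg` package itself. A quick way to confirm whether the package is visible to the current interpreter is sketched below with the standard library; the helper name `is_importable` is an assumption. If the check fails, a common remedy (again an assumption for this repo) is to install the repository in development mode with `pip install -e .` from its root directory.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if not is_importable("mmseg"):
    print("mmseg is not importable from this environment")
```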

TorchVision: 0.10.0 OpenCV: 4.5.3 MMCV: 1.3.9 MMCV Compiler: GCC 7.3 MMCV CUDA Compiler: 11.1 MMSegmentation: 0.11.0+