open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

Home Page: https://mmsegmentation.readthedocs.io/en/main/

License: Apache License 2.0

Languages: Python 99.16%, Shell 0.62%, Jupyter Notebook 0.12%, Dockerfile 0.10%
Topics: semantic-segmentation, pytorch, pspnet, deeplabv3, transformer, swin-transformer, realtime-segmentation, vessel-segmentation, retinal-vessel-segmentation, image-segmentation

mmsegmentation's Introduction

 

Documentation: https://mmsegmentation.readthedocs.io/en/latest/

English | 简体中文

Introduction

MMSegmentation is an open source semantic segmentation toolbox based on PyTorch. It is a part of the OpenMMLab project.

The main branch works with PyTorch 1.6+.

🎉 Introducing MMSegmentation v1.0.0 🎉

We are thrilled to announce the official release of MMSegmentation's latest version! For this new release, the main branch serves as the primary branch, while the development branch is dev-1.x. The stable branch for the previous release remains as the 0.x branch. Please note that the master branch will only be maintained for a limited time before being removed. We encourage you to be mindful of branch selection and updates during use. Thank you for your unwavering support and enthusiasm, and let's work together to make MMSegmentation even more robust and powerful! 💪

MMSegmentation v1.x brings remarkable improvements over the 0.x release, offering a more flexible and feature-packed experience. To utilize the new features in v1.x, we kindly invite you to consult our detailed 📚 migration guide, which will help you seamlessly transition your projects. Your support is invaluable, and we eagerly await your feedback!

demo image

Major features

  • Unified Benchmark

    We provide a unified benchmark toolbox for various semantic segmentation methods.

  • Modular Design

    We decompose the semantic segmentation framework into different components and one can easily construct a customized semantic segmentation framework by combining different modules.

  • Support of multiple methods out of the box

    The toolbox directly supports popular and contemporary semantic segmentation frameworks such as PSPNet, DeepLabV3, PSANet, and DeepLabV3+.

  • High efficiency

    The training speed is faster than or comparable to other codebases.

What's New

v1.2.0 was released on 10/12/2023. From 1.1.0 to 1.2.0, we added or updated the following features:

Highlights

  • Support for the open-vocabulary semantic segmentation algorithm SAN

  • Support for the monocular depth estimation task; please refer to VPD and Adabins for more details.

    depth estimation

  • Add new projects: the open-vocabulary semantic segmentation algorithm CAT-Seg and the real-time semantic segmentation algorithm PP-MobileSeg

Installation

Please refer to get_started.md for installation and dataset_prepare.md for dataset preparation.

Get Started

Please see Overview for the general introduction of MMSegmentation.

Please see user guides for the basic usage of MMSegmentation. There are also advanced tutorials for an in-depth understanding of mmseg design and implementation.

A Colab tutorial is also provided. You may preview the notebook here or directly run on Colab.
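
For a quick taste of the v1.x Python API, a minimal inference sketch (paths are placeholders; any config/checkpoint pair from the model zoo works):

from mmseg.apis import init_model, inference_model, show_result_pyplot

# Build a segmentor from a config file and a checkpoint file (placeholder paths).
model = init_model('path/to/config.py', 'path/to/checkpoint.pth', device='cuda:0')
# Run inference on one image and visualize the predicted segmentation map.
result = inference_model(model, 'demo/demo.png')
show_result_pyplot(model, 'demo/demo.png', result)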

To migrate from MMSegmentation 0.x, please refer to migration.

Tutorial

MMSegmentation Tutorials
Get Started | MMSeg Basic Tutorial | MMSeg Detail Tutorial | MMSeg Development Tutorial

Benchmark and model zoo

Results and models are available in the model zoo.

Overview
Supported backbones | Supported methods | Supported heads | Supported datasets | Other

Please refer to FAQ for frequently asked questions.

Projects

Here are some implementations of SOTA models and solutions built on MMSegmentation, which are supported and maintained by community users. These projects demonstrate best practices for research and product development based on MMSegmentation. We welcome and appreciate all contributions to the OpenMMLab ecosystem.

Contributing

We appreciate all contributions to improve MMSegmentation. Please refer to CONTRIBUTING.md for the contributing guideline.

Acknowledgement

MMSegmentation is an open source project that welcomes any contribution and feedback. We hope the toolbox and benchmark can serve the growing research community by providing a flexible, standardized toolkit for reimplementing existing methods and developing new semantic segmentation methods.

Citation

If you find this project useful in your research, please consider citing:

@misc{mmseg2020,
    title={{MMSegmentation}: OpenMMLab Semantic Segmentation Toolbox and Benchmark},
    author={MMSegmentation Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmsegmentation}},
    year={2020}
}

License

This project is released under the Apache 2.0 license.

OpenMMLab Family

  • MMEngine: OpenMMLab foundational library for training deep learning models.
  • MMCV: OpenMMLab foundational library for computer vision.
  • MMPreTrain: OpenMMLab pre-training toolbox and benchmark.
  • MMagic: OpenMMLab Advanced, Generative and Intelligent Creation toolbox.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMYOLO: OpenMMLab YOLO series toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
  • MMPose: OpenMMLab pose estimation toolbox and benchmark.
  • MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
  • MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
  • MMFlow: OpenMMLab optical flow toolbox and benchmark.
  • MMDeploy: OpenMMLab Model Deployment Framework.
  • MMRazor: OpenMMLab model compression toolbox and benchmark.
  • MIM: MIM installs OpenMMLab packages.
  • Playground: A central hub for gathering and showcasing amazing projects built upon OpenMMLab.


mmsegmentation's Issues

how to define class_weights?

Class imbalance is a common problem in semantic segmentation. Where can I adjust the per-class weight in this repo?
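
For reference, mmseg's CrossEntropyLoss accepts a class_weight list in the head's loss_decode config; a sketch of the relevant override, with hypothetical weights for an 8-class dataset:

# Per-class weights for CrossEntropyLoss; the values here are hypothetical.
model = dict(
    decode_head=dict(
        loss_decode=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=1.0,
            class_weight=[0.5, 1.0, 1.0, 1.2, 1.2, 1.5, 2.0, 2.0])))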

Error : Default process group is not initialized

Torch : 1.4.0

CUDA: 10.0

MMCV : 1.0.2

MMSEG: 0.5.0+1c3f547

small custom dataset

Config :

norm_cfg = dict(type='BN', requires_grad=True)

model = dict(
    type='CascadeEncoderDecoder',
    num_stages=2,
    pretrained='open-mmlab://msra/hrnetv2_w18',
    backbone=dict(
        type='HRNet',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(18, 36)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(18, 36, 72)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(18, 36, 72, 144)))),
    decode_head=[
        dict(
            type='FCNHead',
            in_channels=[18, 36, 72, 144],
            channels=270,
            in_index=(0, 1, 2, 3),
            input_transform='resize_concat',
            kernel_size=1,
            num_convs=1,
            concat_input=False,
            dropout_ratio=-1,
            num_classes=8,
            norm_cfg=dict(type='SyncBN', requires_grad=True),
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
        dict(
            type='OCRHead',
            in_channels=[18, 36, 72, 144],
            in_index=(0, 1, 2, 3),
            input_transform='resize_concat',
            channels=512,
            ocr_channels=256,
            dropout_ratio=-1,
            num_classes=8,
            norm_cfg=dict(type='SyncBN', requires_grad=True),
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))
    ])
train_cfg = dict()
test_cfg = dict(mode='whole')
dataset_type = 'Aircraft'
data_root = '/mmdetection_aircraft/data/segm/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 1024)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(1024, 768), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(512, 384), cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 384), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 768),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=5,
    workers_per_gpu=2,
    train=dict(
        type='Aircraft',
        data_root='/mmdetection_aircraft/data/segm/',
        img_dir='JPEGImages',
        ann_dir='SegmentationClass',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(1024, 768), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 384), cat_max_ratio=0.75),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 384), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ],
        split='train.txt'),
    val=dict(
        type='Aircraft',
        data_root='/mmdetection_aircraft/data/segm/',
        img_dir='JPEGImages',
        ann_dir='SegmentationClass',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 768),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        split='val.txt'),
    test=dict(
        type='Aircraft',
        data_root='/mmdetection_aircraft/data/segm/',
        img_dir='JPEGImages',
        ann_dir='SegmentationClass',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 768),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        split='val.txt'))
log_config = dict(
    interval=1, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/ocrnet_hr18_512x1024_40k_cityscapes_20200601_033320-401c5bdd.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
total_iters = 3
checkpoint_config = dict(by_epoch=False, interval=3)
evaluation = dict(interval=3, metric='mIoU')
work_dir = './work_dirs/tutorial'
seed = 0
gpu_ids = [0]

TRAIN MODEL:

model = build_segmentor(
    cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
model.CLASSES = datasets[0].CLASSES
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
                meta=dict())

Full error description:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-fec2661e1f4c> in <module>
     16 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
     17 train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
---> 18                 meta=dict())

~/mmsegmentation/mmseg/apis/train.py in train_segmentor(model, dataset, cfg, distributed, validate, timestamp, meta)
    104     elif cfg.load_from:
    105         runner.load_checkpoint(cfg.load_from)
--> 106     runner.run(data_loaders, cfg.workflow, cfg.total_iters)

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in run(self, data_loaders, workflow, max_iters, **kwargs)
    117                     if mode == 'train' and self.iter >= max_iters:
    118                         return
--> 119                     iter_runner(iter_loaders[i], **kwargs)
    120 
    121         time.sleep(1)  # wait for some hooks like loggers to finish

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in train(self, data_loader, **kwargs)
     53         self.call_hook('before_train_iter')
     54         data_batch = next(data_loader)
---> 55         outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
     56         if not isinstance(outputs, dict):
     57             raise TypeError('model.train_step() must return a dict')

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py in train_step(self, *inputs, **kwargs)
     29 
     30         inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
---> 31         return self.module.train_step(*inputs[0], **kwargs[0])
     32 
     33     def val_step(self, *inputs, **kwargs):

~/mmsegmentation/mmseg/models/segmentors/base.py in train_step(self, data_batch, optimizer, **kwargs)
    147                 averaging the logs.
    148         """
--> 149         losses = self.forward_train(**data_batch, **kwargs)
    150         loss, log_vars = self._parse_losses(losses)
    151 

~/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in forward_train(self, img, img_metas, gt_semantic_seg)
    150         """
    151 
--> 152         x = self.extract_feat(img)
    153 
    154         losses = dict()

~/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in extract_feat(self, img)
     76     def extract_feat(self, img):
     77         """Extract features from images."""
---> 78         x = self.backbone(img)
     79         if self.with_neck:
     80             x = self.neck(x)

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/mmsegmentation/mmseg/models/backbones/hrnet.py in forward(self, x)
    512 
    513         x = self.conv1(x)
--> 514         x = self.norm1(x)
    515         x = self.relu(x)
    516         x = self.conv2(x)

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py in forward(self, input)
    456             if self.process_group:
    457                 process_group = self.process_group
--> 458             world_size = torch.distributed.get_world_size(process_group)
    459             need_sync = world_size > 1
    460 

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py in get_world_size(group)
    584         return -1
    585 
--> 586     return _get_group_size(group)
    587 
    588 

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py in _get_group_size(group)
    200     """
    201     if group is GroupMember.WORLD:
--> 202         _check_default_pg()
    203         return _default_pg.size()
    204     if group not in _pg_group_ranks:

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py in _check_default_pg()
    191     """
    192     assert _default_pg is not None, \
--> 193         "Default process group is not initialized"
    194 
    195 

AssertionError: Default process group is not initialized
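
The config above defines norm_cfg as plain BN at the top but hardcodes SyncBN inside the backbone and both decode heads; SyncBN calls into torch.distributed, which is never initialized when train_segmentor runs with distributed=False. A sketch of the usual workaround for single-GPU, non-distributed training:

# SyncBN needs an initialized process group; for distributed=False /
# single-GPU training, use plain BN in every norm_cfg instead:
norm_cfg = dict(type='BN', requires_grad=True)
# ...and make sure the backbone and each decode head reference it
# (norm_cfg=norm_cfg) rather than dict(type='SyncBN', requires_grad=True).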

cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

2020-08-14 21:26:59,786 - mmseg - INFO - Environment info:

sys.platform: linux
Python: 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GPU 0,1,2,3,4,5,6: GeForce RTX 2080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.3.0
MMCV: 1.0.5
MMSegmentation: 0.5.1+381eacb
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2

2020-08-14 21:26:59,787 - mmseg - INFO - Distributed training: True
2020-08-14 21:27:00,150 - mmseg - INFO - Config:
norm_cfg = dict(type='SyncBN', requires_grad=True)

2020-08-14 21:27:38,119 - mmseg - INFO - Loaded 2975 images
2020-08-14 21:27:38,678 - mmseg - INFO - Loaded 500 images
2020-08-14 21:27:38,678 - mmseg - INFO - Start running, host: rss@ps, work_dir: /home/rss/mmsegmentation/work_dirs/deeplabv3plus_r50-d8_769x769_40k_cityscapes
2020-08-14 21:27:38,678 - mmseg - INFO - workflow: [('train', 1)], max: 40000 iters
Traceback (most recent call last):
File "./tools/train.py", line 161, in
main()
File "./tools/train.py", line 157, in main
meta=meta)
File "/home/rss/mmsegmentation/mmseg/apis/train.py", line 106, in train_segmentor
runner.run(data_loaders, cfg.workflow, cfg.total_iters)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 119, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 55, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 36, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/rss/mmsegmentation/mmseg/models/segmentors/base.py", line 152, in train_step
losses = self(**data_batch)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/home/rss/mmsegmentation/mmseg/models/segmentors/base.py", line 122, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/rss/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 163, in forward_train
x, img_metas, gt_semantic_seg)
File "/home/rss/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 125, in _auxiliary_head_forward_train
x, img_metas, gt_semantic_seg, self.train_cfg)
File "/home/rss/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 185, in forward_train
seg_logits = self.forward(inputs)
File "/home/rss/mmsegmentation/mmseg/models/decode_heads/fcn_head.py", line 65, in forward
output = self.convs(x)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py", line 185, in forward
x = self.conv(x)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "./tools/train.py", line 161, in
main()
File "./tools/train.py", line 157, in main
meta=meta)
File "/home/rss/mmsegmentation/mmseg/apis/train.py", line 106, in train_segmentor
runner.run(data_loaders, cfg.workflow, cfg.total_iters)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 119, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 55, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 36, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/rss/mmsegmentation/mmseg/models/segmentors/base.py", line 152, in train_step
losses = self(**data_batch)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/home/rss/mmsegmentation/mmseg/models/segmentors/base.py", line 122, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/rss/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 163, in forward_train
x, img_metas, gt_semantic_seg)
File "/home/rss/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 125, in _auxiliary_head_forward_train
x, img_metas, gt_semantic_seg, self.train_cfg)
File "/home/rss/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 185, in forward_train
seg_logits = self.forward(inputs)
File "/home/rss/mmsegmentation/mmseg/models/decode_heads/fcn_head.py", line 65, in forward
output = self.convs(x)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py", line 185, in forward
x = self.conv(x)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "/home/rss/anaconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
It runs fine on a single GPU, but this error appears with multiple GPUs. How should I fix my environment? Do I need to reinstall CUDA 10.1?

ModuleNotFoundError: No module named 'mmseg.version'

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.

Describe the bug
A clear and concise description of what the bug is.

Reproduction

  1. What command or script did you run?
     A placeholder for the command.
  2. Did you make any modifications to the code or config? Did you understand what you modified?
  3. What dataset did you use?

Environment

  1. Please run python mmseg/utils/collect_env.py to collect the necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
If applicable, paste the error traceback here.

A placeholder for the traceback.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

When I run ./tools/dist_train.sh configs/fcn/fcn_r50-d8_512x512_40k_voc12aug.py 4

(The issue body contains only the unfilled bug-report template.)

FCN's difference between code and paper

The mmseg implementation of FCN builds its output from the features of a single backbone block, whereas the paper fuses features from multiple blocks, FPN-style.
This fusion is a key point of FCN, so the code differs from the paper here.
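
For what it's worth, mmseg's FCNHead can also fuse several backbone stages via input_transform='resize_concat', as the HRNet config earlier on this page does; a sketch:

# Resize all four HRNet stages to a common size, concatenate, then classify
# (num_classes=19 here, e.g. Cityscapes).
decode_head = dict(
    type='FCNHead',
    in_channels=[18, 36, 72, 144],
    in_index=(0, 1, 2, 3),
    input_transform='resize_concat',
    channels=270,
    num_classes=19)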

AssertionError: Default process group is not initialized

Describe the bug
python tools/train.py configs/danet/danet_r50-d8_512x1024_40k_cityscapes.py. I get an error when training on custom data: AssertionError: Default process group is not initialized.
The GPU is currently running two object detection networks; could that be the reason? mmdetection can train multiple networks simultaneously.

Environment info
sys.platform: linux
Python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0: Tesla V100-PCIE-32GB
GCC: gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.2.0
MMCV: 1.0.2
MMSegmentation: 0.5.0+b72a6d0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1

segmentation result image

I want to know where I can change how the segmentation result is overlaid on the original image; I just want to change the background color to black.
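
One way to control the blend (a sketch, assuming the opacity argument that recent 0.x releases add to show_result, with placeholder paths): opacity=1.0 draws the segmentation map with no blending, so a palette whose background entry is (0, 0, 0) renders the background black.

from mmseg.apis import inference_segmentor, init_segmentor

model = init_segmentor('path/to/config.py', 'path/to/checkpoint.pth')
result = inference_segmentor(model, 'demo.png')
# opacity=1.0 replaces the original pixels entirely; with the background
# class mapped to (0, 0, 0) in the palette, the background shows as black.
model.show_result('demo.png', result, opacity=1.0, out_file='overlay.png')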

model zoo

Your repo is great.

I found that in the model zoo you provide models with different LR schedules (20000, 40000, 80000, and 160000 iterations). I have some questions.

  1. Why do you provide models with different LR schedules instead of only the best one?

  2. If I want to use the models as pretrained models, which one should I choose?

  3. Why did you use steps (iterations) to design the LR schedule instead of epochs?
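
On question 3: mmseg trains with an iteration-based runner, so schedules are stated in iterations; a rough conversion to epochs (an illustration using the zoo's usual Cityscapes setting):

# One iteration consumes samples_per_gpu * num_gpus images.
dataset_size = 2975               # Cityscapes train split
samples_per_gpu, num_gpus = 2, 4  # the common 4-GPU x 2-images setting
iters = 40000
epochs = iters * samples_per_gpu * num_gpus / dataset_size
print(round(epochs, 1))           # ~107.6 epochs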

Roadmap of MMSegmentation

We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.

You can either:

  1. Suggest a new feature by leaving a comment.
  2. Vote for a feature request with 👍 or be against with 👎. (Remember that developers are busy and cannot respond to all feature requests, so vote for your most favorable one!)
  3. Tell us that you would like to help implement one of the features in the list or review the PRs. (This is the greatest thing to hear!)

V0.6 (August)

  • ResNeSt. (#47)
  • Semantic FPN. (#35)
  • PointRend. (#35)
  • FastSCNN. (#58)
  • DNL. (#37)
  • ONNX export. (#12)

Question about normalize mean and std

I found that VOC uses mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375]; the range is not 0-1. Why isn't the image divided by 255?

Thanks for answering my question!!
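
For context, Normalize in the pipeline runs on the raw 0-255 image, so these are just the standard ImageNet statistics pre-scaled by 255, as a quick check confirms:

# ImageNet mean/std in the 0-1 range, scaled up to the 0-255 pixel range.
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]
print([round(m * 255, 3) for m in imagenet_mean])  # [123.675, 116.28, 103.53]
print([round(s * 255, 3) for s in imagenet_std])   # [58.395, 57.12, 57.375]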

AssertionError: Default process group is not initialized

When I train the model with one GPU, I run into the problem below. Can anybody solve this? Thanks.

Traceback (most recent call last):
File "tools/train.py", line 161, in
main()
File "tools/train.py", line 157, in main
meta=meta)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmseg/apis/train.py", line 106, in train_segmentor
runner.run(data_loaders, cfg.workflow, cfg.total_iters)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 119, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 55, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 31, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmseg/models/segmentors/base.py", line 149, in train_step
losses = self.forward_train(**data_batch, **kwargs)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmseg/models/segmentors/encoder_decoder.py", line 152, in forward_train
x = self.extract_feat(img)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmseg/models/segmentors/encoder_decoder.py", line 78, in extract_feat
x = self.backbone(img)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/mmseg/models/backbones/resnet.py", line 635, in forward
x = self.stem(x)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 458, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 586, in get_world_size
return _get_group_size(group)
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 202, in _get_group_size
_check_default_pg()
File "/home/zkyd/anaconda3/envs/mmseg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 193, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized

OSError: [Errno 12] Cannot allocate memory

When I train Cityscapes with a single GTX 1080 Ti, I encounter this error at iters=4000 and iters=8000:
OSError: [Errno 12] Cannot allocate memory
I think I should decrease the batch size. If I need to change to 1 image per GPU, how do I do that? Also, how can I set a total number of epochs instead of total_iters = 40000?
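
A sketch of the relevant config knobs (values illustrative): samples_per_gpu sets the per-GPU batch size and workers_per_gpu the number of dataloader worker processes, which are the usual source of host-memory pressure; epochs can only be approximated through total_iters.

# 1 image per GPU and no extra dataloader workers to cut host memory use.
data = dict(samples_per_gpu=1, workers_per_gpu=0)
# mmseg 0.x has no epoch setting; choose total_iters so that
# total_iters * samples_per_gpu * num_gpus / len(dataset) ~= desired epochs.
total_iters = 40000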

How to shuffle my training data?

I want to shuffle my data when training.
In PyTorch I just add "shuffle=True" to the DataLoader to shuffle my data.
Is there any convenient way to do this in mmsegmentation?
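
As far as I can tell, training data is already shuffled: mmseg's build_dataloader defaults to shuffle=True, and train_segmentor does not override it. A sketch of an explicit call (0.x API, given an already-built dataset object):

from mmseg.datasets import build_dataloader

# shuffle=True is the default for training loaders in mmseg 0.x.
loader = build_dataloader(
    dataset, samples_per_gpu=2, workers_per_gpu=2, dist=False, shuffle=True)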

RuntimeError: The size of tensor a (125) must match the size of tensor b (128) at non-singleton dimension 2

A RuntimeError occurred when I tried to use the newest fast_scnn to run inference on my own dataset.
The error never happened when I used other models from this repository on the same images.

Here is the Traceback:
Traceback (most recent call last):
File "image_inference_box.py", line 116, in
main()
File "image_inference_box.py", line 42, in main
result = inference_segmentor(model, img)
File "/home/lzhpc/mmsegmentation-master/mmseg/apis/inference.py", line 95, in inference_segmentor
result = model(return_loss=False, rescale=True, **data)
File "/home/lzhpc/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/lzhpc/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/segmentors/base.py", line 124, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/segmentors/base.py", line 106, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/segmentors/encoder_decoder.py", line 261, in simple_test
seg_logit = self.inference(img, img_meta, rescale)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/segmentors/encoder_decoder.py", line 246, in inference
seg_logit = self.whole_inference(img, img_meta, rescale)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/segmentors/encoder_decoder.py", line 213, in whole_inference
seg_logit = self.encode_decode(img, img_meta)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/segmentors/encoder_decoder.py", line 87, in encode_decode
x = self.extract_feat(img)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/segmentors/encoder_decoder.py", line 79, in extract_feat
x = self.backbone(img)
File "/home/lzhpc/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/backbones/fast_scnn.py", line 381, in forward
lower_res_features)
File "/home/lzhpc/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/lzhpc/mmsegmentation-master/mmseg/models/backbones/fast_scnn.py", line 249, in forward
out = higher_res_feature + lower_res_feature
RuntimeError: The size of tensor a (125) must match the size of tensor b (128) at non-singleton dimension 2
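
A plausible cause: Fast-SCNN adds an 8x-downsampled branch to a 32x-downsampled one, so whole-image inference breaks when an input side is not a multiple of 32 (hence 125 vs 128 above). One workaround (a sketch, not an official fix) is sliding-window inference with a 32-divisible crop:

# Slide inference over 32-divisible crops avoids the misaligned addition.
test_cfg = dict(mode='slide', crop_size=(512, 1024), stride=(384, 768))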

ModuleNotFoundError: No module named 'mmseg' (I changed the source root, but it didn't help)

(The issue body contains only the unfilled bug-report template.)

When I run ./tools/dist_train.sh configs/fcn/fcn_r50-d8_512x512_40k_voc12aug.py 4, I get an error: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1587428091666/work/torch/lib/c10d/ProcessGroupNCCL.cpp:514, unhandled system error, NCCL version 2.4.8

(The issue body contains only the unfilled bug-report template.)

crop_size and img_scale order

In the base dataset-related configs you usually have something like the following, e.g. for Cityscapes:

crop_size = (512, 1024)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),

Does this mean that crop_size uses (height, width) order while img_scale in Resize uses (width, height) order? Cityscapes obviously has a width of roughly 2x the height, and the crops are expected to have a similar ratio.

Train

Hi, I'd like to know how I should train when I only have one GPU. Does "--resume-from ${CHECKPOINT_FILE}: Resume from a previous checkpoint file." mean I have to download a pretrained model first? And how do I run different trainings with different network architectures?

speed benchmark

Hello,
in your docs the training speed of deeplabv3plus is 0.85 s per iter. Was this test run on a single GPU or with distributed training? The batch size is 2: does that mean samples_per_gpu=2, or a single GPU with 2 images?
I would appreciate it if you could provide more details.

Loss turns to NAN suddenly with FP16+AdamW

I'm training a UNet for a one-class segmentation task on a medical dataset of 10K+ images.
During training, the AdamW optimizer was used with FP16 for quicker convergence. The loss function is the weighted sum of BCE and Dice loss with coefficients 1.0 and 0.5.
For the first half of training everything looked fine. Then the loss suddenly turned to NaN, as in the following log snippet.

2020-08-13 22:55:06,554 - mmseg - INFO - Iter [11264/35238] lr: 7.217e-04, mIoU: 0.6905, mAcc: 0.7772, aAcc: 0.9943, mDice: 0.7553
2020-08-13 22:55:30,333 - mmseg - INFO - Iter [11300/35238] lr: 7.208e-04, eta: 5:24:59, time: 1.966, data_time: 1.318, memory: 5422, decode.loss_seg: 0.1299, decode.acc_seg: 97.8018, loss: 0.1299
2020-08-13 22:55:55,553 - mmseg - INFO - Iter [11350/35238] lr: 7.196e-04, eta: 5:23:46, time: 0.505, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1319, decode.acc_seg: 97.8515, loss: 0.1319
2020-08-13 22:56:20,822 - mmseg - INFO - Iter [11400/35238] lr: 7.183e-04, eta: 5:22:33, time: 0.505, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1351, decode.acc_seg: 97.8480, loss: 0.1351
2020-08-13 22:56:52,062 - mmseg - INFO - Iter [11450/35238] lr: 7.170e-04, eta: 5:21:33, time: 0.625, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1291, decode.acc_seg: 97.8107, loss: 0.1291
2020-08-13 22:57:17,234 - mmseg - INFO - Iter [11500/35238] lr: 7.158e-04, eta: 5:20:20, time: 0.503, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1296, decode.acc_seg: 97.8579, loss: 0.1296
2020-08-13 22:57:42,480 - mmseg - INFO - Iter [11550/35238] lr: 7.145e-04, eta: 5:19:09, time: 0.505, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1306, decode.acc_seg: 97.8200, loss: 0.1306
2020-08-13 22:58:07,794 - mmseg - INFO - Iter [11600/35238] lr: 7.133e-04, eta: 5:17:57, time: 0.506, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1314, decode.acc_seg: 97.8134, loss: 0.1314
2020-08-13 22:58:38,951 - mmseg - INFO - Iter [11650/35238] lr: 7.120e-04, eta: 5:16:58, time: 0.623, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1318, decode.acc_seg: 97.7512, loss: 0.1318
2020-08-13 22:59:04,141 - mmseg - INFO - Iter [11700/35238] lr: 7.107e-04, eta: 5:15:48, time: 0.504, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1347, decode.acc_seg: 97.8318, loss: 0.1347
2020-08-13 22:59:29,405 - mmseg - INFO - Iter [11750/35238] lr: 7.095e-04, eta: 5:14:38, time: 0.505, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1291, decode.acc_seg: 97.9076, loss: 0.1291
2020-08-13 23:00:00,575 - mmseg - INFO - Iter [11800/35238] lr: 7.082e-04, eta: 5:13:39, time: 0.623, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1274, decode.acc_seg: 97.7749, loss: 0.1274
2020-08-13 23:00:25,718 - mmseg - INFO - Iter [11850/35238] lr: 7.069e-04, eta: 5:12:30, time: 0.503, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1324, decode.acc_seg: 97.8026, loss: 0.1324
2020-08-13 23:00:51,036 - mmseg - INFO - Iter [11900/35238] lr: 7.057e-04, eta: 5:11:21, time: 0.506, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1321, decode.acc_seg: 97.8912, loss: 0.1321
2020-08-13 23:01:16,244 - mmseg - INFO - Iter [11950/35238] lr: 7.044e-04, eta: 5:10:12, time: 0.504, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1269, decode.acc_seg: 97.7969, loss: 0.1269
2020-08-13 23:02:12,590 - mmseg - INFO - per class results:
Class IoU Acc Dice
target-class 67.54 85.13 75.49
Summary:
Scope mIoU mAcc aAcc mDice
global 67.54 85.13 99.33 75.49

2020-08-13 23:02:12,629 - mmseg - INFO - Iter [11968/35238] lr: 7.040e-04, mIoU: 0.6754, mAcc: 0.8513, aAcc: 0.9933, mDice: 0.7549
2020-08-13 23:02:34,288 - mmseg - INFO - Exp name: unet_r34-d32_5bridge_decx1.5_NL000_att0_449x449_200ep_pneumonia_t4v10_fp16.py
2020-08-13 23:02:34,288 - mmseg - INFO - Iter [12000/35238] lr: 7.031e-04, eta: 5:11:43, time: 2.156, data_time: 1.492, memory: 5422, decode.loss_seg: 0.1338, decode.acc_seg: 97.9035, loss: 0.1338
2020-08-13 23:02:59,462 - mmseg - INFO - Iter [12050/35238] lr: 7.019e-04, eta: 5:10:34, time: 0.504, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1297, decode.acc_seg: 97.8303, loss: 0.1297
2020-08-13 23:03:24,521 - mmseg - INFO - Iter [12100/35238] lr: 7.006e-04, eta: 5:09:25, time: 0.501, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1319, decode.acc_seg: 97.7559, loss: 0.1319
2020-08-13 23:03:55,583 - mmseg - INFO - Iter [12150/35238] lr: 6.993e-04, eta: 5:08:27, time: 0.621, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1323, decode.acc_seg: 97.8359, loss: 0.1323
2020-08-13 23:04:20,744 - mmseg - INFO - Iter [12200/35238] lr: 6.981e-04, eta: 5:07:19, time: 0.503, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1317, decode.acc_seg: 97.6793, loss: 0.1317
2020-08-13 23:04:45,974 - mmseg - INFO - Iter [12250/35238] lr: 6.968e-04, eta: 5:06:11, time: 0.505, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1338, decode.acc_seg: 97.8534, loss: 0.1338
2020-08-13 23:05:11,236 - mmseg - INFO - Iter [12300/35238] lr: 6.956e-04, eta: 5:05:04, time: 0.505, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1258, decode.acc_seg: 97.9049, loss: 0.1258
2020-08-13 23:05:42,341 - mmseg - INFO - Iter [12350/35238] lr: 6.943e-04, eta: 5:04:08, time: 0.622, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1279, decode.acc_seg: 97.8606, loss: 0.1279
2020-08-13 23:06:07,630 - mmseg - INFO - Iter [12400/35238] lr: 6.930e-04, eta: 5:03:01, time: 0.506, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1298, decode.acc_seg: 97.8347, loss: 0.1298
2020-08-13 23:06:32,892 - mmseg - INFO - Iter [12450/35238] lr: 6.918e-04, eta: 5:01:55, time: 0.505, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1271, decode.acc_seg: 97.8194, loss: 0.1271
2020-08-13 23:07:04,092 - mmseg - INFO - Iter [12500/35238] lr: 6.905e-04, eta: 5:00:59, time: 0.624, data_time: 0.012, memory: 5422, decode.loss_seg: 0.1293, decode.acc_seg: 97.8613, loss: 0.1293
2020-08-13 23:07:29,326 - mmseg - INFO - Iter [12550/35238] lr: 6.892e-04, eta: 4:59:54, time: 0.505, data_time: 0.013, memory: 5422, decode.loss_seg: 0.1281, decode.acc_seg: 97.8741, loss: 0.1281
2020-08-13 23:07:54,609 - mmseg - INFO - Iter [12600/35238] lr: 6.879e-04, eta: 4:58:48, time: 0.506, data_time: 0.013, memory: 5422, decode.loss_seg: nan, decode.acc_seg: 97.7602, loss: nan
2020-08-13 23:08:19,955 - mmseg - INFO - Iter [12650/35238] lr: 6.867e-04, eta: 4:57:43, time: 0.507, data_time: 0.013, memory: 5422, decode.loss_seg: nan, decode.acc_seg: 97.8386, loss: nan
2020-08-13 23:09:18,275 - mmseg - INFO - per class results:
Class IoU Acc Dice
target-class 0.00 0.00 5.56
Summary:
Scope mIoU mAcc aAcc mDice
global 0.00 0.00 98.37 5.56

2020-08-13 23:09:18,313 - mmseg - INFO - Iter [12672/35238] lr: 6.861e-04, mIoU: 0.0000, mAcc: 0.0000, aAcc: 0.9837, mDice: 0.0556

In the optimizer config, I even used gradient clipping, as follows.

optimizer = dict(
    type='AdamW',
    lr=0.001,
    weight_decay=0.0005,
    paramwise_cfg=dict(custom_keys=dict(head=dict(lr_mult=4.0))))
optimizer_config = dict(
    type='Fp16OptimizerHook',
    loss_scale=512.0,
    grad_clip=dict(max_norm=10, norm_type=2))
lr_config = dict(policy='poly', power=0.9, min_lr=5e-05, by_epoch=False)

Hi. In another related thread, @xvjiarui gave two suggestions, which I do not think quite apply to my case: 1. Warmup does not apply, because my model converged fine for 10K iterations. 2. The current LR was already quite low in my settings, although I can try lower. But what if that doesn't work?

Thanks.
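
One more knob that may be worth trying before lowering the LR further (a sketch, assuming an mmcv version whose Fp16OptimizerHook accepts loss_scale='dynamic'): dynamic loss scaling backs the scale off automatically when overflows appear, which a fixed 512.0 cannot do.

optimizer_config = dict(
    type='Fp16OptimizerHook',
    loss_scale='dynamic',  # adapts on overflow instead of a fixed 512.0
    grad_clip=dict(max_norm=10, norm_type=2))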

How to stop downloading .pth files

Hello, I would like to ask: when I execute the train.py file, it automatically downloads a .pth file. If I manually download the file and put it in the corresponding location, how can I stop the automatic download and have it read the .pth file I put there manually? Thanks.
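
The download is triggered by the pretrained field in the model config (the 'open-mmlab://...' style URLs); pointing it at a local file, or disabling it and loading a full checkpoint via load_from, avoids any fetching. A sketch with placeholder paths:

# Point backbone initialization at a local file instead of a URL.
model = dict(pretrained='checkpoints/backbone_weights.pth')
# Or skip the backbone init entirely and start from a full checkpoint:
# model = dict(pretrained=None)
# load_from = 'checkpoints/full_model.pth'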

where is "/nfs/xxxx/psp_r50_512x1024_40ki_cityscapes"? and what does it mean? when i run --line 14: srun: command not found

(The issue body contains only the unfilled bug-report template.)

Training on my own data, the program gets stuck when validation is over

I am training on my own data, and when validation finishes, the program gets stuck, like this:

2020-08-13 15:19:04,379 - mmseg - INFO - Iter [1950/20000] lr: 9.127e-03, eta: 11:16:39, time: 2.096, data_time: 0.001, memory: 25119, decode.loss_seg: 0.0001, decode.acc_seg: 0.8459, aux.loss_seg: 0.0001, aux.acc_seg: 0.8459, loss: 0.0002
2020-08-13 15:20:50,742 - mmseg - INFO - Saving checkpoint at 2000 iterations
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 7811/7811, 1.7 task/s, elapsed: 4545s, ETA: 0s

^CTraceback (most recent call last):
File "tools/train.py", line 157, in
main()
File "tools/train.py", line 153, in main
meta=meta)

support for fp16 training

Describe the feature
FP16 training

Motivation
FP16 facilitates faster training; Apex is recommended.
With the default FP32, the feasible batch size is small, leading to slower training and possibly suboptimal performance.

Related resources
https://github.com/NVIDIA/apex

Additional context
No
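
For what it's worth, FP16 training is available through mmcv's Fp16OptimizerHook (no Apex required), as used elsewhere on this page; a minimal sketch:

# Enable mixed-precision training via mmcv's FP16 optimizer hook.
optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.0)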

loss goes to NAN

GPU: NVIDIA 1080TI
I defined a custom dataset with 2 classes (background, class1) and trained it with OCRNet; the config is below:

_base_ = [
    '../_base_/models/ocrnet_hr18.py', 
    '../_base_/datasets/mycustom.py',
    '../_base_/default_runtime.py', 
    '../_base_/schedules/schedule_120k.py'
]
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(decode_head=[
    dict(
        type='FCNHead',
        in_channels=[18, 36, 72, 144],
        channels=sum([18, 36, 72, 144]),
        in_index=(0, 1, 2, 3),
        input_transform='resize_concat',
        kernel_size=1,
        num_convs=1,
        concat_input=False,
        dropout_ratio=-1,
        num_classes=2,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    dict(
        type='OCRHead',
        in_channels=[18, 36, 72, 144],
        in_index=(0, 1, 2, 3),
        input_transform='resize_concat',
        channels=512,
        ocr_channels=256,
        dropout_ratio=-1,
        num_classes=2,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
])
# batch size
data = dict(samples_per_gpu=2, workers_per_gpu=2)
test_cfg = dict(mode='whole')
work_dir = './work_dirs/ocrnet_hr18_1024x1024_120k'
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)

However, decode_0.loss_seg and the total loss become NaN after a few thousand iterations:

2020-07-23 10:00:23,185 - mmseg - INFO - Iter [3100/120000]	lr: 9.770e-03, eta: 1 day, 15:08:41, time: 1.217, data_time: 0.012, memory: 10420, decode_0.loss_seg: 0.0002, decode_0.acc_seg: 63.5359, decode_1.loss_seg: 0.0001, decode_1.acc_seg: 63.5359, loss: 0.0002
2020-07-23 10:01:21,304 - mmseg - INFO - Iter [3150/120000]	lr: 9.766e-03, eta: 1 day, 15:06:21, time: 1.162, data_time: 0.012, memory: 10420, decode_0.loss_seg: 0.0002, decode_0.acc_seg: 64.1833, decode_1.loss_seg: 0.0001, decode_1.acc_seg: 64.1833, loss: 0.0003
2020-07-23 10:02:21,155 - mmseg - INFO - Iter [3200/120000]	lr: 9.762e-03, eta: 1 day, 15:05:07, time: 1.197, data_time: 0.012, memory: 10420, decode_0.loss_seg: 0.0002, decode_0.acc_seg: 65.1630, decode_1.loss_seg: 0.0003, decode_1.acc_seg: 65.1630, loss: 0.0005
2020-07-23 10:03:23,215 - mmseg - INFO - Iter [3250/120000]	lr: 9.758e-03, eta: 1 day, 15:05:12, time: 1.241, data_time: 0.011, memory: 10420, decode_0.loss_seg: nan, decode_0.acc_seg: 10.9057, decode_1.loss_seg: nan, decode_1.acc_seg: 18.0268, loss: nan
2020-07-23 10:04:24,325 - mmseg - INFO - Iter [3300/120000]	lr: 9.755e-03, eta: 1 day, 15:04:41, time: 1.222, data_time: 0.011, memory: 10420, decode_0.loss_seg: nan, decode_0.acc_seg: 4.1666, decode_1.loss_seg: nan, decode_1.acc_seg: 10.4796, loss: nan
2020-07-23 10:05:25,908 - mmseg - INFO - Iter [3350/120000]	lr: 9.751e-03, eta: 1 day, 15:04:27, time: 1.232, data_time: 0.013, memory: 10420, decode_0.loss_seg: nan, decode_0.acc_seg: 3.5404, decode_1.loss_seg: nan, decode_1.acc_seg: 18.5238, loss: nan
2020-07-23 10:06:27,386 - mmseg - INFO - Iter [3400/120000]	lr: 9.747e-03, eta: 1 day, 15:04:07, time: 1.230, data_time: 0.012, memory: 10420, decode_0.loss_seg: nan, decode_0.acc_seg: 3.5969, decode_1.loss_seg: nan, decode_1.acc_seg: 13.0671, loss: nan
2020-07-23 10:07:27,093 - mmseg - INFO - Iter [3450/120000]	lr: 9.744e-03, eta: 1 day, 15:02:45, time: 1.193, data_time: 0.012, memory: 10420, decode_0.loss_seg: nan, decode_0.acc_seg: 4.8962, decode_1.loss_seg: nan, decode_1.acc_seg: 11.4185, loss: nan
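A common first check for a NaN loss with a 2-class custom dataset is whether the annotation values fall outside [0, num_classes); a minimal sketch, with a hypothetical annotation path:

import numpy as np
import mmcv

# For num_classes=2 the annotation should contain only {0, 1}, plus the
# ignore index 255; any other value can corrupt the loss and end in NaN.
seg = mmcv.imread('data/mycustom/ann_dir/train/xxx.png', flag='unchanged')
print(np.unique(seg))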

TypeError: forward_test() missing 1 required positional argument: 'imgs'

Thanks for your work. I want to observe the performance of the model during training, so I changed the workflow from [('train', 1)] to [('train', 10), ('val', 1)] in default_runtime.py. The 'val' phase of IterBasedRunner calls forward_test, which requires the positional argument 'imgs', but data_batch does not have an 'imgs' key. What should I do?
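For context, the configs elsewhere on this page run validation during training through the evaluation hook rather than a ('val', 1) workflow entry; a minimal sketch of that alternative (the interval here is arbitrary):

# Evaluate every 4000 iterations via the EvalHook, which drives the test
# pipeline correctly, instead of adding ('val', 1) to the workflow.
evaluation = dict(interval=4000, metric='mIoU')
workflow = [('train', 1)]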

CustomDataset problem

Hi~
My dataset path and structure:

├── data
│   ├── road
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── img-2_gray_0_0.png
│   │   │   │   ├── img-2_gray_0_1.png
│   │   │   │   ├── img-2_gray_0_2.png
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── img-2_gray_0_0.png
│   │   │   │   ├── img-2_gray_0_1.png
│   │   │   │   ├── img-2_gray_0_2.png
│   │   │   ├── val

I reuse the config file configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py, modified as follows:

_base_ = [
    '../_base_/models/pspnet_r50-d8.py', '../_base_/datasets/road.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py'
]

'../_base_/datasets/road.py' is created as follows:

# dataset settings
dataset_type = 'CustomDataset'
data_root = 'data/road/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(512, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/train',
        ann_dir='ann_dir/train',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/test',
        ann_dir='ann_dir/test',
        pipeline=test_pipeline))

mmseg/datasets/custom.py is set up as follows:

import os.path as osp
from functools import reduce

import mmcv
import numpy as np
from mmcv.utils import print_log
from torch.utils.data import Dataset

from mmseg.core import mean_iou
from mmseg.utils import get_root_logger
from .builder import DATASETS
from .pipelines import Compose


@DATASETS.register_module()
class CustomDataset(Dataset):
    """Custom dataset for semantic segmentation.

    An example of file structure is as followed.

    .. code-block:: none

        ├── data
        │   ├── my_dataset
        │   │   ├── img_dir
        │   │   │   ├── train
        │   │   │   │   ├── xxx{img_suffix}
        │   │   │   │   ├── yyy{img_suffix}
        │   │   │   │   ├── zzz{img_suffix}
        │   │   │   ├── val
        │   │   ├── ann_dir
        │   │   │   ├── train
        │   │   │   │   ├── xxx{seg_map_suffix}
        │   │   │   │   ├── yyy{seg_map_suffix}
        │   │   │   │   ├── zzz{seg_map_suffix}
        │   │   │   ├── val

    The img/gt_semantic_seg pair of CustomDataset should be the same
    except for the suffix. A valid img/gt_semantic_seg filename pair should
    be like ``xxx{img_suffix}`` and ``xxx{seg_map_suffix}`` (extension is
    also included in the suffix). If split is given, then ``xxx`` is
    specified in the txt file. Otherwise, all files in ``img_dir/`` and
    ``ann_dir`` will be loaded.
    Please refer to ``docs/tutorials/new_dataset.md`` for more details.


    Args:
        pipeline (list[dict]): Processing pipeline
        img_dir (str): Path to image directory
        img_suffix (str): Suffix of images. Default: '.png'
        ann_dir (str, optional): Path to annotation directory. Default: None
        seg_map_suffix (str): Suffix of segmentation maps. Default: '.png'
        split (str, optional): Split txt file. If split is specified, only
            file with suffix in the splits will be loaded. Otherwise, all
            images in img_dir/ann_dir will be loaded. Default: None
        data_root (str, optional): Data root for img_dir/ann_dir. Default:
            None.
        test_mode (bool): If test_mode=True, gt wouldn't be loaded.
        ignore_index (int): The label index to be ignored. Default: 255
        reduce_zero_label (bool): Whether to mark label zero as ignored.
            Default: False
    """

    CLASSES = ('background', 'road')

    PALETTE = [[0, 0, 0], [255, 255, 255]]

    def __init__(self,
                 pipeline,
                 img_dir,
                 img_suffix='.png',
                 ann_dir=None,
                 seg_map_suffix='.png',
                 split=None,
                 data_root=None,
                 test_mode=False,
                 ignore_index=255,
                 reduce_zero_label=False):
        self.pipeline = Compose(pipeline)
        self.img_dir = img_dir
        self.img_suffix = img_suffix
        self.ann_dir = ann_dir
        self.seg_map_suffix = seg_map_suffix
        self.split = split
        self.data_root = data_root
        self.test_mode = test_mode
        self.ignore_index = ignore_index
        self.reduce_zero_label = reduce_zero_label

        # join paths if data_root is specified
        # self.img_dir = osp.join(self.data_root, self.img_dir)
        # self.ann_dir = osp.join(self.data_root, self.ann_dir)
        if self.data_root is not None:
            if not osp.isabs(self.img_dir):
                self.img_dir = osp.join(self.data_root, self.img_dir)
                print(self.img_dir)
            if not (self.ann_dir is None or osp.isabs(self.ann_dir)):
                self.ann_dir = osp.join(self.data_root, self.ann_dir)
            if not (self.split is None or osp.isabs(self.split)):
                self.split = osp.join(self.data_root, self.split)
        # print(self.img_dir,'***************************')
        # load annotations
        self.img_infos = self.load_annotations(self.img_dir, self.img_suffix,
                                               self.ann_dir,
                                               self.seg_map_suffix, self.split)

    def __len__(self):
        """Total number of samples of data."""
        return len(self.img_infos)

    def load_annotations(self, img_dir, img_suffix, ann_dir, seg_map_suffix,
                         split):
        """Load annotation from directory.

        Args:
            img_dir (str): Path to image directory
            img_suffix (str): Suffix of images.
            ann_dir (str|None): Path to annotation directory.
            seg_map_suffix (str|None): Suffix of segmentation maps.
            split (str|None): Split txt file. If split is specified, only file
                with suffix in the splits will be loaded. Otherwise, all images
                in img_dir/ann_dir will be loaded. Default: None

        Returns:
            list[dict]: All image info of dataset.
        """

        img_infos = []
        if split is not None:
            with open(split) as f:
                for line in f:
                    img_name = line.strip()
                    img_file = osp.join(img_dir, img_name + img_suffix)
                    img_info = dict(filename=img_file)
                    if ann_dir is not None:
                        seg_map = osp.join(ann_dir, img_name + seg_map_suffix)
                        img_info['ann'] = dict(seg_map=seg_map)
                    img_infos.append(img_info)
        else:
            for img in mmcv.scandir(img_dir, img_suffix, recursive=True):
                img_file = osp.join(img_dir, img)
                img_info = dict(filename=img_file)
                if ann_dir is not None:
                    seg_map = osp.join(ann_dir,
                                       img.replace(img_suffix, seg_map_suffix))
                    img_info['ann'] = dict(seg_map=seg_map)
                img_infos.append(img_info)

        print_log(f'Loaded {len(img_infos)} images', logger=get_root_logger())
        return img_infos

    def get_ann_info(self, idx):
        """Get annotation by index.

        Args:
            idx (int): Index of data.

        Returns:
            dict: Annotation info of specified index.
        """

        return self.img_infos[idx]['ann']

    def pre_pipeline(self, results):
        """Prepare results dict for pipeline."""
        results['seg_fields'] = []

    def __getitem__(self, idx):
        """Get training/test data after pipeline.

        Args:
            idx (int): Index of data.

        Returns:
            dict: Training/test data (with annotation if `test_mode` is set
                False).
        """

        if self.test_mode:
            return self.prepare_test_img(idx)
        else:
            return self.prepare_train_img(idx)

    def prepare_train_img(self, idx):
        """Get training data and annotations after pipeline.

        Args:
            idx (int): Index of data.

        Returns:
            dict: Training data and annotation after pipeline with new keys
                introduced by pipeline.
        """

        img_info = self.img_infos[idx]
        ann_info = self.get_ann_info(idx)
        results = dict(img_info=img_info, ann_info=ann_info)
        self.pre_pipeline(results)
        return self.pipeline(results)

    def prepare_test_img(self, idx):
        """Get testing data after pipeline.

        Args:
            idx (int): Index of data.

        Returns:
            dict: Testing data after pipeline with new keys introduced by
                pipeline.
        """

        img_info = self.img_infos[idx]
        results = dict(img_info=img_info)
        self.pre_pipeline(results)
        return self.pipeline(results)

    def format_results(self, results, **kwargs):
        """Place holder to format result to dataset specific output."""
        pass

    def get_gt_seg_maps(self):
        """Get ground truth segmentation maps for evaluation."""
        gt_seg_maps = []
        for img_info in self.img_infos:
            gt_seg_map = mmcv.imread(
                img_info['ann']['seg_map'], flag='unchanged', backend='pillow')
            if self.reduce_zero_label:
                # avoid using underflow conversion
                gt_seg_map[gt_seg_map == 0] = 255
                gt_seg_map = gt_seg_map - 1
                gt_seg_map[gt_seg_map == 254] = 255

            gt_seg_maps.append(gt_seg_map)

        return gt_seg_maps

    def evaluate(self, results, metric='mIoU', logger=None, **kwargs):
        """Evaluate the dataset.

        Args:
            results (list): Testing results of the dataset.
            metric (str | list[str]): Metrics to be evaluated.
            logger (logging.Logger | None | str): Logger used for printing
                related information during evaluation. Default: None.

        Returns:
            dict[str, float]: Default metrics.
        """

        if not isinstance(metric, str):
            assert len(metric) == 1
            metric = metric[0]
        allowed_metrics = ['mIoU']
        if metric not in allowed_metrics:
            raise KeyError('metric {} is not supported'.format(metric))

        eval_results = {}
        gt_seg_maps = self.get_gt_seg_maps()
        if self.CLASSES is None:
            num_classes = len(
                reduce(np.union1d, [np.unique(_) for _ in gt_seg_maps]))
        else:
            num_classes = len(self.CLASSES)

        all_acc, acc, iou = mean_iou(
            results, gt_seg_maps, num_classes, ignore_index=self.ignore_index)
        summary_str = ''
        summary_str += 'per class results:\n'

        line_format = '{:<15} {:>10} {:>10}\n'
        summary_str += line_format.format('Class', 'IoU', 'Acc')
        if self.CLASSES is None:
            class_names = tuple(range(num_classes))
        else:
            class_names = self.CLASSES
        for i in range(num_classes):
            iou_str = '{:.2f}'.format(iou[i] * 100)
            acc_str = '{:.2f}'.format(acc[i] * 100)
            summary_str += line_format.format(class_names[i], iou_str, acc_str)
        summary_str += 'Summary:\n'
        line_format = '{:<15} {:>10} {:>10} {:>10}\n'
        summary_str += line_format.format('Scope', 'mIoU', 'mAcc', 'aAcc')

        iou_str = '{:.2f}'.format(np.nanmean(iou) * 100)
        acc_str = '{:.2f}'.format(np.nanmean(acc) * 100)
        all_acc_str = '{:.2f}'.format(all_acc * 100)
        summary_str += line_format.format('global', iou_str, acc_str,
                                          all_acc_str)
        print_log(summary_str, logger)

        eval_results['mIoU'] = np.nanmean(iou)
        eval_results['mAcc'] = np.nanmean(acc)
        eval_results['aAcc'] = all_acc

        return eval_results

Finally, I encounter the error below, which means the dataset is not loaded correctly. Can you help me solve this problem?

Traceback (most recent call last):
  File "tools/train.py", line 160, in <module>
    main()
  File "tools/train.py", line 156, in main
    meta=meta)
  File "/home/ding/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmseg/apis/train.py", line 53, in train_segmentor
    drop_last=True) for ds in dataset
  File "/home/ding/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmseg/apis/train.py", line 53, in <listcomp>
    drop_last=True) for ds in dataset
  File "/home/ding/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmseg/datasets/builder.py", line 150, in build_dataloader
    **kwargs)
  File "/home/ding/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
    sampler = RandomSampler(dataset)
  File "/home/ding/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
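One way to narrow this down is to mirror the mmcv.scandir call that load_annotations uses; a quick sanity-check sketch using the paths from the config above:

import mmcv

# If this prints 0, the dataset is empty and the dataloader raises
# "num_samples should be a positive integer value, but got num_samples=0".
imgs = list(mmcv.scandir('data/road/img_dir/train', '.png', recursive=True))
print(len(imgs))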

DeepLabV3+ RuntimeError: "it is expected output_size equals to 2, but got size 3" in mmseg/ops/wrappers.py


CUDA error: an illegal memory access was encountered

sys.platform: linux
Python: 3.7.7 (default, May  7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 1.0.4
MMSegmentation: 0.5.0+b57fb2b
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.0

The error was encountered during training with the following config:

Config:
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='PSPHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        pool_scales=(1, 2, 3, 6),
        dropout_ratio=0.1,
        num_classes=9,
        norm_cfg=dict(type='BN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=9,
        norm_cfg=dict(type='BN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)))
train_cfg = dict()
test_cfg = dict(mode='whole')
dataset_type = 'Aircraft'
data_root = '/mmdetection_aircraft/data/segm2/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(640, 480), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(640, 480),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=1,
    train=dict(
        type='Aircraft',
        data_root='/mmdetection_aircraft/data/segm2/',
        img_dir='JPEGImages',
        ann_dir='PaletteClass',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(640, 480), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ],
        split='train.txt'),
    val=dict(
        type='Aircraft',
        data_root='/mmdetection_aircraft/data/segm2/',
        img_dir='JPEGImages',
        ann_dir='PaletteClass',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(640, 480),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        split='val.txt'),
    test=dict(
        type='Aircraft',
        data_root='/mmdetection_aircraft/data/segm2/',
        img_dir='JPEGImages',
        ann_dir='PaletteClass',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(640, 480),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        split='val.txt'))
log_config = dict(
    interval=1, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
total_iters = 400
checkpoint_config = dict(by_epoch=False, interval=200)
evaluation = dict(interval=1, metric='mIoU')
work_dir = './work_dirs/pspnet'
seed = 0
gpu_ids = [1]

The script takes approximately 4-5 GB of the 11 GB of available GPU memory and returns this error:

#ERROR

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-fec2661e1f4c> in <module>
     16 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
     17 train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
---> 18                 meta=dict())

~/mmsegmentation/mmseg/apis/train.py in train_segmentor(model, dataset, cfg, distributed, validate, timestamp, meta)
    104     elif cfg.load_from:
    105         runner.load_checkpoint(cfg.load_from)
--> 106     runner.run(data_loaders, cfg.workflow, cfg.total_iters)

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in run(self, data_loaders, workflow, max_iters, **kwargs)
    117                     if mode == 'train' and self.iter >= max_iters:
    118                         break
--> 119                     iter_runner(iter_loaders[i], **kwargs)
    120 
    121         time.sleep(1)  # wait for some hooks like loggers to finish

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py in train(self, data_loader, **kwargs)
     53         self.call_hook('before_train_iter')
     54         data_batch = next(data_loader)
---> 55         outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
     56         if not isinstance(outputs, dict):
     57             raise TypeError('model.train_step() must return a dict')

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py in train_step(self, *inputs, **kwargs)
     29 
     30         inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
---> 31         return self.module.train_step(*inputs[0], **kwargs[0])
     32 
     33     def val_step(self, *inputs, **kwargs):

~/mmsegmentation/mmseg/models/segmentors/base.py in train_step(self, data_batch, optimizer, **kwargs)
    150         #data_batch['gt_semantic_seg'] = data_batch['gt_semantic_seg'][:,:,:,:,0]
    151         #print(data_batch['gt_semantic_seg'].shape)
--> 152         losses = self.forward_train(**data_batch, **kwargs)
    153         loss, log_vars = self._parse_losses(losses)
    154 

~/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in forward_train(self, img, img_metas, gt_semantic_seg)
    155 
    156         loss_decode = self._decode_head_forward_train(x, img_metas,
--> 157                                                       gt_semantic_seg)
    158         losses.update(loss_decode)
    159 

~/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py in _decode_head_forward_train(self, x, img_metas, gt_semantic_seg)
     99         loss_decode = self.decode_head.forward_train(x, img_metas,
    100                                                      gt_semantic_seg,
--> 101                                                      self.train_cfg)
    102 
    103         losses.update(add_prefix(loss_decode, 'decode'))

~/mmsegmentation/mmseg/models/decode_heads/decode_head.py in forward_train(self, inputs, img_metas, gt_semantic_seg, train_cfg)
    184         """
    185         seg_logits = self.forward(inputs)
--> 186         losses = self.losses(seg_logits, gt_semantic_seg)
    187         return losses
    188 

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py in new_func(*args, **kwargs)
    162                                 'method of nn.Module')
    163             if not (hasattr(args[0], 'fp16_enabled') and args[0].fp16_enabled):
--> 164                 return old_func(*args, **kwargs)
    165             # get the arg spec of the decorated method
    166             args_info = getfullargspec(old_func)

~/mmsegmentation/mmseg/models/decode_heads/decode_head.py in losses(self, seg_logit, seg_label)
    229             seg_label,
    230             weight=seg_weight,
--> 231             ignore_index=self.ignore_index)
    232         loss['acc_seg'] = accuracy(seg_logit, seg_label)
    233         return loss

~/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py in forward(self, cls_score, label, weight, avg_factor, reduction_override, **kwargs)
    175             class_weight=class_weight,
    176             reduction=reduction,
--> 177             avg_factor=avg_factor)
    178         return loss_cls

~/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py in cross_entropy(pred, label, weight, class_weight, reduction, avg_factor, ignore_index)
     28         weight = weight.float()
     29     loss = weight_reduce_loss(
---> 30         loss, weight=weight, reduction=reduction, avg_factor=avg_factor)
     31 
     32     return loss

~/mmsegmentation/mmseg/models/losses/utils.py in weight_reduce_loss(loss, weight, reduction, avg_factor)
     45     # if avg_factor is not specified, just reduce the loss
     46     if avg_factor is None:
---> 47         loss = reduce_loss(loss, reduction)
     48     else:
     49         # if reduction is mean, then average the loss by avg_factor

~/mmsegmentation/mmseg/models/losses/utils.py in reduce_loss(loss, reduction)
     19         return loss
     20     elif reduction_enum == 1:
---> 21         return loss.mean()
     22     elif reduction_enum == 2:
     23         return loss.sum()

RuntimeError: CUDA error: an illegal memory access was encountered

But if I halve the image size with the same number of images per GPU (2), the script takes approximately 2 GB of GPU memory and everything works fine.
I also want to add that, using another PyTorch script with my own DataLoader, I am able to fully use the GPU (11 GB) during training with the same Torch version and the same hardware.
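When chasing an asynchronous CUDA failure like this, one common trick (not mmseg-specific) is to force synchronous kernel launches so the Python traceback points at the operation that actually faulted; a minimal sketch:

import os

# Must be set before the first CUDA call (e.g. at the very top of
# tools/train.py); '1' makes kernel launches synchronous, so the illegal
# memory access is reported at the real failing line instead of a later one.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'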

show_result() error: ValueError: operands could not be broadcast together with shapes (1600,1600,3) (512,512,3)

Describe the bug
When I ran tools/test.py to test your trained PSPNet model with a single GPU on ADE20K, an error occurred in show_result():

Reproduction

  1. What command or script did you run?
python tools/test.py configs/pspnet/pspnet_r50-d8_512x512_80k_ade20k.py checkpoints/pspnet_r50-d8_512x512_80k_ade20k_20200615_014128-15a8b914.pth --show

Error traceback

Traceback (most recent call last):
  File "tools/test.py", line 142, in <module>
    main()
  File "tools/test.py", line 120, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
  File "/root/code/mmsegmentation/mmseg/apis/test.py", line 62, in single_gpu_test
    out_file=out_file)
  File "/root/code/mmsegmentation/mmseg/models/segmentors/base.py", line 253, in show_result
    img = img * 0.5 + color_seg * 0.5
ValueError: operands could not be broadcast together with shapes (1600,1600,3) (512,512,3)

About mmseg OHEMSampler Implementation

I noticed that there may be some problems with the OHEMSampler implementation in mmseg:
  1. The following line has no effect: no matter what mask and sort_prob are, the values in seg_weight will not change, because the chained advanced indexing assigns into a temporary copy (see the minimal demonstration after the code below).
         seg_weight[mask][sort_prob < threshold] = 0.
  2. The OHEMSampler implementation tries to assign zero weight to pixels which have low confidence. But these pixels always get a high loss value in cross_entropy; maybe there are some logical problems in the implementation?
  • I try to improve the implementation in the following way:

    # requires: import torch; import torch.nn.functional as F
    def sample(self, seg_logit, seg_label):
        with torch.no_grad():
            assert seg_logit.shape[2:] == seg_label.shape[2:]
            assert seg_label.shape[1] == 1
            seg_label = seg_label.squeeze(1).long()
            batch_kept = self.min_kept * seg_label.size(0)
            seg_prob = F.softmax(seg_logit, dim=1)
            mask = seg_label.contiguous().view(-1) != self.ignore_index

            tmp_seg_label = seg_label.clone()
            tmp_seg_label[tmp_seg_label == self.ignore_index] = 0
            # probability predicted for the ground-truth class of each pixel
            seg_prob = seg_prob.gather(1, tmp_seg_label.unsqueeze(1))
            sort_prob, sort_indices = seg_prob.contiguous().view(-1)[mask].sort()

            if sort_prob.numel() > 0:
                min_threshold = sort_prob[min(batch_kept,
                                              sort_prob.numel() - 1)]
            else:
                min_threshold = 0.0
            threshold = max(min_threshold, self.thresh)

            seg_weight = seg_logit.new_ones(size=seg_label.size())
            seg_weight = seg_weight.view(-1)
            # write through an explicit temporary and assign it back;
            # sort_indices maps sorted positions back to pixel positions
            tmp = seg_weight[mask]
            tmp[sort_indices[sort_prob > threshold]] = 0.
            seg_weight[mask] = tmp
            seg_weight = seg_weight.view_as(seg_label)

        return seg_weight
  • I would really appreciate it if you could give me some advice! @xvjiarui @hellock
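For reference, a minimal demonstration of why the chained assignment in point 1 above is a no-op: boolean advanced indexing returns a copy, so the second assignment writes into a temporary.

import torch

w = torch.ones(4)
mask = torch.tensor([True, True, False, False])
w[mask][torch.tensor([True, False])] = 0.  # modifies a temporary copy of w[mask]
print(w)  # tensor([1., 1., 1., 1.]) -- w is unchanged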

multi-stage training

Describe the feature
When transferring pre-trained encoders to a segmentation setting, e.g. UNet, one may want to first freeze the encoder and warm up the newly-added decoder for several epochs (stage 1), then unfreeze the encoder and train the whole network together (stage 2).

Motivation
Ex1. This multi-stage training scheme supposedly results in better performance.
Ex2. It can also facilitate an easy transition from FP32 to FP16 training, because using FP16 from the very beginning may sometimes fail due to losses too large to be represented in FP16.
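A minimal sketch of such a two-stage schedule in plain PyTorch (backbone/decode_head are hypothetical stand-ins for the encoder/decoder of a segmentor):

import torch
import torch.nn as nn

# toy encoder-decoder stand-in
model = nn.ModuleDict({
    'backbone': nn.Conv2d(3, 8, 3),     # pretrained encoder
    'decode_head': nn.Conv2d(8, 2, 1),  # newly added decoder
})

def set_encoder_trainable(m, trainable):
    for p in m['backbone'].parameters():
        p.requires_grad = trainable

# stage 1: freeze the encoder and warm up the decoder
set_encoder_trainable(model, False)
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01)
# ... train for several epochs ...

# stage 2: unfreeze the encoder and train the whole network together
set_encoder_trainable(model, True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# ... continue training ...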

Related resources

Additional context

FWIoU

The eval metric is mIoU now, and I need to add FWIoU. How can I do that? I searched configs.md and eval_hook.py, but could not solve my problem.
Thank you for your help.
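For reference, frequency-weighted IoU weights each class's IoU by its ground-truth pixel frequency; a minimal NumPy sketch, assuming you already have a per-class confusion matrix (the names here are hypothetical):

import numpy as np

def fw_iou(hist):
    # hist[i, j] = number of pixels with ground-truth class i predicted as j
    freq = hist.sum(axis=1) / hist.sum()
    with np.errstate(divide='ignore', invalid='ignore'):
        iou = np.diag(hist) / (
            hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
    valid = freq > 0  # classes absent from the ground truth are skipped
    return (freq[valid] * iou[valid]).sum()

# toy example with 2 classes
hist = np.array([[90., 10.],
                 [5., 45.]])
print(fw_iou(hist))  # ~0.82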

About test and train

Thanks for your work. In the process of using this framework, I found some discrepancies.
About test:
I downloaded the trained model (DeepLabV3+ | R-101-D8 | 769x769 | 80000) and tested it on the Cityscapes val set. I got 80.75 mIoU instead of 80.98.

About train:
I trained DeepLabV3+ with this configuration (DeepLabV3+ | R-50-D8 | 769x769 | 40000 | 4 GPUs). I got 78.49 mIoU instead of 78.97.

What could be the reason for the difference?
