csailvision / semantic-segmentation-pytorch

Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

Home Page: http://sceneparsing.csail.mit.edu/

License: BSD 3-Clause "New" or "Revised" License


semantic-segmentation-pytorch's Introduction

Semantic Segmentation on MIT ADE20K dataset in PyTorch

This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).

ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by MIT Computer Vision team. Follow the link below to find the repository for our dataset and implementations on Caffe and Torch7: https://github.com/CSAILVision/sceneparsing

If you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu. You can upload your own photo and parse it!

You can also use this colab notebook playground here to tinker with the code for segmenting an image.

All pretrained models can be found at: http://sceneparsing.csail.mit.edu/model/pytorch

[From left to right: Test Image, Ground Truth, Predicted Result]

Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing

Updates

  • HRNet model is now supported.
  • We use configuration files to store most options that were previously in the argument parser. The definitions of the options are detailed in config/defaults.py.
  • We conform to PyTorch practice in data preprocessing (RGB in [0, 1], subtract mean, divide by std).

Highlights

Synchronized Batch Normalization on PyTorch

This module computes the mean and standard deviation across all devices during training. We empirically find that a reasonably large batch size is important for segmentation. We thank Jiayuan Mao for his kind contributions; please refer to Synchronized-BatchNorm-PyTorch for details.

The implementation is easy to use as:

  • It is pure Python, with no extra C++ extension libraries.
  • It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and uses sqrt(max(var, eps)) instead of sqrt(var + eps); both details are sketched below.
  • It is efficient, only 20% to 30% slower than unsynchronized BN.
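
The two compatibility details above can be summarized in a small sketch (my own illustration, not the module's actual code) of a BatchNorm training-time forward pass:

    import torch

    def batchnorm_forward_train(x, running_mean, running_var, weight, bias,
                                momentum=0.1, eps=1e-5):
        # x has shape (N, C, H, W); statistics are per channel over (N, H, W).
        dims = (0, 2, 3)
        mean = x.mean(dim=dims)
        var = x.var(dim=dims, unbiased=False)   # biased variance normalizes the batch
        n = x.numel() / x.size(1)
        var_unbiased = var * n / (n - 1)        # unbiased variance updates the moving average
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        running_var.mul_(1 - momentum).add_(momentum * var_unbiased)
        # sqrt(max(var, eps)) rather than sqrt(var + eps), per the note above
        std = torch.sqrt(torch.clamp(var, min=eps))
        x_hat = (x - mean[None, :, None, None]) / std[None, :, None, None]
        return x_hat * weight[None, :, None, None] + bias[None, :, None, None]

In the synchronized version, mean and var would additionally be reduced across all devices before the update; here they are computed on a single tensor for clarity.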

Dynamic scales of input for training with multiple GPUs

For the task of semantic segmentation, it is good to keep the aspect ratio of images during training. So we re-implemented the DataParallel module and made it support distributing data to multiple GPUs as a Python dict, so that each GPU can process images of different sizes. At the same time, the dataloader also operates differently.

Now the batch size of a dataloader always equals the number of GPUs, and each element is sent to one GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which contradicts our goal that each worker maintain its own file list. So we use a trick: although the master process still passes the dataloader an index for the __getitem__ function, we simply ignore it and send a random batch dict. Also, the multiple workers forked by the dataloader all share the same seed, so if we used the above trick directly, the workers would yield exactly the same data. Therefore, we add one line of code that sets a distinct default seed for numpy.random before activating multiple workers in the dataloader, as sketched below.
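
A minimal sketch of that seeding idea (my own illustration, not the repo's exact code; dataset and num_gpus are placeholder names):

    import numpy as np
    import torch
    from torch.utils.data import DataLoader

    def worker_init_fn(worker_id):
        # Give each forked worker a distinct numpy seed, so that the random
        # batch dicts described above differ across workers.
        np.random.seed((torch.initial_seed() + worker_id) % 2 ** 32)

    loader = DataLoader(dataset,
                        batch_size=num_gpus,   # one dict element per GPU
                        shuffle=False,         # indices are ignored anyway
                        num_workers=4,
                        worker_init_fn=worker_init_fn)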

State-of-the-Art models

  • PSPNet is a scene parsing network that aggregates global representations with a Pyramid Pooling Module (PPM). It is the winning model of the ILSVRC'16 MIT Scene Parsing Challenge. Please refer to https://arxiv.org/abs/1612.01105 for details.
  • UPerNet is a model based on a Feature Pyramid Network (FPN) and a Pyramid Pooling Module (PPM). It doesn't need dilated convolution, an operator that is time- and memory-consuming. Without bells and whistles, it is comparable to or even better than PSPNet, while requiring much shorter training time and less GPU memory. Please refer to https://arxiv.org/abs/1807.10221 for details.
  • HRNet is a recently proposed model that retains high-resolution representations throughout the network, without the traditional bottleneck design. It achieves SOTA performance on a series of pixel-labeling tasks. Please refer to https://arxiv.org/abs/1904.04514 for details.

Supported models

We split our models into encoder and decoder, where the encoders are usually modified directly from classification networks and the decoders consist of final convolutions and upsampling. We provide some pre-configured models in the config folder.

Encoder:

  • MobileNetV2dilated
  • ResNet18/ResNet18dilated
  • ResNet50/ResNet50dilated
  • ResNet101/ResNet101dilated
  • HRNetV2 (W48)

Decoder:

  • C1 (one convolution module)
  • C1_deepsup (C1 + deep supervision trick)
  • PPM (Pyramid Pooling Module, see PSPNet paper for details.)
  • PPM_deepsup (PPM + deep supervision trick)
  • UPerNet (Pyramid Pooling + FPN head, see UperNet for details.)

Performance:

IMPORTANT: The base ResNet in our repository is customized (different from the one in torchvision). The base models will be automatically downloaded when needed.

| Architecture | MultiScale Testing | Mean IoU | Pixel Accuracy (%) | Overall Score | Inference Speed (fps) |
|---|---|---|---|---|---|
| MobileNetV2dilated + C1_deepsup | No | 34.84 | 75.75 | 54.07 | 17.2 |
| | Yes | 33.84 | 76.80 | 55.32 | 10.3 |
| MobileNetV2dilated + PPM_deepsup | No | 35.76 | 77.77 | 56.27 | 14.9 |
| | Yes | 36.28 | 78.26 | 57.27 | 6.7 |
| ResNet18dilated + C1_deepsup | No | 33.82 | 76.05 | 54.94 | 13.9 |
| | Yes | 35.34 | 77.41 | 56.38 | 5.8 |
| ResNet18dilated + PPM_deepsup | No | 38.00 | 78.64 | 58.32 | 11.7 |
| | Yes | 38.81 | 79.29 | 59.05 | 4.2 |
| ResNet50dilated + PPM_deepsup | No | 41.26 | 79.73 | 60.50 | 8.3 |
| | Yes | 42.14 | 80.13 | 61.14 | 2.6 |
| ResNet101dilated + PPM_deepsup | No | 42.19 | 80.59 | 61.39 | 6.8 |
| | Yes | 42.53 | 80.91 | 61.72 | 2.0 |
| UperNet50 | No | 40.44 | 79.80 | 60.12 | 8.4 |
| | Yes | 41.55 | 80.23 | 60.89 | 2.9 |
| UperNet101 | No | 42.00 | 80.79 | 61.40 | 7.8 |
| | Yes | 42.66 | 81.01 | 61.84 | 2.3 |
| HRNetV2 | No | 42.03 | 80.77 | 61.40 | 5.8 |
| | Yes | 43.20 | 81.47 | 62.34 | 1.9 |

Training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory); inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.

Environment

The code is developed under the following configurations.

  • Hardware: >=4 GPUs for training, >=1 GPU for testing (set [--gpus GPUS] accordingly)
  • Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0
  • Dependencies: numpy, scipy, opencv, yacs, tqdm

Quick start: Test on an image using our trained model

  1. Here is a simple demo to do inference on a single image:
chmod +x demo_test.sh
./demo_test.sh

This script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves predicted segmentation (.png) to the working directory.

  2. To test on an image or a folder of images ($PATH_IMG), you can simply do the following:
python3 -u test.py --imgs $PATH_IMG --gpu $GPU --cfg $CFG
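
For example, to reuse the demo's ResNet50dilated + PPM_deepsup configuration (the image name here is just an illustration):

python3 -u test.py --imgs ADE_val_00001519.jpg --gpu 0 --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml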

Training

  1. Download the ADE20K scene parsing dataset:
chmod +x download_ADE20K.sh
./download_ADE20K.sh
  2. Train a model by selecting the GPUs ($GPUS) and configuration file ($CFG) to use. During training, checkpoints are saved by default in the folder ckpt.
python3 train.py --gpus $GPUS --cfg $CFG 
  • To choose which GPUs to use, you can do either --gpus 0-7 or --gpus 0,2,4,6.

For example, you can start with our provided configurations:

  • Train MobileNetV2dilated + C1_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Train ResNet50dilated + PPM_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Train UPerNet101
python3 train.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml
  3. You can also override options on the command line, for example: python3 train.py TRAIN.num_epoch 10.

Evaluation

  1. Evaluate a trained model on the validation set. Add VAL.visualize True to the arguments to output visualizations as shown in the teaser.

For example:

  • Evaluate MobileNetV2dilated + C1_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Evaluate ResNet50dilated + PPM_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Evaluate UPerNet101
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml

Integration with other projects

This library can be installed via pip for easy integration with another codebase:

pip install git+https://github.com/CSAILVision/semantic-segmentation-pytorch.git@master

Now this library can easily be consumed programmatically, for example:

from mit_semseg.config import cfg
from mit_semseg.dataset import TestDataset
from mit_semseg.models import ModelBuilder, SegmentationModule
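
Building on those imports, here is a hedged sketch of programmatic use; the argument names below are assumptions based on the builder call quoted in the issues further down, so check mit_semseg/models for the authoritative signatures:

    import torch
    import torch.nn as nn
    from mit_semseg.models import ModelBuilder, SegmentationModule

    # Assumed signatures; verify against mit_semseg.models.ModelBuilder.
    net_encoder = ModelBuilder.build_encoder(arch='resnet50dilated', fc_dim=2048)
    net_decoder = ModelBuilder.build_decoder(arch='ppm_deepsup', fc_dim=2048,
                                             num_class=150, use_softmax=True)
    crit = nn.NLLLoss(ignore_index=-1)  # -1 marks 'unlabeled' pixels
    segmentation_module = SegmentationModule(net_encoder, net_decoder, crit)
    segmentation_module.eval()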

Reference

If you find the code or pre-trained models useful, please cite the following papers:

Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal of Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)

@article{zhou2018semantic,
  title={Semantic understanding of scenes through the ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  journal={International Journal of Computer Vision},
  year={2018}
}

Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)

@inproceedings{zhou2017scene,
    title={Scene Parsing through ADE20K Dataset},
    author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2017}
}

semantic-segmentation-pytorch's People

Contributors

arjo129, davidbau, devinaconley, eugenelawrence, hangzhaomit, jeremyfix, marcoforte, yagi-3, zhoubolei


semantic-segmentation-pytorch's Issues

Use of scipy imresize function

The use of the imresize function from scipy.misc in dataset.py (line 130) is highly problematic in my opinion. It changes the range of values from [0, n] to [0, 255]. In my case n was rather small, less than 10, and it led to a CUDA assertion error later on; it took me 3 hours to track the error down.

It looks like you somehow address the issue in the next lines, but that method is not stable when the number of classes changes.
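
A possible workaround (my own sketch, not from the original post): resize integer label maps with nearest-neighbor interpolation, which preserves the raw label values instead of rescaling them to [0, 255]:

    import numpy as np
    from PIL import Image

    def resize_labels(segm: np.ndarray, size: tuple) -> np.ndarray:
        # size is (width, height), as PIL expects; NEAREST keeps labels intact
        img = Image.fromarray(segm.astype(np.uint8))
        return np.array(img.resize(size, Image.NEAREST))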

When the batch size is set too small, the IoU is very low

I use UserScatteredDataParallel and patch_replication_callback in my code, but when I set the batch size small (batchsize=2 per GPU), the IoU is very low, and the IoU increases as I enlarge the batch size.
I think this result is unusual, since the results using DataParallel are better at the same batch size.

Load From Checkpoint

Is it not possible to load from a checkpoint?

When using the same --id parameter, the train.py script restarts at epoch 0, but it would be very helpful if training could resume where it left off.

Alternatively, is there a PyTorch pretrained model for the ADE20K dataset to get started with?
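
For reference, a minimal sketch of manually loading saved weights (my own illustration; the checkpoint names follow the encoder_epoch_20.pth / decoder_epoch_20.pth pattern seen in logs quoted elsewhere on this page, and net_encoder / net_decoder are the networks built by the training script):

    import torch

    # MODEL_ID is a placeholder for the checkpoint folder name under ./ckpt/
    net_encoder.load_state_dict(
        torch.load('ckpt/MODEL_ID/encoder_epoch_20.pth', map_location='cpu'))
    net_decoder.load_state_dict(
        torch.load('ckpt/MODEL_ID/decoder_epoch_20.pth', map_location='cpu'))

This only restores the weights; the optimizer state and epoch counter would still need to be handled separately to truly resume training.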

Loss with NaN by using self-defined model

Thanks for your clean and flexible framework :)
I ran into a problem where the loss becomes NaN when I changed the decoder to a self-defined model structure, even though I used the same "log_softmax + NLLLoss" operations to avoid the unstable probabilities of CrossEntropyLoss.
Did I miss something? Looking forward to your response.
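
For context, a self-contained check (my own illustration, not from the issue) that log_softmax + NLLLoss matches CrossEntropyLoss on raw logits, so a NaN loss points at the custom decoder rather than the criterion:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    logits = torch.randn(2, 150, 8, 8)           # (N, C, H, W) raw scores
    target = torch.randint(0, 150, (2, 8, 8))    # (N, H, W) class indices

    loss_a = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
    loss_b = nn.CrossEntropyLoss()(logits, target)
    assert torch.allclose(loss_a, loss_b)        # identical by definition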

Usage of segm_downsampling_rate

Hi,
Thanks for your codes.
My question is: can I change the parameter segm_downsampling_rate if I want to use the original size of seg_label?
I tried it, but the decoder network seems fixed to output seg_label downsampled 8x.
Could you share any advice on using the original size of seg_label?

Thanks very much.

Why not mean-std normalization for the input image?

Hi, thank you very much for this great toolkit! I have a question: in semantic segmentation and ImageNet-based image classification, the input image is commonly subtracted by the per-channel (RGB) global mean of the training set, as shown here.
Why isn't the image also divided by the per-channel standard deviation?
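
For reference, the standard ImageNet-style mean-std normalization in torchvision (which newer versions of this repo also adopt, per the Updates section above) looks like:

    from torchvision import transforms

    # Per-channel ImageNet statistics, applied after scaling RGB to [0, 1]
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])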

The error while testing

assert os.path.exists(args.weights_encoder), 'checkpoint does not exitst!'
AssertionError: checkpoint does not exitst!

I use the command python test.py --test_img ADE_val_00001519.jpg --model_path ./ckpt/baseline-resnet50_dilated8-ppm_bilinear_deepsup-ngpus2-batchSize8-imgMaxSize384-paddingConst8-segmDownsampleRate8-LR_encoder0.02-LR_decoder0.02-epoch20-decay0.0001-fixBN0/ --suffix _epoch20.pth.
Both the encoder and the decoder have been trained, so why does this error happen and how should I solve it?

Train on CityScapes

Hi, @cvondrick @quantombone @metalbubble @hangzhaomit,
I want to train this model on CityScapes. I have to modify the TrainDataset for the fixed size of CityScapes images, but there is no train.odgt for CityScapes, and I think it is not necessary.
I am also not sure whether, if I use this code, the inputs spread across multiple GPUs form one mini-batch. My dataset class is below:
import os
from torch.utils.data import Dataset

class cityscapes(Dataset):
    # is_image, is_label, image_path_city and load_image are helper functions
    # from my own codebase (definitions omitted here).

    def __init__(self, root, co_transform=None, subset='train'):
        self.images_root = os.path.join(root, 'leftImg8bit/', subset)
        self.labels_root = os.path.join(root, 'gtFine/', subset)

        print(self.images_root)

        # Walk the directory trees and collect matching image/label file paths.
        self.filenames = [os.path.join(dp, f)
                          for dp, dn, fn in os.walk(os.path.expanduser(self.images_root))
                          for f in fn if is_image(f)]
        self.filenames.sort()

        self.filenamesGt = [os.path.join(dp, f)
                            for dp, dn, fn in os.walk(os.path.expanduser(self.labels_root))
                            for f in fn if is_label(f)]
        self.filenamesGt.sort()

        self.co_transform = co_transform

    def __getitem__(self, index):
        filename = self.filenames[index]
        filenameGt = self.filenamesGt[index]

        with open(image_path_city(self.images_root, filename), 'rb') as f:
            image = load_image(f).convert('RGB')
        with open(image_path_city(self.labels_root, filenameGt), 'rb') as f:
            label = load_image(f).convert('P')

        if self.co_transform is not None:
            image, label = self.co_transform(image, label)

        return image, label

    def __len__(self):
        return len(self.filenames)

Can you give me some advice?
Thanks a lot!

train.py's problem

Hi,
when I run python train.py, it crashes and I don't know the reason.
I would appreciate it if you could help me.

Evaluating at 0 epochs...
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=87 error=10 : invalid device ordinal
Traceback (most recent call last):
File "train.py", line 400, in
main(args)
File "train.py", line 277, in main
evaluate(nets, loader_val, history, 0, args)
File "train.py", line 135, in evaluate
pred, err = forward_with_loss(nets, batch_data, args, is_train=False)
File "train.py", line 35, in forward_with_loss
pred = net_decoder(net_encoder(input_img))
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in call
result = self.forward(*input, **kwargs)
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 56, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 67, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim)
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 18, in scatter_map
return tuple(zip(*map(scatter_map, obj)))
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
return Scatter(target_gpus, dim=dim)(obj)
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 60, in forward
outputs = comm.scatter(input, self.target_gpus, self.chunk_sizes, self.dim, streams)
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 159, in scatter
with torch.cuda.device(device), torch.cuda.stream(stream):
File "/home/t/anaconda3/lib/python3.6/site-packages/torch/cuda/init.py", line 128, in enter
torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:87

Segmentation fault (core dumped)

When I run the test.py, I got:

Namespace(arch_decoder='ppm_bilinear_deepsup', arch_encoder='resnet50_dilated8', batch_size=1, fc_dim=2048, gpu_id=0, imgMaxSize=1000, imgSize=[300, 400, 500, 600], model_path='baseline-resnet50_dilated8-ppm_bilinear_deepsup/', num_class=150, num_val=-1, padding_constant=8, result='./', segm_downsampling_rate=8, suffix='_epoch_20.pth', test_img='ADE_val_00001519.jpg')
baseline-resnet50_dilated8-ppm_bilinear_deepsup/encoder_epoch_20.pth
Loading weights for net_encoder
Loading weights for net_decoder
samples: 1
Segmentation fault (core dumped)

I have two E5 CPUs and two NVIDIA Titan Xp GPUs.
Any idea?
Thanks!

Evaluation on validation set while training

Hi,

The previous version of the code (before synchronous BN) evaluated on the validation set while training (after each epoch). It seems that the new version does not. Is this because of the Sync BN training/evaluation difference?

Thanks

net.eval for dropout

Using net.eval() to fix the BN layer parameters also makes the dropout layers act as at test time. Shouldn't it be applied only to the BN layers and not to the whole network? A possible alternative is sketched below.
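
A minimal sketch of that alternative (my own illustration): put only the BatchNorm modules into eval mode and leave dropout in training mode:

    import torch.nn as nn

    def freeze_bn(net: nn.Module) -> None:
        # Only (Sync)BatchNorm layers are switched to eval; dropout still trains.
        for m in net.modules():
            if isinstance(m, nn.modules.batchnorm._BatchNorm):
                m.eval()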

About Dataloader

Hi @Tete-Xiao @hangzhaomit, thanks for your work.
I have a question about the dataloader.

Why do you do batch_segms = batch_segms - 1 to change the labels from 0–150 to -1–149, and why use BGR instead of RGB?

If I use another dataset, do I need to shift the segmentation labels as well and use BGR mode? Thanks a lot.
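
For context, a hedged reading of that shift (consistent with the valid = (segs >= 0) check quoted in a later issue; batch_segms comes from the dataloader): label 0 means 'unlabeled' in ADE20K, so subtracting 1 turns it into -1, which the criterion can then skip:

    import torch.nn as nn

    # ADE20K annotations: 0 = unlabeled, 1..150 = the 150 classes
    batch_segms = batch_segms - 1              # -1 = ignored pixels, 0..149 = classes
    criterion = nn.NLLLoss(ignore_index=-1)    # assuming an NLL-style criterion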

Can't use cuda.

Hi,
I used conda install -c prigoyal pytorch=0.4.0 to install PyTorch and ran ./demo_test.sh.
Everything goes well until line 105 of test.py:

segmentation_module.cuda()

The output is:

 ./demo_test.sh: line 30: 27040 Segmentation fault      (core dumped) python3 -u test.py --model_path $MODEL_PATH --test_img $TEST_IMG --arch_encoder resnet50_dilated8 --arch_decoder ppm_bilinear_deepsup --fc_dim 2048 --result $RESULT_PATH

I guess it's a problem with PyTorch, so I tested this code:

import torch
x = torch.Tensor(3,3)
x.cuda()

It gives a similar error: Segmentation fault (core dumped).

Any suggestions?

BTW, my other PyTorch 0.2.0 and 0.3.0 projects on this machine work well.

Thanks.

Look forward to the new arxiv paper

Thanks for the awesome work! I am also looking forward to the new paper, Unified Perceptual Parsing for Scene Understanding. I hope it will be available soon.

Thanks a lot!

How to decrease the performance gap between the val set and test set?

Hi, I have trained some models and achieved new state-of-the-art performance on the Cityscapes val set (based on the ResNet you released).

But when I submit the results on the test set, I find the performance gap is very large!

With a single crop, performance drops 2 points from the val set to the test set. I guess it may be related to the DSN structure, or to the learning rate, and so on.

However, my previous method without DSN only suffers a 1-point drop between the val and test sets.

Could you help me figure out how to decrease this gap?

Differences from my previous method:

backbone: ResNet -> modified ResNet (replace the 7x7 conv with three 3x3 convs)
learning rate: 7e-3 -> 2e-2
pretraining: COCO pretraining -> ImageNet pretraining
loss: no extra loss -> deeply supervised loss

There are some differences; I just want to ask for advice on how to avoid this problem.

test.py not working properly

Can you tell me what the model id is? The models folder on your site is a bit confusing. Any help will be appreciated.

A little problem about training

How can I load a pretrained model to continue training? I have only seen the argument '--start_epoch'. If I want to train starting from the 'epoch_20' model you provided for testing, what should I do?

ImportError: cannot import name '_set_worker_signal_handlers'

Hi, I found that this PyTorch version requires Python >= 3.6, so I installed PyTorch 0.4 with Python 3.6. When I run ./demo_test.sh, it gives me the error:
File "test.py", line 13, in
from dataset import TestDataset
File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/dataset.py", line 4, in
import lib.utils.data as torchdata
File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/lib/utils/data/init.py", line 3, in
from .dataloader import DataLoader
File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/lib/utils/data/dataloader.py", line 3, in
from torch._C import _set_worker_signal_handlers, _update_worker_pids,
ImportError: cannot import name '_set_worker_signal_handlers'
Is there any solution? Thanks!

input and target size don't match for loss function

It looks like every combination except the default resnet50_dilated8/ppm_bilinear_deepsup leads to a mismatch in size between the input and the target of the loss function. I'm a bit mystified, as I did not change any of the models. What I adapted was the number of labels (to 8, as one can see below).

Encoder: resnet50_dilated8. Decoder: upernet
RuntimeError: input and target batch or spatial sizes don't match: target [1 x 85 x 106], input [1 x 8 x 170 x 212] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24
Encoder: Resnet101. Decoder: ppm_bilinear_deepsup
return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce) RuntimeError: input and target batch or spatial sizes don't match: target [1 x 75 x 94], input [1 x 8 x 19 x 24] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

Encoder: Resnet101. Decoder: Upernet
RuntimeError: input and target batch or spatial sizes don't match: target [1 x 85 x 106], input [1 x 8 x 170 x 212] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

In the cases where the program runs, the last two dimensions are consistent:
torch.Size([1, 8, 75, 94]) torch.Size([1, 75, 94])

GPU 0 memory problem

Hi, thanks for your repo.
I am using your code to train a semantic segmentation task, but I find that the memory occupancy of GPU 0 is much larger than that of the others, as shown in the attached screenshot.
This problem does not appear when training image classification, so I want to ask whether you have a method to solve it?
Thank you again.

Training using parts too

Is it planned to use part segmentations to improve the granularity of the detected classes?
Thank you.

Evaluate with multi GPU and mini batch

Hi all,
I found that this code still does not implement evaluation on multiple GPUs. I tried the same approach as in train.py, but ran into an infinite recursion problem during forward propagation in data_parallel.py.
I hope somebody can help; or will evaluation on multiple GPUs be supported?

RuntimeError: While copying the parameter named layer1.0.conv1.weight

@Tete-Xiao Hi, Tete. It seems that the network in resnet.py is not consistent with the released checkpoints.

        self.conv1 = conv3x3(3, 64, stride=2)
        self.bn1 = BatchNorm2d(64)
        self.relu1 = nn.ReLU(inplace=False)
        self.conv2 = conv3x3(64, 64)
        self.bn2 = BatchNorm2d(64)
        self.relu2 = nn.ReLU(inplace=False)
        self.conv3 = conv3x3(64, 128)
        self.bn3 = BatchNorm2d(128)
        self.relu3 = nn.ReLU(inplace=False)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(block, 64, layers[0])

After conv3 + bn3 + relu3, the feature maps have 128 channels, but layer1 expects an input with 64 channels.

I got the following bug:

RuntimeError: While copying the parameter named layer1.0.conv1.weight, whose dimensions in the model are torch.Size([64, 64, 1, 1]) and whose dimensions in the checkpoint are torch.Size([64, 128, 1, 1]).

Test.py -- unexpected input size

Hi,

I'm trying to use the test.py script, but I'm getting an input size error. Reportedly, input.size[1] should be 3, but 384 (presumably imgSize) is being passed instead. Any thoughts?

bash-4.1$ python test.py --ckpt ./ckpt/ --test_img "./data/ADEChallengeData2016/images/training/ADE_train_00000001.jpg" --id baseline-resnet34_dilated8-psp_bilinear --arch_decoder psp_bilinear --visualize VISUALIZE --batch_size 1
Namespace(arch_decoder='psp_bilinear', arch_encoder='resnet34_dilated8', batch_size=1, ckpt='./ckpt/', fc_dim=512, id='baseline-resnet34_dilated8-psp_bilinear', imgSize=384, num_class=150, num_val=-1, result='.', segSize=-1, suffix='_best.pth', test_img='./data/ADEChallengeData2016/images/training/ADE_train_00000001.jpg', visualize='VISUALIZE')
Loading weights for net_encoder
Loading weights for net_decodertar
/apps/python/2.7.11/gcc-5.3.0/lib/python2.7/site-packages/scipy/ndimage/interpolation.py:600: UserWarning: From scipy 0.13.0, the output shape of zoom() is calculated with round() instead of int() - for these inputs the size of the returned array has changed.
  "the returned array has changed.", UserWarning)
Traceback (most recent call last):
  File "test.py", line 162, in <module>
    main(args)
  File "test.py", line 108, in main
    test(nets, args)
  File "test.py", line 85, in test
    pred = forward_test_multiscale(nets, img, args)
  File "test.py", line 36, in forward_test_multiscale
    pred_scale = net_decoder(net_encoder(input_img),
  File "/apps/pytorch/0.2.0/python-2.7_gcc-5.3_cuda-8.0_cudnn-6.0/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "./scripts/models.py", line 221, in forward
    x = self.relu1(self.bn1(self.conv1(x)))
  File "/apps/pytorch/0.2.0/python-2.7_gcc-5.3_cuda-8.0_cudnn-6.0/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/apps/pytorch/0.2.0/python-2.7_gcc-5.3_cuda-8.0_cudnn-6.0/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 254, in forward
    self.padding, self.dilation, self.groups)
  File "/apps/pytorch/0.2.0/python-2.7_gcc-5.3_cuda-8.0_cudnn-6.0/lib/python2.7/site-packages/torch/nn/functional.py", line 52, in conv2d
    return f(input, weight, bias)
RuntimeError: Need input.size[1] == 3 but got 384 instead.

python train.py is stuck at evaluate(nets, loader_val, history, 0, args)

When I run python train.py it writes:

Namespace(arch_decoder='c1bilinear', arch_encoder='resnet34_dilated8', batch_size_per_gpu=16, beta1=0.9, ckpt='./ckpt', ckpt_epoch=5, disp_iter=20, eval_epoch=1, fc_dim=512, fix_bn=0, flip=1, id='baseline', imgSize=384, list_train='./data/ADE20K_object150_train.txt', list_val='./data/ADE20K_object150_val.txt', lr_decoder=0.001, lr_encoder=0.0001, lr_step=20, num_epoch=50, num_gpus=2, num_val=64, optim='SGD', root_img='./data/ADEChallengeData2016/images', root_seg='./data/ADEChallengeData2016/annotations', seed=1234, segDepth=150, segSize=384, vis='./vis', weight_decay=0.0001, weights_decoder='', weights_encoder='', workers=16)
Model ID: baseline-resnet34_dilated8-c1bilinear-ngpus2-batchSize32-imgSize384-segSize384-lr_encoder0.0001-lr_decoder0.001-epoch50-step20-decay0.0001
# samples: 20210
# samples: 64
1 Epoch = 631 iters
Evaluating at 0 epochs...

and then it is stuck for hours without writing anything.
When I press Ctrl+C it says:

^CProcess Process-2:
Traceback (most recent call last):
File "train.py", line 403, in
main(args)
File "train.py", line 280, in main
evaluate(nets, loader_val, history, 0, args)
File "train.py", line 132, in evaluate
for i, batch_data in enumerate(loader):
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 195, in next
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/root/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 34, in _worker_loop
r = index_queue.get()
File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
res = self._reader.recv_bytes()
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
idx, batch = self.data_queue.get()
KeyboardInterrupt
File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 342, in get
res = self._reader.recv_bytes()
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/root/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt

which means it is stuck on

evaluate(nets, loader_val, history, 0, args)

and specifically at

for i, batch_data in enumerate(loader):

and

chunk = read(handle, remaining)

Any ideas?

code about cascade network

Hi, I can't find the Cascade-SegNet or Cascade-DilatedNet described in the 'Scene Parsing through ADE20K Dataset' paper in this code. When will they be released? Thanks.

Missing number of classes for eval and test

Firstly, thanks for creating this framework; it is a good way to get started with semantic segmentation. The issue is simple: when building the decoder in test.py (line 84) and eval.py (line 107), you do not set the number of classes from the arguments.

net_decoder = builder.build_decoder(arch=args.arch_decoder, fc_dim=args.fc_dim, weights=args.weights_decoder, use_softmax=True)

This leads to an error when loading any model whose number of classes differs from 150.

Class labels

Hello! @hangzhaomit thanks for the ade20k pytorch code!

I wonder if you could give me an idea of how to map the RGB colors in the *_seg.png annotations to the names of the 150 classes. I am planning to redefine the labels to train with different target classes, and I am a little confused about this mapping.

Thanks in advance!

Failed loading image/segmentation: slice indices must be integers or None or have an __index__ method in Python 3.5

TL;DR: For Python 3.5+ users, in the _scale_and_crop function, change the computation of the x1 and y1 coordinates to
x1 = (w_s - cropSize) // 2 ; y1 = (h_s - cropSize) // 2
since single-slash division (/) in Python 3.5 returns a float, whereas it returns an integer in Python 2.7; therefore double slashes are required.

@hangzhaomit First of all, thanks for this repository. The above error triggers the exception handler in the dataset reader, which produces dummy segmentation masks filled with -1. Now if by any chance none of the segmentation masks in a batch get read (e.g. due to the Python 3.5 issue stated above), we end up getting the following error:

    File "/lustre/home/wgondal/semantic-segmentation-pytorch/utils.py", line 99, in accuracy
    acc = 1.0 * torch.sum(valid * (preds == segs)) / torch.sum(valid)
ZeroDivisionError: float division by zero

The reason is the following line in the accuracy function in utils.py:
valid = (segs >= 0)

It rightly checks the segmentation mask values. However, it would be great if you could add a check for whether all the segmentation masks in a batch are dummy, as sketched below. Alternatively, in my opinion the exception handling in the dataset reader is misleading, as it doesn't lead to figuring out the real error.
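
A minimal sketch of such a guard (my own illustration, reusing the lines quoted above; segs and preds come from the surrounding accuracy code):

    import torch

    valid = (segs >= 0)
    n_valid = torch.sum(valid)
    if n_valid == 0:
        acc = 0.0   # the whole batch is dummy; report zero (or skip the batch)
    else:
        acc = 1.0 * torch.sum(valid * (preds == segs)) / n_valid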

RuntimeError: argument 1 (padding) must be tuple of int but got tuple of (float, float)

When I run train.py with Python 3.6, this error appears. Why?

/home/xxx/.pyenv/versions/anaconda3-4.4.0/bin/python /home/xxx/semantic-segmentation-pytorch/train.py
samples: 20210
samples: 64
1 Epoch = 1263 iters
Evaluating at 0 epochs...
Traceback (most recent call last):
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 400, in
main(args)
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 276, in main
evaluate(nets, loader_val, history, 0, args)
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 133, in evaluate
pred, err = forward_with_loss(nets, batch_data, args, is_train=False)
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 35, in forward_with_loss
pred = net_decoder(net_encoder(input_img))
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/semantic-segmentation-pytorch/models.py", line 175, in forward
x = self.features(x)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torchvision-0.1.9-py3.6.egg/torchvision/models/resnet.py", line 41, in forward
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __cal__l
result = self.forward(*input, **kwargs)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 254, in forward
self.padding, self.dilation, self.groups)
File "/home/xxx/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/torch/nn/functional.py", line 51, in conv2d
_pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled)
RuntimeError: argument 1 (padding) must be tuple of int but got tuple of (float, float)

PyTorch performance?

I'm about to run the code myself, but I'm eager to know ASAP:

what performance do you get after running this code (pixel accuracy, mIoU, etc.)?

Process the batch_data to get training going

I have to do the following to process the batch_data to get training going:


    batch_data = next(iterator)[0]               # unwrap the length-one list
    for k in batch_data.keys():
      batch_data[k] = batch_data[k].cuda()       # move each tensor to the GPU

Without these fixes, batch_data is a length-one list and the data in the dictionary stays on the CPU instead of the GPU (which causes an error at the conv1 layer).

Hope this helps!

Segmentation fault ?

Could anyone help me with this problem?

./demo_test.sh: line 27: 221 Segmentation fault (core dumped) python3 -u test.py --model_path $MODEL_PATH --test_img $TEST_IMG --arch_encoder resnet50_dilated8 --arch_decoder ppm_bilinear_deepsup --fc_dim 2048 --result $RESULT_PATH

How to train the model on my own dataset

Hi, I just want to train the model on my own dataset, which has only one class. I modified the dataset into ADE format and changed "num_class" to 1, but when I run train.py, it raises an error like:

/opt/conda/conda-bld/pytorch_1522170684359/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [959,0,0] Assertion t >= 0 && t < n_classes failed.
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1522170684359/work/aten/src/THC/generic/THCTensorMath.cu:15

I hope someone can suggest a solution. Thanks!

IndexError: index 150 is out of bounds for axis 0 with size 150

When I run train.py, this error appears. Why?

Traceback (most recent call last):
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 400, in
main(args)
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 276, in main
evaluate(nets, loader_val, history, 0, args)
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 142, in evaluate
visualize(batch_data, pred, args)
File "/home/xxx/semantic-segmentation-pytorch/train.py", line 72, in visualize
pred_color = colorEncode(pred_, colors)
File "/home/xxx/semantic-segmentation-pytorch/utils.py", line 90, in colorEncode
np.tile(colors[label],
IndexError: index 150 is out of bounds for axis 0 with size 150

Train and evaluate on VOC

Hi, all

I am trying to train the network on the VOC dataset, and I did the following:

  1. modify the dataset into ADE format
  2. change "num_class" to 21
  3. segm += 1: add 1 to the ground-truth labels for both training and evaluation (because in your code class 0 does not count in the loss and mIoU)

Am I doing this right?

Any help is appreciated.
