kazuto1011 / deeplab-pytorch

PyTorch re-implementation of DeepLab v2 on COCO-Stuff / PASCAL VOC datasets

License: MIT License

Python 94.84% Shell 5.16%
pytorch deeplab semantic-segmentation cocostuff coco voc

deeplab-pytorch's Introduction

DeepLab with PyTorch

This is an unofficial PyTorch implementation of DeepLab v2 [1] with a ResNet-101 backbone.

  • COCO-Stuff dataset [2] and PASCAL VOC dataset [3] are supported.
  • The official Caffe weights provided by the authors can be used without building the Caffe APIs.
  • DeepLab v3/v3+ models with the identical backbone are also included (not tested).
  • torch.hub is supported.

Performance

COCO-Stuff

Train set     Eval set    Code          Weight      CRF?  Pixel Accuracy  Mean Accuracy  Mean IoU  FreqW IoU
10k train †   10k val †   Official [2]  -           No    65.1            45.5           34.4      50.4
10k train †   10k val †   This repo     Download    No    65.8            45.7           34.8      51.2
10k train †   10k val †   This repo     Download    Yes   67.1            46.4           35.6      52.5
164k train    164k val    This repo     Download ‡  No    66.8            51.2           39.1      51.5
164k train    164k val    This repo     Download ‡  Yes   67.6            51.5           39.7      52.3

† Images and labels are pre-warped to square-shape 513x513
‡ Note for SPADE followers: The provided COCO-Stuff 164k weight has been kept intact since 2019/02/23.

PASCAL VOC 2012

Train set  Eval set  Code          Weight    CRF?  Pixel Accuracy  Mean Accuracy  Mean IoU  FreqW IoU
trainaug   val       Official [3]  -         No    -               -              76.35     -
trainaug   val       Official [3]  -         Yes   -               -              77.69     -
trainaug   val       This repo     Download  No    94.64           86.50          76.65     90.41
trainaug   val       This repo     Download  Yes   95.04           86.64          77.93     91.06
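
The four scores reported in the tables above are standard confusion-matrix metrics. As a minimal sketch (not the repository's implementation; the function name and the choice to skip classes that never occur are assumptions), they can be computed from a confusion matrix like this:

```python
def segmentation_scores(hist):
    """Compute segmentation scores from a confusion matrix.

    hist[i][j] is the number of pixels whose ground-truth class is i and
    whose predicted class is j.
    """
    n = len(hist)
    total = sum(sum(row) for row in hist)
    diag = [hist[i][i] for i in range(n)]  # correctly classified pixels per class
    gt = [sum(hist[i]) for i in range(n)]  # pixels per ground-truth class
    pred = [sum(hist[i][j] for i in range(n)) for j in range(n)]  # pixels per predicted class

    pixel_acc = sum(diag) / total

    # Per-class accuracy, skipping classes with no ground-truth pixels (assumption)
    accs = [diag[i] / gt[i] for i in range(n) if gt[i] > 0]
    mean_acc = sum(accs) / len(accs)

    # Per-class IoU = TP / (GT + Pred - TP), skipping classes that never occur (assumption)
    ious = {}
    for i in range(n):
        union = gt[i] + pred[i] - diag[i]
        if union > 0:
            ious[i] = diag[i] / union
    mean_iou = sum(ious.values()) / len(ious)

    # IoU weighted by how often each class occurs in the ground truth
    freqw_iou = sum(gt[i] / total * iou for i, iou in ious.items())

    return {
        "Pixel Accuracy": pixel_acc,
        "Mean Accuracy": mean_acc,
        "Mean IoU": mean_iou,
        "Frequency Weighted IoU": freqw_iou,
    }


# Toy two-class example
hist = [[3, 1],
        [1, 5]]
scores = segmentation_scores(hist)
print(scores)
```

In this toy example 8 of the 10 pixels lie on the diagonal, so the pixel accuracy is 0.8. The other three scores average or weight the per-class values.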

Setup

Requirements

Required Python packages are listed in the Anaconda configuration file configs/conda_env.yaml. Please modify the listed cudatoolkit=10.2 and python=3.6 as needed and run the following commands.

# Set up with Anaconda
conda env create -f configs/conda_env.yaml
conda activate deeplab-pytorch

Download datasets

Download pre-trained caffemodels

Caffemodels pre-trained on COCO and PASCAL VOC datasets are released by the DeepLab authors. In accordance with the papers [1,2], this repository uses the COCO-trained parameters as initial weights.

  1. Run the following script to download the pre-trained caffemodels (1GB+).
$ bash scripts/setup_caffemodels.sh
  2. Convert the caffemodels to PyTorch-compatible weights. No need to build the Caffe API!
# Generate "deeplabv1_resnet101-coco.pth" from "init.caffemodel"
$ python convert.py --dataset coco
# Generate "deeplabv2_resnet101_msc-vocaug.pth" from "train2_iter_20000.caffemodel"
$ python convert.py --dataset voc12

Training & Evaluation

To train DeepLab v2 on PASCAL VOC 2012:

python main.py train \
    --config-path configs/voc12.yaml

To evaluate the performance on a validation set:

python main.py test \
    --config-path configs/voc12.yaml \
    --model-path data/models/voc12/deeplabv2_resnet101_msc/train_aug/checkpoint_final.pth

Note: This command saves the predicted logit maps (.npy) and the scores (.json).

To re-evaluate with a CRF post-processing:

python main.py crf \
    --config-path configs/voc12.yaml

Running the three commands above in sequence is equivalent to running bash scripts/train_eval.sh.

To monitor the loss, run the following command in a separate terminal.

tensorboard --logdir data/logs

Please specify the appropriate configuration files for the other datasets.

Dataset          Config file                 #Iterations  Classes
PASCAL VOC 2012  configs/voc12.yaml          20,000       20 foreground + 1 background
COCO-Stuff 10k   configs/cocostuff10k.yaml   20,000       182 thing/stuff
COCO-Stuff 164k  configs/cocostuff164k.yaml  100,000      182 thing/stuff

Note: Although the label indices range from 0 to 181 in COCO-Stuff 10k/164k, only 171 classes are supervised.

Common settings:

  • Model: DeepLab v2 with ResNet-101 backbone. Dilation rates of ASPP are (6, 12, 18, 24). Output stride is 8.
  • GPU: All the GPUs visible to the process are used. Please specify the scope with CUDA_VISIBLE_DEVICES=.
  • Multi-scale loss: The loss is the sum of the losses computed on the outputs for the multi-scale inputs (1x, 0.75x, 0.5x) and on their element-wise max across the scales. The unlabeled class is ignored in the loss computation.
  • Gradient accumulation: The mini-batch of 10 samples is not processed at once because of the high GPU memory usage. Instead, gradients of small batches of 5 samples are accumulated for 2 iterations, and the weights are updated at the end (batch_size * iter_size = 10). GPU memory usage is approx. 11.2 GB with the default setting (tested on a single Titan X). You can reduce it with a smaller batch_size.
  • Learning rate: Stochastic gradient descent (SGD) is used with a momentum of 0.9 and an initial learning rate of 2.5e-4. Polynomial learning rate decay is employed; the learning rate is multiplied by (1-iter/iter_max)**power every 10 iterations.
  • Monitoring: Moving average loss (average_loss in Caffe) can be monitored in TensorBoard.
  • Preprocessing: Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, padded if needed, and randomly cropped to 321x321.
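
Two of the settings above, gradient accumulation and polynomial learning rate decay, can be illustrated with a small pure-Python sketch. It is not the repository's training loop: the one-parameter model, the function names, the division of each partial gradient by iter_size, and the value power=0.9 are assumptions made for illustration, and momentum is left out.

```python
def poly_lr(base_lr, iteration, iter_max, power, step=10):
    """Polynomial decay that is refreshed only every `step` iterations."""
    refreshed = (iteration // step) * step  # last iteration at which the rate changed
    return base_lr * (1.0 - refreshed / iter_max) ** power


def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of mean((w * x - y) ** 2) over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, batch_size, iter_size):
    """Accumulate the gradients of `iter_size` small batches before one update."""
    total = 0.0
    for k in range(iter_size):
        chunk = slice(k * batch_size, (k + 1) * batch_size)
        # Scale each partial gradient so the sum matches the full mini-batch mean
        total += grad_mse(w, xs[chunk], ys[chunk]) / iter_size
    return total


xs = [float(x) for x in range(10)]  # a toy "mini-batch" of 10 samples
ys = [3.0 * x for x in xs]
w = 1.0

full = grad_mse(w, xs, ys)  # all 10 samples processed at once
accum = accumulated_grad(w, xs, ys, batch_size=5, iter_size=2)  # 2 passes of 5 samples

lr = poly_lr(2.5e-4, iteration=0, iter_max=20000, power=0.9)
w_next = w - lr * accum  # a single weight update after accumulation
print(full, accum, lr, w_next)
```

With the scaling used here, the accumulated gradient equals the gradient of the full 10-sample batch, which is why the smaller batches reduce memory without changing the effective batch size.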

Processed images and labels in COCO-Stuff 164k:

[Figure: Data]

Inference Demo

You can use the pre-trained models, the converted models, or your own models.

To process a single image:

python demo.py single \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth \
    --image-path image.jpg

To run on a webcam:

python demo.py live \
    --config-path configs/voc12.yaml \
    --model-path deeplabv2_resnet101_msc-vocaug-20000.pth

To run a CRF post-processing, add --crf. To run on a CPU, add --cpu.

Misc

torch.hub

Model setup with two lines

import torch.hub
model = torch.hub.load("kazuto1011/deeplab-pytorch", "deeplabv2_resnet101", pretrained='cocostuff164k', n_classes=182)

Differences from the Caffe version

  • While the official code employs 1/16 bilinear interpolation (Interp layer) to downsample a label for the 0.5x input only, this codebase downsamples labels for both the 0.5x and 0.75x inputs with nearest-neighbor interpolation (PIL.Image.resize, related issue).
  • Bilinear interpolation on images and logits is performed with align_corners=False.
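
To make the effect of align_corners concrete, here is a minimal one-dimensional sketch of linear interpolation under both conventions. It is a pure-Python illustration with a hypothetical function name, not code from this repository or from PyTorch; the coordinate mappings follow the commonly documented definitions of the two modes.

```python
def resize_linear_1d(values, out_size, align_corners):
    """Resize a 1-D signal with linear interpolation."""
    in_size = len(values)
    out = []
    for i in range(out_size):
        if align_corners:
            # The first and last output samples coincide with the first and last inputs
            src = i * (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0
        else:
            # Samples are treated as pixel centers; negative coordinates are clamped to 0
            src = max((i + 0.5) * in_size / out_size - 0.5, 0.0)
        lo = min(int(src), in_size - 1)
        hi = min(lo + 1, in_size - 1)
        frac = src - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


signal = [0.0, 10.0]
print(resize_linear_1d(signal, 4, align_corners=False))
print(resize_linear_1d(signal, 4, align_corners=True))
```

For this two-sample signal resized to four samples, align_corners=False gives [0.0, 2.5, 7.5, 10.0], while align_corners=True spaces the outputs evenly between the two end samples. The same weights therefore produce slightly shifted logits depending on the convention.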

Training batch normalization

This codebase only supports DeepLab v2 training, which freezes batch normalization layers, although the v3/v3+ protocols require training them. If you also want to train the batch normalization parameters on multiple GPUs in your own projects, please install the extra library below.

pip install torch-encoding

Batch normalization layers in a model are automatically switched in libs/models/resnet.py.

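import torch.nn as nn

try:
    # Use synchronized batch normalization when torch-encoding is installed
    from encoding.nn import SyncBatchNorm
    _BATCH_NORM = SyncBatchNorm
except ImportError:
    # Otherwise fall back to the standard PyTorch layer
    _BATCH_NORM = nn.BatchNorm2d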

References

  1. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE TPAMI, 2018.
    Project / Code / arXiv paper

  2. H. Caesar, J. Uijlings, V. Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR, 2018.
    Project / arXiv paper

  3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010.
    Project / Paper

deeplab-pytorch's People

Contributors

johnnylu305, kazuto1011

deeplab-pytorch's Issues

An issue related to the testing crop size

I find that you use a small center crop for testing, so the mIoU is not computed over the whole image. Is this a mistake, or is it standard practice for the COCO-Stuff dataset?

Pretrained weights for coco stuff 164K

Are there pretrained weights available for the coco stuff 164K dataset?

If I understand the README correctly, the following line downloads the pretrained weights for the 80-class COCO:

bash scripts/setup_caffemodels.sh

Is there something similar for the 182-class dataset, or do I need to train it myself?

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)

Running python eval.py --config config/cocostuff.yaml --model-path ./checkpoint_final.pth fails with:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)

After searching, the cause appeared to be insufficient GPU memory. I reduced BATCH_SIZE to 1 in the cocostuff.yaml configuration file, but the same error was still reported after 13 iterations.

The GPU is a Tesla P100-PCIE-16GB.

data preprocessing difference between this repo and the original paper

Hi! I found that you resize the images directly during data preprocessing, which can change the aspect ratio and differs from the process in the original paper. When I loaded the checkpoint 'deeplabv2_resnet101_VOC2012_trainaug.pth', converted from the original Caffe model, and ran the evaluation, the mIoU was 74.767, which differs from the paper's result (76.3). I think the preprocessing difference may cause this loss of mIoU. I used another script (borrowed from this repo) for the evaluation and got an mIoU of 76.24, which is very close to the paper's result.

"main.py test" running slowly

I run "main.py test" on cloud GPU Tesla P100-PCIE-16GB and got 39.00s/it, with the config BATCH_SIZE 16 and NUM_WORKERS 8 and no CRF. That's 2.5s/image.
I tried some other BATCH_SIZE and NUM_WORKERS, and still about 2.5s/image. Is this situation normal?

(deeplab-pytorch) [dev@ecs-6f75 deeplab-pytorch]$ python main.py test --config config/cocostuff164k.yaml --model-path /data/model/deeplab-pytorch-master/cocostuff164k_iter100k.pth
Mode: test
cuda = True
Device: Tesla P100-PCIE-16GB

dataset = Dataset CocoStuff164k
Number of datapoints: 5000
Split: val2017
Root Location: /data/data/COCO/2017

len(loader) = 313
20%|████████████████▎ | 64/313 [41:36<2:41:50, 39.00s/it]

GPU usage:
image

The loss can't be reduced when I use VOC dataset

I tried to use this code for segmentation on the VOC dataset. It has about 2900 images, and I use the DeepLab v2 model.
The loss hardly decreases after 2k steps (I use a batch size of 3) and stays around 0.7-1.
I don't know why.

Make up the gap between reimplementation and official one

I find that in your code, both training and testing use multi-scale input. But as I understand it, MSC is not necessary for training. Today I ran an experiment that trains without MSC and tests with MSC (train_no_MSC.test_with_MSC), and it easily surpassed the official values. The results of my experiment are as below:
DeepLabv2(no CRF):

pAcc | mAcc | mIoU | fIoU
65.76 | 45.04 | 34.52 | 50.95

No matter what, thanks for your code.
Your repo is the best reimplementation of deeplab-pytorch, at least in my mind.

Invalid split name: train_aug

When I try to train on the VOC2012 dataset, it shows the error: ValueError: Invalid split name: train_aug

It happens in voc.py, line 43:
raise ValueError("Invalid split name: {}".format(self.split))

I tried setting the root to "/data/datasets/voc12/VOCdevkit", but that did not fix the problem.

What should I do?

Output size of the deeplabv3+ model

When I run the deeplab v3+ model in "libs/models/deeplabv3plus.py", the size of output I get is 1 x 21 x 260 x 260, not the input image size (but half). Is that correct? In segmentation, the output usually has the same size as input.

cuda error caused by negative tensor value

Hi, thanks for your nice code again! But I got a weird error when running your code. The error info is below:

THCudaCheck FAIL file=/opt/conda/conTHC/generic/THCTensorCopy.c line=20 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    main()
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
    return self.main(*args, **kwargs)
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
    rv = self.invoke(ctx)
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
    return ctx.invoke(self.callback, **ctx.params)
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/click/core.py", line
    return callback(*args, **kwargs)
  File "train.py", line 183, in main
    target_ = target_.to(device)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_TensorCopy.c:20
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoad
Traceback (most recent call last):
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/torch/utils/data/data
    self._shutdown_workers()
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/torch/utils/data/data
    self.worker_result_queue.get()
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/queues.py", line 33
    return ForkingPickler.loads(res)
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/site-packages/torch/multiprocessinge_fd
    fd = df.detach()
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/resource_sharer.py"
    with _resource_sharer.get_connection(self._id) as conn:
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/resource_sharer.py"
    c = Client(address, authkey=process.current_process().authkey)
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/connection.py", lin
    c = SocketClient(address)
  File "/data1/jayzjwang/opt/anaconda3/envs/deeplab/lib/python3.5/multiprocessing/connection.py", lin
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

According to the issue torch/cutorch#708, it may be caused by negative tensor values when ignore_label is set to -1 while preprocessing the label map. After I set the ignore label to 255 (I made minor changes to your code to run it on voc12), it works fine.

Show performance on main page

I am the author of COCO-Stuff and I must say this is pretty cool stuff!
Do you mind showing the validation set performance (mean IOU) on this GitHub page?
It would be interesting to compare to the performance I achieved in the arXiv paper.
Furthermore, in a few days we will publish the new COCO-Stuff with 164K images.
Would be cool if we could then update this code accordingly.

ValueError: Expected input batch_size (182) to match target batch_size (2).

When I run the deeplabv2 model on cocostuff164k with batch_size=2, I get the error:

ValueError: Expected input batch_size (182) to match target batch_size (2).

I checked that the model's output shape is [2,182,41,41] and the labels' shape is [2,41,41]. I think the problem is in the loss function, but I do not know how to fix it. Could you help me? Thanks

Train with COCO-Stuff 164K

Hi. With your permission I'd love to recommend this repo on the COCO-Stuff page. Do you think you could train it on the train set of COCO-Stuff 164K and provide that model? That should also significantly boost the performance. Let me know if you have further questions.

Deeplab V3+

Hi,
Can I simply change the model used in train.py to DeepLab v3+?
I was wondering why you didn't do so. Shouldn't it be quite straightforward?

Thanks!

A naive question about CRF

Thanks for your excellent code!
I am a novice at CRF. I notice that CRF is used during evaluation but not during training. Could someone tell me why? Thanks for your answer!

Crashing on test

Hi Kazuto,

For some reason when I launch the test on coco stuff 10k, the script crashes at about 38%. This is the error message I got:

python main.py test --config config/cocostuff10k.yaml --model-path data/models/deeplab_resnet101/cocostuff10k/checkpoint_final.pth
Mode: test
Device: TITAN X (Pascal)
/hardmnt/kraken0/home/poiesi/data/research/deeplearning/deeplab-pytorch/libs/utils/metric.py:21: RuntimeWarning: invalid value encountered in true_divide
  acc_cls = np.diag(hist) / hist.sum(axis=1)
/hardmnt/kraken0/home/poiesi/data/research/deeplearning/deeplab-pytorch/libs/utils/metric.py:23: RuntimeWarning: invalid value encountered in true_divide
  iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))

Do you know what it might be due to?

The ImageNet training

I am confused about the ImageNet training.
Is the architecture of your ImageNet model ResNet-101?
Thank you very much

Misaligned segmentation masks

Downsampling by PyTorch nearest interpolation in main.py results in misaligned masks. That's going to be replaced with Pillow nearest interpolation. Codes and pre-trained models will be updated sometime soon.

Issues about resize ground-truth

During evaluation, shouldn't you resize the prediction to the size of the ground truth and then compute these stats? I don't think changing the ground truth is the correct thing to do.

Missing keys in state_dict

I am trying to run the cocostuff pretrained model on a voc12 image using demo.py. After converting the COCO caffemodel into a PyTorch model, an error arises when running demo.py at line 57, which is model.load_state_dict(state_dict).

RuntimeError: Error in loading state_dict for MSC:
Missing key(s) in state_dict : "scale.aspp.stages.c0.bias" , "scale.aspp.stages.c0.weight" ,"scale.aspp.stages.c1.bias" , "scale.aspp.stages.c1.weight" , "scale.aspp.stages.c2.bias" , "scale.aspp.stages.c2.weight" , "scale.aspp.stages.c3.bias" , "scale.aspp.stages.c3.weight".

When I looked at the information displayed while converting the coco_init caffemodel into a .pth file, I did not see any related entries either. It seems there are no parameters for the scale.aspp-related layers. I don't know why or how to solve this issue. Thanks!

Optimizer initialization issue in DeepLabv3+

Sorry to bother you!
Recently, I have been trying to use DeepLabv3+ to train a new model.
I'm very thankful that you provide the model code.
However, the following error occurs:

TypeError: optimizer can only optimize Tensors, but one of the params is NoneType

I think the issue is that the bias term of the first convolution layer is disabled (bias=False).
This is the default setting in the standard ResNet.
However, the initialization code yields that bias term to the SGD constructor.
Hence SGD raises an exception, since the param is NoneType.
Here is the relevant part of the SGD source:

for param in param_group['params']:
    if not isinstance(param, Variable):
        raise TypeError("optimizer can only optimize Variables, "
                        "but one of the params is " + torch.typename(param))
    if not param.requires_grad:
        raise ValueError("optimizing a parameter that doesn't require gradients")
    if not param.is_leaf:
        raise ValueError("can't optimize a non-leaf Variable")

Here is my suggestion.
We could add a check in train.py for whether the bias term is None,
like the following:

def get_lr_params(model, key):
    # For Dilated FCN
    if key == "1x":
        for m in model.named_modules():
            if "layer" in m[0]:
                if isinstance(m[1], nn.Conv2d):
                    for p in m[1].parameters():
                        yield p
    # For conv weight in the ASPP module
    if key == "10x":
        for m in model.named_modules():
            if "aspp" in m[0]:
                if isinstance(m[1], nn.Conv2d):
                    yield m[1].weight
    # For conv bias in the ASPP module
    if key == "20x":
        for m in model.named_modules():
            if "aspp" in m[0]:
                if isinstance(m[1], nn.Conv2d):
                    if m[1].bias is not None:    # Add this line
                        yield m[1].bias

After this small revision, the code can run normally.

CUDA error: out of memory when training

Hi, I'm trying to use your code on the VOC2012 dataset. When I train the net, it says CUDA error: out of memory. I have 2 GPUs; their information is shown below:

image

and error message is like this:
image

May I ask how much memory you used on each GPU when training on the COCO-Stuff dataset? And how can I solve this issue? Should I change the GPU server, or is it something else?

hubconf.py KeyError/NotImplementedError/TypeError

Hi, hubconf.py is wrong -- either you get a KeyError if you don't include pretrained in kwargs or you get a NotImplementedError if pretrained is in kwargs and evaluates to True or you get a TypeError (TypeError: __init__() got an unexpected keyword argument 'pretrained') from line 26 if you include pretrained in kwargs that evaluates to False

That means, that right now, hubconf.py is unusable, an easy fix would be to either accept unknown kwargs by DeepLabV2 constructor or check for presence of the key before checking for True value of kwargs['pretrained'] on line 17 in hubconf.py
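
One way to read the second suggestion is to make pretrained an explicit, optional argument instead of looking it up in kwargs. The sketch below is hypothetical: DummyModel and build_model are stand-ins invented for illustration, not the repository's DeepLabV2 class or its hubconf.py entry point.

```python
class DummyModel:
    """Hypothetical stand-in for a model class that accepts only n_classes."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.loaded_from = None


def build_model(pretrained=None, **kwargs):
    """Hypothetical entry point with `pretrained` as an optional keyword."""
    # `pretrained` is consumed here, so it never reaches the model constructor
    # (avoiding the TypeError), and omitting it cannot raise a KeyError.
    model = DummyModel(**kwargs)
    if pretrained:
        # Real code would download and load the named weights at this point
        model.loaded_from = pretrained
    return model


a = build_model(n_classes=21)  # key omitted
b = build_model(pretrained=False, n_classes=21)  # falsy value
c = build_model(pretrained="cocostuff164k", n_classes=182)  # named weights
print(a.loaded_from, b.loaded_from, c.loaded_from)
```

All three call patterns described in the report go through the same code path here, and only a truthy value triggers the weight-loading branch.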

What is the meaning of ITER_SIZE, and why does training always get stuck?

Hi, when I train on the VOC dataset using your code, it always gets stuck at some point (when I use 4xM40 to train my model, it always gets stuck at iter 282/20000). There is no error and no crash; it just hangs. Do you have any idea about this? (The mIoU is about 74 after training for about 7 hours, which is reasonable.)

I also saw that you set ITER_SIZE in voc12.yaml. What is the purpose of this parameter, and does it have any connection with my problem?

Thanks in advance!

different layer names with official pytorch ResNet

Although the _ConvBatchNormReLU abstraction is very handy for building the model, it requires extra conversion when loading a pretrained ResNet model as the names of parameters differ. Could you use the official pytorch ResNet code? Thanks.

align_corners for interpolation in eval.py

logits = F.interpolate(logits, size=images.shape[2:], mode="bilinear")

I checked the Caffe and PyTorch implementations in detail and found that
you should explicitly set align_corners=True for interpolation in PyTorch, since align_corners is the default behavior of Caffe's Interp layer.
Before PyTorch 0.4.0, align_corners=True was the default, but this is no longer the case since the 0.4.0 release.

About the Train

Thanks for your work! Could you please provide the training scripts of the DeepLab v3/v3+ model? Thank you very much!

Could you please check that the way you read labels gives wrong labels?

As coded in cocostuff.py, you read the semantic PNG map as grayscale.

label = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE)

It's convenient, but this way gives labels from 0 to 181, while the original labels contain 182 entries (including unlabeled) (ref: https://github.com/nightrome/cocostuff/blob/master/labels.txt).
I also double-checked the images corresponding to the labels obtained this way, as follows:
Image for label 1, yours is person but original is bicycle:
000000224051

Image for label 23, yours is bear but original is zebra:
000000031269

I understand that your way still works, but should we add +1 to the labels?

High MIoU of VOC2012 on train2_iter_20000.caffemodel

When I tried to get the test scores of PASCAL VOC2012 on DeepLab v2 with ResNet-101, the scores from train2_iter_20000.caffemodel were somehow weird:

"Frequency Weighted IoU": 0.9649676213079842,
"Mean Accuracy": 0.9353094054537866,
"Mean IoU": 0.9088821693273592,
"Pixel Accuracy": 0.9819734262014723

The scores from train1_iter_20000.caffemodel are reasonable, e.g. 0.7642 mIoU before CRF. But why did I get such high scores with train2_iter_20000.caffemodel?

About the MIOU in the exp results repo

Hello kazuto, thank you for your nice work!
According to the description in README.md:

The label indices range from 0 to 181 and the model outputs a 182-dim categorical distribution, but only 171 classes are supervised with COCO-Stuff.

Does that mean that 11 of the class labels do not actually appear in the COCO-Stuff dataset? And is your repo's mIoU of 37.6 computed over the 171 classes?

Download link to trained model?

Hi @kazuto1011 ,
Thank you for making your code public.
I was hoping that you would put a download link of your trained model in the readme so that people can use it without training themselves.
Thanks!

RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383

Checkpoint dst: data/models/voc12/deeplabv2_resnet101_msc/train_aug
0%| | 0/20000 [00:00<?, ?it/s]THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument

Traceback (most recent call last):
File "main.py", line 503, in
main()
File "/usr/local/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "main.py", line 229, in train
logits = model(images.to(device))
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/media/wj/bangong/wang/deeplab-pytorch-master/libs/models/msc.py", line 28, in forward
logits = self.base(x)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 338, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383

Support for custom input sizes in DeepLabV3+?

Hi,
Thanks for the PyTorch implementation of DeepLabV3+. It may be the first PyTorch implementation of DeepLabV3+ on GitHub, and I have been waiting for this for a while.
It works fine with the default input size (513,513),
but when I tried DeepLabV3+ with a different input size, like (256,513),
it gave the following error:

>>> model = DeepLabV3Plus(n_classes=21, n_blocks=[3, 4, 23, 3], pyramids=[6, 12, 18])
>>> image = torch.autograd.Variable(torch.randn(1, 3, 256, 513), volatile=True)
>>> print model(image)[0].size()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/skin_demo/Tooth/deeplabV3/deeplab-pytorch-master/libs/models/msc.py", line 37, in forward
    self.interp100(output050),
  File "/usr/local/lib/python2.7/dist-packages/torch/functional.py", line 64, in stack
    return torch.cat(inputs, dim)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 260 and 132 in dimension 4 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897

My PyTorch version is '0.3.1'.
Any suggestions for supporting custom input sizes?

About the COCO-Stuff result format

  • First, thanks for your work.

  • I want to use the COCO-Stuff segmentation results format. How can I do this? Thanks!

  • And why did you change labels_2.txt and delete the 0 (unlabeled) entry?

ResBlock's stride

I wonder why you set stride=2 when implementing 'layer3' as:
self.add_module("layer3", _ResBlock(n_blocks[1], 256, 128, 512, 2, 1))
Is there any reason to do that?

VOC12

Hi Kazuto,

Are you planning to update your repo to add full support for the VOC12 dataset? Some parts can be configured for it and some others cannot, e.g. get_dataset().

Thanks

Get "StopIteration" while training

Hi, thanks for the code

I ran into a problem at the "python train.py --config config/cocostuff164k.yaml" step.
A "StopIteration" error pops up soon after the training process begins.

Error message I get:

(deeplab-pytorch) b03901017@dcs02:~/DeepLab/v2/deeplab-pytorch$ python2 train.py --config config/cocostuff164k.yaml
Running on GeForce GTX 1080
/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
warnings.warn(warning.format(ret))
0%| | 0/100000 [00:00<?, ?it/s]Exception KeyError: KeyError(<weakref at 0x7fb3542838e8; to 'tqdm' at 0x7fb354270f50>,) in <bound method tqdm.del of 0%| | 0/100000 [00:00<?, ?it/s]> ignored
Traceback (most recent call last):
File "train.py", line 236, in
main()
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "train.py", line 174, in main
data, target = next(loader_iter)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 313, in next
indices = next(self.sample_iter) # may raise StopIteration
StopIteration

Batch Normalization freeze

Hi @kazuto1011, thanks for this wonderful work. While reading your code, I noticed that you freeze the batch normalization layers before training.

model.train()
model.module.scale.freeze_bn()

I am wondering what the purpose of this freeze is, and whether it still makes sense to call freeze_bn() if I train a network from scratch.

Thanks in advance for your kind reply!

simple questions about your code

Hi! Thanks for sharing your code.
1. I want to know whether the keys (names) in your ResNet models match the pretrained models in torchvision, so that I can load the pretrained models.
2. I tested your model and found that the input and output scales don't match. I haven't read the code in detail, but based on my knowledge of semantic segmentation, shouldn't the input and output scales match? I didn't run on COCO; I ran some code on Cityscapes and VOC.
