

DeepLabv3Plus-Pytorch

Pretrained DeepLabv3, DeepLabv3+ for Pascal VOC & Cityscapes.

Quick Start

1. Available Architectures

| DeepLabV3 | DeepLabV3+ |
| --- | --- |
| deeplabv3_resnet50 | deeplabv3plus_resnet50 |
| deeplabv3_resnet101 | deeplabv3plus_resnet101 |
| deeplabv3_mobilenet | deeplabv3plus_mobilenet |
| deeplabv3_hrnetv2_48 | deeplabv3plus_hrnetv2_48 |
| deeplabv3_hrnetv2_32 | deeplabv3plus_hrnetv2_32 |
| deeplabv3_xception | deeplabv3plus_xception |

Please refer to network/modeling.py for all model entries.

Download pretrained models: Dropbox, Tencent Weiyun

Note: The HRNet backbone was contributed by @timothylimyl. A pre-trained backbone is available at Google Drive.

2. Load the pretrained model:

model = network.modeling.__dict__[MODEL_NAME](num_classes=NUM_CLASSES, output_stride=OUTPUT_STRIDE)
model.load_state_dict(torch.load(PATH_TO_PTH)['model_state'])

3. Visualize segmentation outputs:

outputs = model(images)
preds = outputs.max(1)[1].detach().cpu().numpy()
colorized_preds = val_dst.decode_target(preds).astype('uint8')  # to RGB images: (N, H, W, 3) numpy array, values 0~255
# Do whatever you like here with the colorized segmentation maps
colorized_preds = Image.fromarray(colorized_preds[0])  # to a PIL Image (requires `from PIL import Image`)

4. Atrous Separable Convolution

Note: All pre-trained models in this repo were trained without atrous separable convolution.

Atrous separable convolution is supported in this repo. We provide a simple tool, network.convert_to_separable_conv, to convert nn.Conv2d into AtrousSeparableConvolution. Run main.py with '--separable_conv' if you need it, as sketched below. See main.py and network/_deeplab.py for more details.
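
For example, a minimal sketch of applying the conversion after building a model (the entry name here is just an example from the table above; see main.py for the actual call site):

import network

# build a model, then convert the classifier's Conv2d layers to
# atrous separable convolutions, mirroring what '--separable_conv' does
model = network.modeling.deeplabv3plus_resnet50(num_classes=21, output_stride=16)
network.convert_to_separable_conv(model.classifier)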

5. Prediction

Single image:

python predict.py --input datasets/data/cityscapes/leftImg8bit/train/bremen/bremen_000000_000019_leftImg8bit.png  --dataset cityscapes --model deeplabv3plus_mobilenet --ckpt checkpoints/best_deeplabv3plus_mobilenet_cityscapes_os16.pth --save_val_results_to test_results

Image folder:

python predict.py --input datasets/data/cityscapes/leftImg8bit/train/bremen  --dataset cityscapes --model deeplabv3plus_mobilenet --ckpt checkpoints/best_deeplabv3plus_mobilenet_cityscapes_os16.pth --save_val_results_to test_results

6. New backbones

Please refer to this commit (Xception) for more details about how to add new backbones.

7. New datasets

You can train DeepLab models on your own dataset. Your torch.utils.data.Dataset should provide a decode_target method that transforms your predictions into colorized images, just like the VOC dataset:

class MyDataset(data.Dataset):
    ...
    @classmethod
    def decode_target(cls, mask):
        """decode semantic mask to RGB image"""
        return cls.cmap[mask]
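
A fuller sketch of such a dataset is shown below. Everything here is hypothetical except the decode_target pattern; cmap is simply a per-class RGB lookup table:

import numpy as np
import torch.utils.data as data

class MyDataset(data.Dataset):
    # hypothetical 3-class color map: one RGB triplet (0~255) per class id
    cmap = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0]], dtype='uint8')

    def __init__(self, images, masks, transform=None):
        self.images, self.masks, self.transform = images, masks, transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image, mask = self.images[idx], self.masks[idx]
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask

    @classmethod
    def decode_target(cls, mask):
        """decode a semantic mask of class ids (N, H, W) to RGB images (N, H, W, 3)"""
        return cls.cmap[mask]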

Results

1. Performance on Pascal VOC2012 Aug (21 classes, 513 x 513)

Training: 513x513 random crop
Validation: 513x513 center crop

| Model | Batch Size | FLOPs | train/val OS | mIoU | Dropbox | Tencent Weiyun |
| --- | --- | --- | --- | --- | --- | --- |
| DeepLabV3-MobileNet | 16 | 6.0G | 16/16 | 0.701 | Download | Download |
| DeepLabV3-ResNet50 | 16 | 51.4G | 16/16 | 0.769 | Download | Download |
| DeepLabV3-ResNet101 | 16 | 72.1G | 16/16 | 0.773 | Download | Download |
| DeepLabV3Plus-MobileNet | 16 | 17.0G | 16/16 | 0.711 | Download | Download |
| DeepLabV3Plus-ResNet50 | 16 | 62.7G | 16/16 | 0.772 | Download | Download |
| DeepLabV3Plus-ResNet101 | 16 | 83.4G | 16/16 | 0.783 | Download | Download |

2. Performance on Cityscapes (19 classes, 1024 x 2048)

Training: 768x768 random crop
Validation: 1024x2048

| Model | Batch Size | FLOPs | train/val OS | mIoU | Dropbox | Tencent Weiyun |
| --- | --- | --- | --- | --- | --- | --- |
| DeepLabV3Plus-MobileNet | 16 | 135G | 16/16 | 0.721 | Download | Download |
| DeepLabV3Plus-ResNet101 | 16 | N/A | 16/16 | 0.762 | Download | N/A |

Segmentation Results on Pascal VOC2012 (DeepLabv3Plus-MobileNet)

Segmentation Results on Cityscapes (DeepLabv3Plus-MobileNet)

Visualization of training


Pascal VOC

1. Requirements

pip install -r requirements.txt

2. Prepare Datasets

2.1 Standard Pascal VOC

You can run train.py with the "--download" option to download and extract the Pascal VOC dataset. The default path is './datasets/data':

/datasets
    /data
        /VOCdevkit 
            /VOC2012 
                /SegmentationClass
                /JPEGImages
                ...
            ...
        /VOCtrainval_11-May-2012.tar
        ...

2.2 Pascal VOC trainaug (Recommended!!)

See Section 4 of [2]:

    The original dataset contains 1464 (train), 1449 (val), and 1456 (test) pixel-level annotated images. We augment the dataset by the extra annotations provided by [76], resulting in 10582 (trainaug) training images. The performance is measured in terms of pixel intersection-over-union averaged across the 21 classes (mIOU).

./datasets/data/train_aug.txt includes the file names of the 10582 trainaug images (val images are excluded). Please download their labels from Dropbox or Tencent Weiyun. Those labels come from DrSleep's repo.

Extract trainaug labels (SegmentationClassAug) to the VOC2012 directory.

/datasets
    /data
        /VOCdevkit  
            /VOC2012
                /SegmentationClass
                /SegmentationClassAug  # <= the trainaug labels
                /JPEGImages
                ...
            ...
        /VOCtrainval_11-May-2012.tar
        ...

3. Training on Pascal VOC2012 Aug

3.1 Visualize training (Optional)

Start a visdom server for visualization. Remove '--enable_vis' from the training command if visualization is not needed.

# Run visdom server on port 28333
visdom -port 28333

3.2 Training with OS=16

Run main.py with "--year 2012_aug" to train your model on Pascal VOC2012 Aug. You can also parallelize training across 4 GPUs with '--gpu_id 0,1,2,3'.

Note: There is no SyncBN in this repo, so training with multiple GPUs and a small batch size may degrade performance. See PyTorch-Encoding for more details about SyncBN.

python main.py --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2012_aug --crop_val --lr 0.01 --crop_size 513 --batch_size 16 --output_stride 16

3.3 Continue training

Run main.py with '--continue_training' to restore the state_dict of optimizer and scheduler from YOUR_CKPT.

python main.py ... --ckpt YOUR_CKPT --continue_training

3.4 Testing

Results will be saved at ./results.

python main.py --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2012_aug --crop_val --lr 0.01 --crop_size 513 --batch_size 16 --output_stride 16 --ckpt checkpoints/best_deeplabv3plus_mobilenet_voc_os16.pth --test_only --save_val_results

Cityscapes

1. Download Cityscapes and extract it to 'datasets/data/cityscapes'

/datasets
    /data
        /cityscapes
            /gtFine
            /leftImg8bit

2. Train your model on Cityscapes

python main.py --model deeplabv3plus_mobilenet --dataset cityscapes --enable_vis --vis_port 28333 --gpu_id 0  --lr 0.1  --crop_size 768 --batch_size 16 --output_stride 16 --data_root ./datasets/data/cityscapes 

Reference

[1] Rethinking Atrous Convolution for Semantic Image Segmentation

[2] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Contributors

aoxu2000, danielzhangau, dawars, horseee, m-just, timothylimyl, vainf


Issues

[!] Retrain

What is the reason for this "[!] Retrain" output?

Question about padding in Mobilenetv2

Dear VainF,

self.input_padding = fixed_padding( 3, dilation )

x_pad = F.pad(x, self.input_padding)

Notice that these two lines are different from the original MobileNetV2. Could you please share the reason why you implemented padding in these two lines, and what the consequence of removing them would be?
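
For reference, fixed-padding helpers in DeepLab-style implementations usually compute a 'same' padding that accounts for dilation; the sketch below shows the typical computation and is an assumption about the intent, not a quote of this repo's exact code:

def fixed_padding(kernel_size, dilation):
    # the effective kernel size grows with dilation; pad so the output
    # spatial size matches the input regardless of the dilation rate
    kernel_size_effective = kernel_size + (kernel_size - 1) * (dilation - 1)
    pad_total = kernel_size_effective - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return (pad_beg, pad_end, pad_beg, pad_end)  # (left, right, top, bottom) for F.pad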

Thank you very much.

With kind regards.

Training and Version

Hi, could you please give some clear instructions on the changes needed in main.py and cityscapes.py to train on my own dataset? Could you also list the library versions you used?

Question about --continue_training

Hello, thanks for your nice work.
I met a bug with --continue_training.

python main.py --model deeplabv3plus_mobilenet --dataset cityscapes --gpu_id 6 --lr 0.1 --crop_size 768 --batch_size 12 --output_stride 16 --data_root ./datasets/data/cityscapes --ckpt checkpoints/best_deeplabv3plus_mobilenet_cityscapes_os16.pth --continue_training

[error screenshot omitted]

Can you fix it?

Pre-training model

Hello author, could you provide a download mirror for the pretrained models hosted in China, such as Baidu Cloud?

Questions about evaluating cityscapes dataset

Thanks for your great work. I was just wondering how you evaluate on the Cityscapes dataset. After reading your code, it seems you trained the model with input size 512x512 but evaluate directly at the original image size (1024 x 2048):

if opts.crop_val:
    val_transform = et.ExtCompose([
        et.ExtResize(opts.crop_size),     # random crop to 512 x 512
        et.ExtCenterCrop(opts.crop_size),
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])
else:
    val_transform = et.ExtCompose([
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])

Why use the same model to evaluate at a different input image size? Thanks.

Question about train

Hi

Thanks for your repo! I successfully trained the DeepLabV3Plus-MobileNetV2 model on the Pascal VOC 2012 dataset, but my mIoU is only 69.41% (python3 main.py --model deeplabv3plus_mobilenet --separable_conv --gpu_id 0 --year 2012_aug --crop_val --lr 0.007 --crop_size 513 --batch_size 10 --output_stride 16). How can I improve it?
Another question: why does the experimental section of the MobileNetV2 paper report up to 75.70% mIoU? Was that because their model had been pretrained on COCO? I'm confused...

Look forward to your answers!

Nice Repo!

This repo is really nice; performance on Pascal VOC can be reproduced using 2 GPUs with batch size 16.

Testing question

What should the output of the newly added test script look like? When I run prediction with my own trained model, I get completely black images. Did something go wrong in my pipeline?

--year 2012_aug

Hi VainF,

I am able to train --year 2012 with the following command:

python main.py --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2012 --crop_val --lr 0.01 --crop_size 513 --batch_size 14 --output_stride 16 --continue_training

But when I try to train with --year 2012_aug, I encounter the following error:


Setting up a new session...
Device: cuda
Dataset: voc, Train set: 10582, Val set: 1449
[!] Retrain
Traceback (most recent call last):
  File "main.py", line 390, in <module>
    main()
  File "main.py", line 335, in main
    for (images, labels) in train_loader:
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/datasets/voc.py", line 145, in __getitem__
    target = Image.open(self.masks[index])
  File "/home/paul/segmentation/lib/python3.6/site-packages/PIL/Image.py", line 2912, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './datasets/data/VOCdevkit/VOC2012/SegmentationClassAug/2008_002913.png'

In my ./datasets/data/VOCdevkit/VOC2012/SegmentationClassAug directory I have the train_aug.txt file. What am I missing? Please help. Thanks a lot.

P.S. I did check that 2008_002913.png exists under ./datasets/data/VOCdevkit/VOC2012/JPEGImages.
So do I need to copy all the .png files to ./datasets/data/VOCdevkit/VOC2012/SegmentationClassAug, or what should I do to fix this problem? Thanks for your help.

Edited: after following the instructions to download the labels from Dropbox and extracting them to ./datasets/data/VOCdevkit/VOC2012/SegmentationClassAug, everything works as expected.

Cannot unzip the DeepLabV3Plus-ResNet101 file

Hello,

I cannot unzip the file best_deeplabv3plus_resnet101_cityscapes_os16.pth.tar.

Is the file damaged, or which tool should I use to extract it?

Thank you!

Best regards,
Yiru

Error while loading DeepLabV3Plus-ResNet50 model from checkpoint with --separable_conv flag

It seems that the DeepLabV3Plus-ResNet50 model was not trained with the --separable_conv flag active, because trying to load the weights with this flag enabled causes an error at the checkpoint-loading stage.

Command:

python main.py --model deeplabv3plus_resnet50 --separable_conv --ckpt checkpoints/best_deeplabv3plus_resnet50_voc_os16.pth --test_only --save_val_results

Error:

RuntimeError: Error(s) in loading state_dict for DeepLabV3:
Missing key(s) in state_dict: "classifier.aspp.convs.1.0.body.0.weight", "classifier.aspp.convs.1.0.body.1.weight", "classifier.aspp.convs.2.0.body.0.weight", "classifier.aspp.convs.2.0.body.1.weight", "classifier.aspp.convs.3.0.body.0.weight", "classifier.aspp.convs.3.0.body.1.weight", "classifier.classifier.0.body.0.weight", "classifier.classifier.0.body.1.weight".
Unexpected key(s) in state_dict: "classifier.aspp.convs.1.0.weight", "classifier.aspp.convs.2.0.weight", "classifier.aspp.convs.3.0.weight", "classifier.classifier.0.weight".

Perhaps, if you still have the commands you used to train the models whose weights you shared, publishing those commands would make the pretrained models easier to use.

I just wanted to note this point in case somebody else also experiences the same issue. Overall, the repo is really helpful. Thank you.

Train DeeplabV3Plus-MobileNetV3 for Road Only Segmentation

Hi there,

I am trying to find a model that does segmentation for roads only, based on this backbone:

'tf_mobilenetv3_small_075': {
        'imagenet': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_mobilenetv3_small_075-da427f52.pth'
    },

I need to run it on a Raspberry Pi with an OAK-D camera, so it should not be too slow on these edge devices.

Could you please provide this trained model, or show me how to do it?

Thanks,

Winston

resnet50 training problem

Hello. I'm trying to reproduce the results. However, when training with deeplabv3plus_resnet50, the mIoU can't reach 0.772; the best I get is 0.714. I wonder whether you modified any hyper-parameters when you trained it yourself. Thank you very much.

Reproduce issue.

With the default training settings of this code, I trained the "deeplabv3plus_resnet101" model on VOC12.
The best mIoU I can get is 0.763, whereas the provided model scores 0.783.

question to the trainingdata

Hi guys, I've recently hit a problem that really confuses me. I'm using DeepLabv3+ to train a 5-class segmentation model (forest, ground, sky, runway asphalt, and runway lane) with 3100 images and corresponding labels. By mistake I swapped label indices 1 and 2 from the 1706th label onward, yet after training the network I accidentally got better segmentation than before. After I fixed the labels and retrained with the correct indices, the results were worse. Do you know what causes this? Thanks in advance.

Is separable_conv better than standard conv?

Hello!
I trained mobilenet-deeplabv3+ with and without --separable_conv enabled,
and found that the separable version is better by 1.8% mIoU.
Could you share your results? My guess is that the gain comes from reduced overfitting?
Thanks a lot!

question about test

@VainF, hi. I can train normally on the Cityscapes dataset, but the test results are obviously wrong. What could be the matter?

Failed to reproduce the results on VOC 2012 dataset

Hi, VainF. Thanks for sharing this nice repo; the code has great readability and practicality. However, I failed to reproduce the results on the VOC dataset.

I trained deeplabv3plus-resnet101 (OS 16, provided pretrained weights for ResNet101) on the VOC 2012_aug dataset with all the other default settings, only changing gpu_id to '0,1,2,3', as I couldn't fit a batch size of 16 on one 2080 Ti GPU. I also applied SyncBN (https://github.com/vacancy/Synchronized-BatchNorm-PyTorch) to avoid the performance decrease caused by multi-GPU training; my best mIoU is 0.7539. Then I asked my friend to help train the model: he trained on a TITAN RTX GPU with no SyncBN, and his best mIoU is 0.7535. Therefore I think multi-GPU training is fine with SyncBN.

Did you apply multi-scale inference for validation? Do I need to change any settings to achieve 0.783 on the VOC 2012_aug dataset?

Looking forward to your reply and suggestions. Thank you again for your effort.

Can't reproduce your results

Thank you for your amazing work. I was trying to reproduce your results on the Cityscapes dataset, but I couldn't reach mIoU > 70% for either the mobilenet- or resnet-based model. Could you share your training hyperparameters? Also, do you have any training tips that could help reach your results?

With kind regards.

train --year 2007 failed

Hi, while waiting for PascalVOC2012.zip to download, I tried to run the 2007 dataset I had already downloaded.

When running it, I got the following error message:

(segmentation) paul@tensor:~/segmentation/DeepLabV3Plus-Pytorch$ python main.py --model deeplabv3plus_mobilenet --enable_vis --vis_port 28333 --gpu_id 0 --year 2007 --crop_val --lr 0.01 --crop_size 513 --batch_size 16 --output_stride 16
Setting up a new session...
Device: cuda
Dataset: voc, Train set: 209, Val set: 213
[!] Retrain
/home/paul/segmentation/lib/python3.6/site-packages/torchvision/transforms/functional.py:387: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
/home/paul/segmentation/lib/python3.6/site-packages/torchvision/transforms/functional.py:387: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
Epoch 1, Itrs 10/30000, Loss=1.980302
Traceback (most recent call last):
  File "main.py", line 390, in <module>
    main()
  File "main.py", line 342, in main
    outputs = model(images)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/utils.py", line 16, in forward
    x = self.classifier(features)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/_deeplab.py", line 49, in forward
    output_feature = self.aspp(feature['out'])
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/_deeplab.py", line 160, in forward
    res.append(conv(x))
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/DeepLabV3Plus-Pytorch/network/_deeplab.py", line 130, in forward
    x = super(ASPPPooling, self).forward(x)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 178, in forward
    self.eps,
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/functional.py", line 2279, in batch_norm
    _verify_batch_size(input.size())
  File "/home/paul/segmentation/lib/python3.6/site-packages/torch/nn/functional.py", line 2247, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

What should I do to fix this error? Thank you for your help.
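
For reference, one likely cause: this VOC2007 train split has 209 images and 209 % 16 == 1, so the final batch contains a single image; the BatchNorm that follows ASPP's global average pooling then sees exactly one value per channel in training mode and raises this error. A hedged workaround sketch (train_dst stands for the training dataset):

from torch.utils.data import DataLoader

# drop the last incomplete batch so BatchNorm never sees a batch of one
train_loader = DataLoader(train_dst, batch_size=16, shuffle=True, drop_last=True)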

Reproducing your code

Hello, I would like to know whether you started training from scratch without loading any weights, and how many epochs you trained for.

How can I get the mIoU score?


Input dimensions for the pretrained model

Hi, when using the Cityscapes-pretrained resnet101, what dimensions should images be preprocessed to before being fed into the network?
This is for my graduation project; many thanks.

IntermediateLayerGetter parameters

Hi,
thanks for this implementation.

Do you think it is safe to grab the parameters of the backbone after it has been passed through IntermediateLayerGetter, as done here?

It seems that calling backbone.parameters() retrieves only a few parameters, not the entire backbone's parameters as one would expect.
See here for an example using ResNet.
Thanks.
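
For reference, a quick way to check this behavior with torchvision (a sketch, not this repo's exact code): IntermediateLayerGetter keeps only the child modules up to the last requested layer, so anything after it is dropped from parameters().

from torchvision.models import resnet50
from torchvision.models._utils import IntermediateLayerGetter

backbone = resnet50()
full = sum(p.numel() for p in backbone.parameters())
getter = IntermediateLayerGetter(backbone, return_layers={'layer4': 'out'})
kept = sum(p.numel() for p in getter.parameters())
print(full - kept)  # the difference is exactly the dropped fc head's parameters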

Deeplab code not implemented

Why does everything come down to this line calling DeepLabV3, when the class itself appears to have no implementation, as shown in the figure below?

if name=='deeplabv3plus':
    return_layers = {'high_level_features': 'out', 'low_level_features': 'low_level'}
    classifier = DeepLabHeadV3Plus(inplanes, low_level_planes, num_classes, aspp_dilate)
elif name=='deeplabv3':
    return_layers = {'high_level_features': 'out'}
    classifier = DeepLabHead(inplanes, num_classes, aspp_dilate)
backbone = IntermediateLayerGetter(backbone, return_layers=return_layers)

model = DeepLabV3(backbone, classifier)
return model

[figure omitted]
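
For reference, the forward logic lives in the parent class in network/utils.py (visible in the traceback of the '--year 2007' issue above); a hedged paraphrase:

import torch.nn as nn
from torch.nn import functional as F

class _SimpleSegmentationModel(nn.Module):
    def __init__(self, backbone, classifier):
        super().__init__()
        self.backbone = backbone
        self.classifier = classifier

    def forward(self, x):
        input_shape = x.shape[-2:]
        features = self.backbone(x)    # dict of feature maps from IntermediateLayerGetter
        x = self.classifier(features)  # DeepLab head produces per-class logits
        return F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False)

class DeepLabV3(_SimpleSegmentationModel):
    pass  # nothing to implement: forward() is inherited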

AdaptiveAvgPool2d

Why is this nn.AdaptiveAvgPool2d(1) done here?

class ASPPPooling(nn.Sequential):
    def __init__(self, in_channels, out_channels):
        super(ASPPPooling, self).__init__(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True))

    def forward(self, x):
        size = x.shape[-2:]
        x = super(ASPPPooling, self).forward(x)
        return F.interpolate(x, size=size, mode='bilinear', align_corners=False)

I am doing a segmentation task, and this pooling changes my output from torch.Size([1, 256, 16, 16]) to torch.Size([1, 256, 1, 1]),
giving the error
"Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])".

What could have gone wrong?
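
For reference: the 1x1 pooling branch is ASPP's image-level feature from the DeepLabv3 paper; it compresses each channel to a single global value, projects it, and broadcasts it back to the feature-map size. The error itself comes from BatchNorm seeing a batch of one in training mode. A minimal sketch of the shape round-trip:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 256, 16, 16)       # batch of 4; a batch of 1 would trip BatchNorm in train mode
pooled = nn.AdaptiveAvgPool2d(1)(x)   # -> (4, 256, 1, 1): one global value per channel
restored = F.interpolate(pooled, size=x.shape[-2:], mode='bilinear', align_corners=False)
print(restored.shape)                 # torch.Size([4, 256, 16, 16])

When running single images through the model, use model.eval() (or a batch size greater than one).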

FocalLoss params alpha and gamma

@VainF
I use deeplabv3plus_resnet101 to train my own dataset, with loss='Focal_Loss'.
But I found that the focal-loss parameters are set to α=1, γ=0, which makes it identical to cross-entropy loss.
Is this something you did on purpose, or is it a code error?
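
For reference, a hedged sketch of a focal loss in its usual form; with alpha=1 and gamma=0 the modulating factor becomes 1 and the loss reduces exactly to cross-entropy:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=1.0, gamma=2.0, ignore_index=255):
    ce = F.cross_entropy(logits, targets, ignore_index=ignore_index, reduction='none')
    pt = torch.exp(-ce)                            # probability assigned to the true class
    return (alpha * (1 - pt) ** gamma * ce).mean() # gamma=0 -> plain cross-entropy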

pth -> onnx

How can I convert a trained .pth segmentation model to ONNX? The network is deeplabv3plus_resnet101.
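
A hedged export sketch using torch.onnx (the checkpoint path, input shape, and opset version are assumptions to adapt):

import torch
import network

model = network.modeling.deeplabv3plus_resnet101(num_classes=19, output_stride=16)
model.load_state_dict(torch.load('your_checkpoint.pth', map_location='cpu')['model_state'])
model.eval()

dummy = torch.randn(1, 3, 513, 513)  # ONNX shapes follow the dummy input
torch.onnx.export(model, dummy, 'deeplabv3plus_resnet101.onnx',
                  opset_version=11, input_names=['input'], output_names=['output'])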

Model performance index

Hi @VainF ,

I used THOP, adding two lines of code in modeling.py to calculate the model's parameters and FLOPs, but the results don't match your table. How does your code calculate the FLOPs and parameters shown in your chart?
Looking forward to your answer! Thanks!
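
For reference, a hedged THOP sketch; note that THOP reports multiply-accumulate operations (MACs), and the count scales with input resolution, so the dummy input should match the table's size (e.g. 513x513 for VOC):

import torch
from thop import profile
import network

model = network.modeling.deeplabv3plus_mobilenet(num_classes=21, output_stride=16)
dummy = torch.randn(1, 3, 513, 513)
macs, params = profile(model, inputs=(dummy,))
print(macs / 1e9, 'GMACs,', params / 1e6, 'M params')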

Cityscapes training on Full Res image

Hi,

Thanks for this wonderful repo!

I would like to ask whether you have trained on full-resolution Cityscapes images using the DeepLabV3+MobileNet model provided in this repo?

low GPU utility

| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 Off | N/A |
| 36% 64C P2 93W / 250W | 5556MiB / 11018MiB | 38% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:02:00.0 Off | N/A |
| 37% 64C P2 118W / 250W | 5486MiB / 11019MiB | 32% Default |

I'm using two RTX 2080 Ti GPUs, and the average utilization is around 35%. I also tried the implementation in SMP, and its utilization is low as well.
Does anyone else experience this problem? What might be the cause? Thanks.
I'm sure it's not caused by the data loader: when I use UNet or my own model, utilization is always over 90%.

TypeError: the JSON object must be str, bytes or bytearray, not bool

I'm working with the Cityscapes dataset, but when training there is an error like this:

Traceback (most recent call last):
  File "main.py", line 388, in <module>
    main()
  File "main.py", line 217, in main
    vis = Visualizer(port=opts.vis_port,
  File "I:\DeepLabV3Plus-Pytorch-master\utils\visualizer.py", line 14, in __init__
    ori_win = json.loads(ori_win)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not bool

About the training results

I trained on the Cityscapes dataset, and the precision/recall values look fine, but the pre_img images produced under the results folder show the segmentation completely scrambled. I then downloaded your model and checked its predictions, and those don't look right either. I don't know whether my procedure for drawing the segmentation results back onto the original image is wrong, or whether I've overlooked something else. I hope you can provide example code for drawing the network output back onto the original image. Thanks!

"

import cv2
import numpy as np

from network import *
from PIL import Image
from torchvision.transforms.transforms import *
import torch

val_transform = Compose([
#et.ExtResize( 512 ),
ToTensor(),
Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_path = 'models_res/best_deeplabv3plus_mobilenet_cityscapes_os16.pth'
model = deeplabv3plus_mobilenet(num_classes = 19,output_stride=16)

model.load_state_dict(torch.load(model_path)['model_state'])
model.to(device)
model.eval()

img_path = 'results/4_image.png'
image = Image.open(img_path).convert('RGB')
input = cv2.cvtColor(np.asarray(image),cv2.COLOR_RGB2BGR)
if name == 'main':
import torch

cv2.namedWindow('img_draw',0)

model_dict = torch.load(model_path)
test_input = val_transform(image).unsqueeze(dim=0)
test_input = test_input.to(device)
print('输入图像:',test_input.size())
output =model(test_input).cpu().detach().clone()
print('输出:',output.size())

preds = output.max(dim=1)[1].cpu().numpy()#中括号里对应输出 19 个维度中其中一个
print(preds)
mask = (output.detach().max(dim=1)[1].cpu()==5).nonzero()
mask = mask[...,1:].numpy()
print(mask)
cv2.drawContours(input, [mask], -1, (0, 0, 255), -1)
cv2.imshow('img_draw',input)
cv2.waitKey(0)

"

How to modify the structure to accept more than three input channels?

My current work requires adding a mask as a fourth channel on top of the three-channel image, but I don't know how to change the network structure. For example, when using resnet101 as the backbone, how should I modify the network to accept four-channel input?
Hoping for your help, thanks.
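
For reference, a common hedged recipe: replace the backbone's first convolution with a 4-channel one and reuse the pretrained RGB filters. The attribute path below assumes a loaded DeepLab model with a ResNet backbone; adapt it to the actual module names:

import torch
import torch.nn as nn

old = model.backbone.conv1  # first conv of a ResNet backbone (an assumption)
new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    new.weight[:, :3] = old.weight                        # copy pretrained RGB filters
    new.weight[:, 3:] = old.weight.mean(1, keepdim=True)  # init mask channel from RGB mean
model.backbone.conv1 = new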

How can I get your score?

I want to reproduce your results for my pancreatic cancer data. When you trained resnet-101, did you train without loading any weights, or which pretrained weights did you load? And how many epochs did you train?

Only support single GPU?

I find that there is no 'Parallel' or 'parallel' in the code, so I think it only supports a single GPU, right?
Then how can you fit 16 images on one GPU when training on Cityscapes?

Thanks for your effort!
