
spade's Introduction


Semantic Image Synthesis with SPADE

GauGAN demo

New implementation available at imaginaire repository

We have a reimplementation of the SPADE method that is more performant. It is available at Imaginaire.

Semantic Image Synthesis with Spatially-Adaptive Normalization.
Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu.
In CVPR 2019 (Oral).

Copyright (C) 2019 NVIDIA Corporation.

All rights reserved. Licensed under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International)

The code is released for academic research use only. For commercial use or business inquiries, please contact [email protected].

For press and other inquiries, please contact Hector Marinez

Installation

Clone this repo.

git clone https://github.com/NVlabs/SPADE.git
cd SPADE/

This code requires PyTorch 1.0 and Python 3+. Please install dependencies by

pip install -r requirements.txt

This code also requires the Synchronized-BatchNorm-PyTorch repo.

cd models/networks/
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
cd ../../

To reproduce the results reported in the paper, you would need an NVIDIA DGX1 machine with 8 V100 GPUs.

Dataset Preparation

For COCO-Stuff, Cityscapes or ADE20K, the datasets must be downloaded beforehand. Please download them from the respective webpages. In the case of COCO-Stuff, we put a few sample images in this code repo.

Preparing COCO-Stuff Dataset. The dataset can be downloaded here. In particular, you will need to download train2017.zip, val2017.zip, stuffthingmaps_trainval2017.zip, and annotations_trainval2017.zip. The images, labels, and instance maps should be arranged in the same directory structure as in datasets/coco_stuff/. In particular, we used an instance map that combines both the boundaries of "things instance map" and "stuff label map". To do this, we used a simple script datasets/coco_generate_instance_map.py. Please install pycocotools using pip install pycocotools and refer to the script to generate instance maps.
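For intuition, here is a rough, hedged sketch of the idea behind combining the stuff label maps with the thing instance annotations; the directory layout and the id bookkeeping are assumptions for illustration only, and datasets/coco_generate_instance_map.py remains the authoritative logic.

import numpy as np
from PIL import Image
from pycocotools.coco import COCO

coco = COCO('annotations/instances_train2017.json')  # "things" instance annotations
for img_id in coco.getImgIds():
    name = '%012d' % img_id
    # start from the stuff label map so stuff regions keep their class ids
    inst_map = np.array(Image.open('train_label/%s.png' % name), dtype=np.uint8)
    # overwrite each thing instance with its own id, so boundaries between
    # touching instances of the same class remain visible in the instance map
    for count, ann in enumerate(coco.loadAnns(coco.getAnnIds(imgIds=img_id))):
        mask = coco.annToMask(ann).astype(bool)
        inst_map[mask] = count
    Image.fromarray(inst_map).save('train_inst/%s.png' % name)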

Preparing ADE20K Dataset. The dataset can be downloaded here, which is from MIT Scene Parsing BenchMark. After unzipping the dataset, put the jpg image files ADEChallengeData2016/images/ and png label files ADEChallengeData2016/annotations/ in the same directory.

There are different modes to load images, specified by --preprocess_mode along with --load_size and --crop_size. There are options such as resize_and_crop, which resizes the images into square images of side length load_size and randomly crops to crop_size. scale_shortside_and_crop scales the image to have a short side of length load_size and crops to a crop_size x crop_size square. To see all modes, please use python train.py --help and take a look at data/base_dataset.py. By default, at the training phase, the images are randomly flipped horizontally. To prevent this, use --no_flip.
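For intuition, here is a minimal sketch of what resize_and_crop does during training, assuming PIL is used for loading. The real transform lives in data/base_dataset.py and also resamples label maps with nearest-neighbor interpolation; the function below is illustrative, not the repository's code.

import random
from PIL import Image

def resize_and_crop(img, load_size=286, crop_size=256, flip=True):
    # resize to a load_size x load_size square, then take a random crop_size crop
    img = img.resize((load_size, load_size), Image.BICUBIC)
    x = random.randint(0, load_size - crop_size)
    y = random.randint(0, load_size - crop_size)
    img = img.crop((x, y, x + crop_size, y + crop_size))
    # training-time horizontal flip, disabled by --no_flip
    if flip and random.random() > 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img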

Generating Images Using Pretrained Model

Once the dataset is ready, the result images can be generated using pretrained models.

  1. Download the tar of the pretrained models from the Google Drive Folder, save it in 'checkpoints/', and run

    cd checkpoints
    tar xvf checkpoints.tar.gz
    cd ../
    
  2. Generate images using the pretrained model.

    python test.py --name [type]_pretrained --dataset_mode [dataset] --dataroot [path_to_dataset]

    [type]_pretrained is the directory name of the checkpoint file downloaded in Step 1, which should be one of coco_pretrained, ade20k_pretrained, and cityscapes_pretrained. [dataset] can be one of coco, ade20k, and cityscapes, and [path_to_dataset] is the path to the dataset. If you are running in CPU mode, append --gpu_ids -1.

  3. The output images are stored at ./results/[type]_pretrained/ by default. You can view them using the autogenerated HTML file in the directory.

Generating Landscape Image using GauGAN

In the paper and the demo video, we showed GauGAN, our interactive app that generates realistic landscape images from the layout users draw. The model was trained on landscape images scraped from Flickr.com. We released an online demo that has the same features. Please visit https://www.nvidia.com/en-us/research/ai-playground/. The model weights are not released.

Training New Models

New models can be trained with the following commands.

  1. Prepare dataset. To train on the datasets shown in the paper, you can download the datasets and use --dataset_mode option, which will choose which subclass of BaseDataset is loaded. For custom datasets, the easiest way is to use ./data/custom_dataset.py by specifying the option --dataset_mode custom, along with --label_dir [path_to_labels] --image_dir [path_to_images]. You also need to specify options such as --label_nc for the number of label classes in the dataset, --contain_dontcare_label to specify whether it has an unknown label, or --no_instance to denote the dataset doesn't have instance maps.

  2. Train.

# To train on the Facades or COCO dataset, for example.
python train.py --name [experiment_name] --dataset_mode facades --dataroot [path_to_facades_dataset]
python train.py --name [experiment_name] --dataset_mode coco --dataroot [path_to_coco_dataset]

# To train on your own custom dataset
python train.py --name [experiment_name] --dataset_mode custom --label_dir [path_to_labels] --image_dir [path_to_images] --label_nc [num_labels]

There are many options you can specify. Please use python train.py --help. The specified options are printed to the console. To specify the number of GPUs to utilize, use --gpu_ids. If you want to use the second and third GPUs for example, use --gpu_ids 1,2.

To log training, use --tf_log for TensorBoard. The logs are stored at [checkpoints_dir]/[name]/logs.
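If TensorBoard is installed, the logs can then be viewed with something like:

tensorboard --logdir [checkpoints_dir]/[name]/logs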

Testing

Testing is similar to testing pretrained models.

python test.py --name [name_of_experiment] --dataset_mode [dataset_mode] --dataroot [path_to_dataset]

Use --results_dir to specify the output directory. --how_many will specify the maximum number of images to generate. By default, it loads the latest checkpoint. It can be changed using --which_epoch.
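For example, to write at most 50 images from the checkpoint saved at epoch 30 into a custom directory (the values here are purely illustrative):

python test.py --name [name_of_experiment] --dataset_mode [dataset_mode] --dataroot [path_to_dataset] --results_dir ./results_epoch30/ --how_many 50 --which_epoch 30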

Code Structure

  • train.py, test.py: the entry point for training and testing.
  • trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
  • models/pix2pix_model.py: creates the networks and computes the losses
  • models/networks/: defines the architecture of all models
  • options/: creates option lists using the argparse package. More options are dynamically added in other files as well. Please see the section below.
  • data/: defines the class for loading images and label maps.

Options

This code repo contains many options. Some options belong to only one specific model, and some options have different default values depending on other options. To address this, the BaseOption class dynamically loads and sets options depending on what model, network, and dataset are used. This is done by calling the static method modify_commandline_options of various classes. It takes in the parser of the argparse package and modifies the list of options. For example, since the COCO-Stuff dataset contains a special label "unknown", when the COCO-Stuff dataset is used, it sets --contain_dontcare_label automatically in data/coco_dataset.py. You can take a look at def gather_options() of options/base_options.py, or models/networks/__init__.py, to get a sense of how this works.
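The general shape of that hook looks roughly like the sketch below. The class name and the initial default are illustrative; label_nc=182 and contain_dontcare_label=True simply mirror the COCO-Stuff defaults that the training script prints.

import argparse

class CocoDatasetOptionsSketch:
    @staticmethod
    def modify_commandline_options(parser, is_train):
        # COCO-Stuff contains an "unknown" label, so the dataset class flips
        # the flag and sets the label count automatically
        parser.set_defaults(contain_dontcare_label=True, label_nc=182)
        return parser

parser = argparse.ArgumentParser()
parser.add_argument('--contain_dontcare_label', action='store_true')
parser.add_argument('--label_nc', type=int, default=35)
parser = CocoDatasetOptionsSketch.modify_commandline_options(parser, is_train=True)
print(parser.parse_args([]))  # Namespace(contain_dontcare_label=True, label_nc=182)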

VAE-Style Training with an Encoder For Style Control and Multi-Modal Outputs

To train our model along with an image encoder to enable multi-modal outputs as in Figure 15 of the paper, please use --use_vae. The model will create netE in addition to netG and netD and train with a KL-divergence loss.
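Conceptually, the encoder predicts a mean and log-variance for the style code, which is sampled via the reparameterization trick and regularized with a KL term. A minimal sketch of those two pieces (illustrative only; the repository's versions live in models/pix2pix_model.py and models/networks/loss.py):

import torch

def reparameterize(mu, logvar):
    # sample z ~ N(mu, sigma^2) in a differentiable way
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

def kld_loss(mu, logvar):
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())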

Citation

If you use this code for your research, please cite our papers.

@inproceedings{park2019SPADE,
  title={Semantic Image Synthesis with Spatially-Adaptive Normalization},
  author={Park, Taesung and Liu, Ming-Yu and Wang, Ting-Chun and Zhu, Jun-Yan},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

Acknowledgments

This code borrows heavily from pix2pixHD. We thank Jiayuan Mao for his Synchronized Batch Normalization code.

spade's People

Contributors

mingyuliutw, taesungp, tcwang0509


spade's Issues

images in val_img unnecessary for test.py?

Hi, great job on this paper! One question: if I am not retraining, just trying to generate images from a sketch, the validation images should not be necessary. However, if I delete the images, there is an error. On the other hand, if I change the images for something completely different, the output result doesn't change. Am I missing something or could the requirement to have validation images be dropped? Thanks a bunch!

AttributeError: 'Namespace' object has no attribute 'no_pairing_check'

I tried COCO-stuff dataset and it worked flawlessly, but when I switch to Cityscapes dataset I get this error message.

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    dataloader = data.create_dataloader(opt)
  File "C:\SPADE\data\__init__.py", line 44, in create_dataloader
    instance.initialize(opt)
  File "C:\SPADE\data\pix2pix_dataset.py", line 39, in initialize
    if not opt.no_pairing_check:
AttributeError: 'Namespace' object has no attribute 'no_pairing_check'

I downloaded "gtFine_trainvaltest.zip" and "leftImg8bit_trainvaltest.zip", unzipped in C:\SPADE\datasets\cityscapes , create C:\SPADE\results\cityscapes_pretrained folder
Then run: python test.py --name cityscapes_pretrained --dataset_mode cityscapes --dataroot C:\SPADE\datasets\cityscapes

If I disable lines 39 to 42 in C:\SPADE\data\pix2pix_dataset.py:

        if not opt.no_pairing_check:
            for path1, path2 in zip(label_paths, image_paths):
                assert self.paths_match(path1, path2), \
                    "The label-image pair (%s, %s) do not look like the right pair because the filenames are quite different. Are you sure about the pairing? Please see data/pix2pix_dataset.py to see what is going on, and use --no_pairing_check to bypass this." % (path1, path2)

It will generate images but I believe that's not an optimal solution.
Also, if I check the "gtFine_trainvaltest.zip" file I downloaded, inside the "gtFine\test" folder all pictures are 100% pitch black "#000000"; is that normal?

Thank you for your kind attention.

[Idea] Crowd-training of unreleased Flickr landscape model

As you probably know, the Flickr landscape pre-trained model could not be released in this repo. But that model can draw landscapes with unbelievable quality, much higher than the COCO-Stuff one, thanks to training on 40k Flickr images; the fact that it hasn't been released is disappointing.

Some of us probably want to train it ourselves (me for one, and @Lokiiiiii also brought this up). Typically it would cost a few thousand dollars. Thankfully, Product Hunt offers a paid subscription that basically provides $5000 of AWS credits for $720: https://www.producthunt.com/ship#launch (Product Hunt Ship Pro, yearly subscription)

This gets the cost down to $720, but it's still a lot. Since a few of us are going to do the same exact thing, why don't we train the model together and share the cost? $720 split among 5 people is already $144, which is fair for such a powerful model.

Once we have a few people in, we can start a crowdfunding campaign, pledge funds, train the model and share it among us.

What do you think of this?

COCO-Stuff dataset and instance maps

Hi, I have read the code and the paper carefully, but I still have three questions that confuse me; would anybody please be so kind as to guide me?

  1. I think COCO-Stuff means just the 92 stuff categories used in the stuff task, rather than all 182 categories, as described in my blog: https://blog.csdn.net/Scarlett_Guan/article/details/89916692. Maybe the authors are not using the COCO-Stuff dataset accurately? Or, if my comprehension is not accurate, please tell me.
    When I read the code, the annotations only contain instances_train2017.json, which is an annotation file for instance segmentation and only contains the 80 thing categories. This can be verified by printing the categories with the COCO API.
    So which is it in the end: instance segmentation or the stuff task? 92, 80, or 182 categories?

  2. I think the instance map is redundant; there is no need to use it, the label map alone is enough. Judging from the script coco_generate_instance_map.py, the instance map is almost the same as the label map.

  3. Why is one added to the label in SPADE/util/coco.py? It seems quite unnecessary.

Thank you so much!

Flickr Landscapes dataset release ?

Hi,
Will the Flickr Landscapes dataset be released? If not, could you give us the details of how the dataset was curated and pre-processed after being web scraped?

module 'models.networks' has no attribute 'modify_commandline_options'

When I run "python test.py --name coco_pretrained --dataset_mode coco --dataroot H:\datasets\COCO\train"

Traceback (most recent call last):
  File "test.py", line 15, in
    opt = TestOptions().parse()
  File "H:\SPADE-master\options\base_options.py", line 150, in parse
    opt = self.gather_options()
  File "H:\SPADE-master\options\base_options.py", line 85, in gather_options
    parser = model_option_setter(parser, self.isTrain)
  File "H:\SPADE-master\models\pix2pix_model.py", line 14, in modify_commandline_options
    networks.modify_commandline_options(parser, is_train)
AttributeError: module 'models.networks' has no attribute 'modify_commandline_options'

How can I fix that? Thanks.

running on gtx 10 series

Is it possible for me to run this on a GTX 1060 6GB? I really just want to mess around with the neural net stuff and learn from it.

3D Application

What would prevent this from being used as a 3D renderer where material colors match the physical materials, aside from the high cost of rendering it in real time?

Not able to process PTH files in checkpoints folder

python test.py --name coco_pretrained --dataset_mode coco --dataroot c:\spade\checkpoints\coco_pretrained
This ends with
Traceback (most recent call last):
  File "test.py", line 17, in
    dataloader = data.create_dataloader(opt)
  File "c:\SPADE\data\__init__.py", line 44, in create_dataloader
    instance.initialize(opt)
  File "c:\SPADE\data\pix2pix_dataset.py", line 28, in initialize
    label_paths, image_paths, instance_paths = self.get_paths(opt)
  File "c:\SPADE\data\coco_dataset.py", line 34, in get_paths
    label_paths = make_dataset(label_dir, recursive=False, read_cache=True)
  File "c:\SPADE\data\image_folder.py", line 47, in make_dataset
    assert os.path.isdir(dir) or os.path.islink(dir), '%s is not a valid directory' % dir
AssertionError: .checkpoint\coco_pretrained\val_label is not a valid directory

Looks like site directory setting file is required?

I just realized that these PTH files are PyTorch models. So I manually created the missing empty folders and it ran, but it produced a 0-byte image. Something is missing, please help.

How to feed colored input maps?

Hey guys, what an amazing work!

That's probably a dumb question, but how would one use images like these as inputs to the model?
[attached: image 000000203744 and its colored segmentation map]

Is there a pre-processing step that we need to take? I can only see that the model uses greyscale val_inst and val_label pictures, and I don't really get how they're obtained.
[attached: grayscale label and instance maps for image 000000017914]

Many thanks in advance!

Replicating COCO model training

I'm trying to train SPADE on the COCO dataset, and reproduce the FID numbers. I've already managed to replicate the FID score for the model trained with ADE20K, and just want to confirm a few of the settings. It's a bit unclear from Appendix A how the COCO settings differ from the ADE20K ones. So:

  1. I see the number of epochs is 100. For COCO, is the learning rate schedule constant for all 100, or is it constant for the first 50 and for the second 50 it linearly decays to zero?
  2. --batchSize is still 32 (assuming 8 GPUs)?
  3. --coco_no_portraits should not be set (False)?
  4. --use_vae should not be set (False)?

Any other setting differences I should be aware of?

Thanks again for all the helpful replies.

setup instructions bug

$ cp Synchronized-BatchNorm-PyTorch/sync_batchnorm . -rf
usage: cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file target_file
       cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file ... target_directory

Use this instead:

$ cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm/ .

Questions about default arguments for discriminators

Hello, thank you for sharing this nice work!
I have a question about the default arguments for multi-scale discriminator. Could you let me know the parameters for the pretrained model for each dataset?

  1. In the paper, each discriminator has 5 convolution layers, including the last conv layer. However, the default n_layers_D is 3, so there are only 4 convolution layers in total. (Maybe there should be one 512-size conv layer without upsampling?)

  2. In the code, the default value of num_D is 2. However, the number of D in Pix2pixHD is 3.

  3. If the code uses TTUR, the default value of beta2 is 0.9, but the paper describes beta2 as 0.999.

SPADE vs pix2pixHD on Faces datasets

Hi! Adding segmentation masks directly to intermediate layers is a great idea. Thanks for releasing the code.

I have noticed that in the paper you didn't show results on face generation tasks, unlike the pix2pixHD paper. I'm working on a similar pet project, so I have a couple of questions. I would appreciate your insights.

Have you tried SPADE on faces datasets?
If yes: Is there significant improvement compared to pix2pixHD model?
If not: Why not? :)

The GAN_Feat loss and VGG loss can not decrease.

log:
(epoch: 57, iters: 13696, time: 12.332) KLD: 1.490 GAN: 0.383 GAN_Feat: 8.289 VGG: 10.871 D_Fake: 0.757 D_real: 0.595
Under normal conditions, how much can GAN_Feat loss and VGG loss reduce to? Thanks.

Segmentation image is always gray?

Hello
In the code, you convert the segmentation image into a one-hot vector.

If segmentation images are grayscale, this process makes sense.

For example, if the segmentation image shape is [256, 256, 1] and the max pixel value is 150,
then the one-hot vector of the segmap has shape [256, 256, 1, 151], which squeezes to [256, 256, 151]
(the last dimension is the channel).

But if it's in color, how do you turn it into a one-hot vector? I don't understand.
I think the one-hot vector of the segmap would have shape [256, 256, 3, 151]...
Do I always have to prepare the segmentation image in grayscale?
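For reference, a minimal sketch of how a single-channel integer label map is typically turned into a one-hot tensor (the repository does something similar with scatter_ in models/pix2pix_model.py); a colored map would first have to be converted back to integer class ids:

import torch

def one_hot(label_map, nc):
    # label_map: (N, 1, H, W) tensor of integer class ids in [0, nc-1]
    n, _, h, w = label_map.size()
    out = torch.zeros(n, nc, h, w)
    return out.scatter_(1, label_map.long(), 1.0)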

Washed out results at bigger sizes

Using SPADE at default values will output 512 x 256 pictures, which is very small.
I used this command to increase the resolution:
python test.py --name cityscapes_pretrained --dataset_mode cityscapes --dataroot C:\SPADE\datasets\cityscapes --load_size 1024 --crop_size 1024 --display_winsize 1024 --label_nc 35 --aspect_ratio 2.0 --preprocess_mode fixed
But the results look washed out compared to the smaller version, here's a comparison:
Small 512 x 256: https://drive.google.com/open?id=1fc9Mye-IGweuN3iWjKTVgXDm237EUTa2
Big 1024 x 512: https://drive.google.com/open?id=1EoY_zgBj6D8ULuNfzpa-TsnOfCqC-gAE
My results using vanilla Pix2PixHD at 2048 x 1024 don't seem washed out; they're more detailed and crisper (although messier):
At 2048 x 1024: https://drive.google.com/open?id=1TRzUUNfayoChVzODxoMPCVuxsOGgdC-Y
Am I doing something wrong, or is this normal?
Also, CUDA runs out of memory on my GPU at 2048 x 1024 with SPADE, but not with Pix2PixHD, which allows some fine-tuning to mitigate that. Is there a way to avoid running out of memory on 8GB of VRAM with SPADE?

What are the exact color values of pixels for each label?

As I know now, each pixel of label/instance maps has a grayscale brightness value 0, 1, 2, ..., n, which corresponds to the object label.

I first thought that for COCO stuff dataset the values are simply these: https://github.com/nightrome/cocostuff/blob/master/labels.md

But it turns out to be a bit different. For example, to get the sky-other label (id 157 at the link above), the pixel value in both the label and instance maps needs to be 163 for some reason. The same goes for snow (159) - it's 165; and for sand (154) - 160.
But this +6 logic doesn't always work. For example, to get railroad (147), we seem to set pixel values to 154, which is +7 logic.

So the question is - how can we get the exact color values for each label?

Many thanks!
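One quick way to see which grayscale ids a prepared label map actually contains (a small diagnostic sketch, not part of the repository; the path is a placeholder):

import numpy as np
from PIL import Image

label = np.array(Image.open('datasets/coco_stuff/train_label/some_image.png'))
print(np.unique(label))  # the integer class ids present in this label map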

THCudaCheck FAIL

I am running on an rtx 2080 with cuda 10.1.

Does this mean one rtx 2080 is not enough ?

python test.py --name coco_pretrained --dataset_mode coco --dataroot '/home/chrispie/projects/SPADE/datasets/coco_stuff' 

....
dataset [CocoDataset] of size 8 was created
Network [SPADEGenerator] was created. Total number of parameters: 97.5 million. To see the architecture, do print(network).
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    generated = model(data, mode='inference')
  File "/home/chrispie/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chrispie/projects/SPADE/models/pix2pix_model.py", line 58, in forward
    fake_image, _ = self.generate_fake(input_semantics, real_image)
  File "/home/chrispie/projects/SPADE/models/pix2pix_model.py", line 197, in generate_fake
    fake_image = self.netG(input_semantics, z=z)
  File "/home/chrispie/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chrispie/projects/SPADE/models/networks/generator.py", line 91, in forward
    x = self.head_0(x, seg)
  File "/home/chrispie/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chrispie/projects/SPADE/models/networks/architecture.py", line 60, in forward
    dx = self.conv_0(self.actvn(self.norm_0(x, seg)))
  File "/home/chrispie/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 485, in __call__
    hook(self, input)
  File "/home/chrispie/.local/lib/python3.6/site-packages/torch/nn/utils/spectral_norm.py", line 100, in __call__
    setattr(module, self.name, self.compute_weight(module, do_power_iteration=module.training))
  File "/home/chrispie/.local/lib/python3.6/site-packages/torch/nn/utils/spectral_norm.py", line 86, in compute_weight
    sigma = torch.dot(u, torch.mv(weight_mat, v))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:116

AssertionError: The label_path and image_path don't match.

python3 train.py --name coco_pretrained --dataset_mode coco --dataroot /home/python/Model_1/SPADE/datasets/coco_stuff/ --use_vae --gpu_ids -1 --no_pairing_check
----------------- Options ---------------
D_steps_per_G: 1
aspect_ratio: 1.0
batchSize: 1
beta1: 0.5
beta2: 0.999
cache_filelist_read: True
cache_filelist_write: True
checkpoints_dir: ./checkpoints
coco_no_portraits: False
contain_dontcare_label: True
continue_train: False
crop_size: 256
dataroot: /home/python/Model_1/SPADE/datasets/coco_stuff/ [default: ./datasets/cityscapes/]
dataset_mode: coco
debug: False
display_freq: 100
display_winsize: 256
gan_mode: hinge
gpu_ids: -1 [default: 0]
init_type: xavier
init_variance: 0.02
isTrain: True [default: None]
label_nc: 182
lambda_feat: 10.0
lambda_kld: 0.05
lambda_vgg: 10.0
load_from_opt_file: False
load_size: 286
lr: 0.0002
max_dataset_size: 9223372036854775807
model: pix2pix
nThreads: 0
n_layers_D: 3
name: coco_pretrained [default: label2coco]
ndf: 64
nef: 16
netD: multiscale
netD_subarch: n_layer
netG: spade
ngf: 64
niter: 50
niter_decay: 0
no_TTUR: False
no_flip: False
no_ganFeat_loss: False
no_html: False
no_instance: False
no_pairing_check: True [default: False]
no_vgg_loss: False
norm_D: spectralinstance
norm_E: spectralinstance
norm_G: spectralspadesyncbatch3x3
num_D: 2
num_upsampling_layers: normal
optimizer: adam
output_nc: 3
phase: train
preprocess_mode: resize_and_crop
print_freq: 100
save_epoch_freq: 10
save_latest_freq: 5000
serial_batches: False
tf_log: False
use_vae: True [default: False]
which_epoch: latest
z_dim: 256
----------------- End -------------------
train.py --name coco_pretrained --dataset_mode coco --dataroot /home/python/Model_1/SPADE/datasets/coco_stuff/ --use_vae --gpu_ids -1 --no_pairing_check
dataset [CocoDataset] of size 118287 was created
Network [SPADEGenerator] was created. Total number of parameters: 112.6 million. To see the architecture, do print(network).
Network [MultiscaleDiscriminator] was created. Total number of parameters: 1.7 million. To see the architecture, do print(network).
Network [ConvEncoder] was created. Total number of parameters: 10.5 million. To see the architecture, do print(network).
create web directory ./checkpoints/coco_pretrained/web...
Traceback (most recent call last):
  File "train.py", line 34, in
    for i, data_i in enumerate(dataloader, start=iter_counter.epoch_iter):
  File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/python/Model_1/SPADE/data/pix2pix_dataset.py", line 70, in __getitem__
    (label_path, image_path)
AssertionError: The label_path /home/python/Model_1/SPADE/datasets/coco_stuff/train_label/000000008708.png and image_path /home/python/Model_1/SPADE/datasets/coco_stuff/train_img/000000012818.jpg don't match.

how to test the pretrained model?

How do I test or use the pretrained model that I downloaded? What exactly do I need to do? I'm kind of lost with all the parameters.

[feature request] evaluate mIoU, accu, FID every N iteration

First, thanks for your amazing work!
In your paper, mIoU, accu, FID are evaluated to make quantitative comparison to related work.
And it seems that this evaluation code is missing from this version of the code?
I think it would be better to evaluate and log (or print) these values while training the model.
For those who want to develop a new method based on your code (like me), the mIoU, accuracy, and FID curves would intuitively show whether our work is promising or not.
Thanks!

--no_instance doesn't work

When I try to run test.py with the --no_instance flag, I get the following error (I tried this flag on the COCO and Cityscapes checkpoints, and the error happens on both):

Traceback (most recent call last): File "test.py", line 19, in <module> model = Pix2PixModel(opt) File "D:\...path...\SPADE\models\pix2pix_model.py", line 26, in __init__ self.netG, self.netD, self.netE = self.initialize_networks(opt) File "D:\...path...\SPADE\models\pix2pix_model.py", line 98, in initialize_networks netG = util.load_network(netG, 'G', opt.which_epoch, opt) File "D:\...path...\SPADE\util\util.py", line 194, in load_network net.load_state_dict(weights) File "D:\...path...\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 769, in load_state_dict self.__class__.__name__, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for SPADEGenerator: size mismatch for fc.weight: copying a param with shape torch.Size([1024, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 183, 3, 3]). size mismatch for head_0.norm_0.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for head_0.norm_1.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for G_middle_0.norm_0.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for G_middle_0.norm_1.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for G_middle_1.norm_0.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for G_middle_1.norm_1.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_0.norm_0.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_0.norm_1.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_0.norm_s.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_1.norm_0.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_1.norm_1.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_1.norm_s.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_2.norm_0.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_2.norm_1.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). 
size mismatch for up_2.norm_s.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_3.norm_0.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_3.norm_1.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]). size mismatch for up_3.norm_s.mlp_shared.0.weight: copying a param with shape torch.Size([128, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 183, 3, 3]).

Image-to-image translation tasks

Hi

Thanks for sharing this awesome code! I see that you have primarily dealt with image-to-image translation tasks for inverse semantic segmentation.
I am curious why you have not tried other image-to-image translation tasks. Is it possible to use the SPADE network for day-to-night or summer-to-winter translation?

Best regards

The key name of pretrained model is different.

When I run the test with the pre-trained model for the COCO dataset, the error is:

RuntimeError: Error(s) in loading state_dict for SPADEGenerator:
Missing key(s) in state_dict: "head_0.conv_0.weight", "head_0.conv_1.weight", "G_middle_0.conv_0.weight", "G_middle_0.conv_1.weight", "G_middle_1.conv_0.weight", "G_middle_1.conv_1.weight", "up_0.conv_0.weight", "up_0.conv_1.weight", "up_0.conv_s.weight", "up_1.conv_0.weight", "up_1.conv_1.weight", "up_1.conv_s.weight", "up_2.conv_0.weight", "up_2.conv_1.weight", "up_2.conv_s.weight", "up_3.conv_0.weight", "up_3.conv_1.weight", "up_3.conv_s.weight".
Unexpected key(s) in state_dict: "head_0.conv_0.weight_v", "head_0.conv_1.weight_v", "G_middle_0.conv_0.weight_v", "G_middle_0.conv_1.weight_v", "G_middle_1.conv_0.weight_v", "G_middle_1.conv_1.weight_v", "up_0.conv_0.weight_v", "up_0.conv_1.weight_v", "up_0.conv_s.weight_v", "up_1.conv_0.weight_v", "up_1.conv_1.weight_v", "up_1.conv_s.weight_v", "up_2.conv_0.weight_v", "up_2.conv_1.weight_v", "up_2.conv_s.weight_v", "up_3.conv_0.weight_v", "up_3.conv_1.weight_v", "up_3.conv_s.weight_v".

How to solve this?

Video tutorial for installation

Hi there.

I have to ask, is there a video tutorial for how to install SPADE? To be specific, one that follows the steps as they are written on the installation page?
I am having trouble with "Dataset Preparation" and everything after it. I don't know where to place those files. A visual guide would really help anyone who decides to try this out.

Thanks in advance.

the output image of test.py and the intermediate image saved during training are different?

I trained with some landscape images and label maps (use_vae: False). During training, I find the intermediate images in checkpoint/web are great (third column), but when I test on the same label map using test.py, the outputs are different (last column). Is that because, when saving intermediate results during training, the VAE is always enabled regardless of whether the use_vae option is set? Could you explain it?
Thanks!

FileNotFoundError: SPADE\datasets\train_label\000000391895.png not found

When I try to run coco_generate_instance_map.py, this error appears. I have searched through all the downloaded files and cannot find that png. What should I do?

U:\Downloads\SPADE\datasets>python373 coco_generate_instance_map.py
annotation file at ./annotations/instances_train2017.json
input label maps at ./train_label/
output dir at ./train_inst/
loading annotations into memory...
Done (t=19.78s)
creating index...
index created!
0 / 118287
D:\python373\lib\site-packages\skimage\io\_io.py:48: UserWarning: as_grey has been deprecated in favor of as_gray
  warn('as_grey has been deprecated in favor of as_gray')
Traceback (most recent call last):
  File "coco_generate_instance_map.py", line 41, in
    img = io.imread(label_name, as_grey=True)
  File "D:\python373\lib\site-packages\skimage\io\_io.py", line 61, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "D:\python373\lib\site-packages\skimage\io\manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "D:\python373\lib\site-packages\imageio\core\functions.py", line 221, in imread
    reader = read(uri, format, "i", **kwargs)
  File "D:\python373\lib\site-packages\imageio\core\functions.py", line 130, in get_reader
    request = Request(uri, "r" + mode, **kwargs)
  File "D:\python373\lib\site-packages\imageio\core\request.py", line 126, in __init__
    self._parse_uri(uri)
  File "D:\python373\lib\site-packages\imageio\core\request.py", line 278, in _parse_uri
    raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: 'U:\Downloads\SPADE\datasets\train_label\000000391895.png'

U:\Downloads\SPADE\datasets>

There Should Be No Space Between -- image_dir in image documentation.

To train on your own custom dataset

python train.py --name [experiment_name] --dataset_mode custom --label_dir [path_to_labels] -- image_dir [path_to_images] --label_nc [num_labels]

Should Be

To train on your own custom dataset

python train.py --name [experiment_name] --dataset_mode custom --label_dir [path_to_labels] --image_dir [path_to_images] --label_nc [num_labels]

Otherwise, it will give a path-not-found error.

why coco-stuff dataset?

Hello,
The COCO-Stuff dataset's segmentation maps are labeled by machine, not by human handwork, and many of those annotations are pretty bad, especially at the boundaries of the seg maps.
It seems that the boundaries of the seg maps are quite important to SPADE.
So why use the COCO-Stuff dataset rather than MS COCO?

Questions regarding mIoU, accuracy, FID

Hi,

Thank you for sharing this awesome code!
Based on this issue, I understand that you are not going to release the evaluation code, and I'm working on reimplementing it myself.
I have the following questions:

  1. When computing the FID scores, do you compare the generated images to the original images or to the cropped images (the same size as the generated ones)?

  2. What image sizes do you use for evaluation? Do you generate higher-resolution images for evaluation or just use the default size (512x256 for Cityscapes, and 256x256 for the others)?

  3. What pre-trained segmentation models and code bases do you use for each dataset? Based on the paper, I assume these are the ones you use. Could you please confirm them?

  4. When you evaluate mIoUs and accuracies, do you upsample the images or downsample the labels? If so, how do you interpolate them?

Thanks in advance.

Best,
Godo

AttributeError when with flag --use_vae

I cloned SPADE with the Cityscapes dataset on my Google Cloud instance. Inference is working, and training is also possible. But when enabling the --use_vae flag I get the following error.

command:
python train.py --name cityscapes_selftrained --dataset_mode cityscapes --dataroot /home/user/SPADE/SPADE/datasets/cityscapes --use_vae

output:

Network [SPADEGenerator] was created. Total number of parameters: 101.1 million. To see the architecture, do print(network).
Network [MultiscaleDiscriminator] was created. Total number of parameters: 1.4 million. To see the architecture, do print(network).
Network [ConvEncoder] was created. Total number of parameters: 10.5 million. To see the architecture, do print(network).
create web directory ./checkpoints/cityscapes_selftrained/web...
/opt/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    trainer.run_generator_one_step(data_i)
  File "/home/user/SPADE/SPADE/trainers/pix2pix_trainer.py", line 35, in run_generator_one_step
    g_losses, generated = self.pix2pix_model(data, mode='generator')
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/SPADE/SPADE/models/pix2pix_model.py", line 46, in forward
    input_semantics, real_image)
  File "/home/user/SPADE/SPADE/models/pix2pix_model.py", line 137, in compute_generator_loss
    input_semantics, real_image, compute_kld_loss=self.opt.use_vae)
  File "/home/user/SPADE/SPADE/models/pix2pix_model.py", line 192, in generate_fake
    z, mu, logvar = self.encode_z(real_image)
  File "/home/user/SPADE/SPADE/models/pix2pix_model.py", line 184, in encode_z
    mu, logvar = self.netE(real_image)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/SPADE/SPADE/models/networks/encoder.py", line 46, in forward
    if self.opt.crop_size >= 256:
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 535, in __getattr__
    type(self).__name__, name))
AttributeError: 'ConvEncoder' object has no attribute 'opt'

Does anyone know how to fix this?

why do 'label_image' * 255.0 ?

Hello
Thank you for sharing the code.
If you look at this code, you multiply the label image by 255.0. What is the reason for this?

For example, the maximum value of the label image in ADEChallengeData2016 is 150. If you multiply it by 255, it becomes 38250...
Is that not a problem?

ade20k test

When I try to run test.py with ADE20K I get this error:

Traceback (most recent call last):
  File "test.py", line 32, in
    for i, data in enumerate(dataloader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 615, in
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/content/SPADE/data/pix2pix_dataset.py", line 82, in __getitem__
    (label_path, image_path)
AssertionError: The label_path /content/SPADE/ADE20K_2016_07_26/images/validation/a/abbey/ADE_val_00000001_parts_1.png and image_path /content/SPADE/ADE20K_2016_07_26/images/validation/a/abbey/ADE_val_00000001.jpg don't match.

I got the zip
wget http://groups.csail.mit.edu/vision/datasets/ADE20K/ADE20K_2016_07_26.zip
unzip
unzip ADE20K_2016_07_26.zip
set path of dataset
python test.py --name ade20k_pretrained --dataset_mode ade20k --dataroot /content/SPADE/ADE20K_2016_07_26/

Expected Training Time ?

Could you give us an idea of how long it would take to train the model (50 epochs) on the Flickr Landscapes (40k images, 256x256) dataset on the DGX1?

meaning of contain_dontcare_label

Hi, thanks for the code.
I have questions about the contain_dontcare_label arg.

I saw that this is set to True for COCO and ADE20K, yet set to False for facades and custom, and not set for Cityscapes.

Is this related to the label's index (0-based or 1-based)?
Or is it related to whether the dataset contains a background label?

Thanks in advance.

Output of SPADE

Hi,
Would you mind explaining why in the implementation of SPADE you have:
out = normalized * (1 + gamma) + beta

This is different from what is described in the paper, where I understood it as:
out = normalized *gamma + beta

Thank you.
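For context, here is a condensed, illustrative sketch of a SPADE layer's forward pass showing where that line appears (the real module, with spectral norm and configurable normalization, lives in models/networks/normalization.py; this sketch is not the repository's code):

import torch.nn as nn
import torch.nn.functional as F

class SPADESketch(nn.Module):
    def __init__(self, norm_nc, label_nc, nhidden=128):
        super().__init__()
        # parameter-free normalization, modulated spatially by the segmentation map
        self.param_free_norm = nn.BatchNorm2d(norm_nc, affine=False)
        self.mlp_shared = nn.Sequential(nn.Conv2d(label_nc, nhidden, 3, padding=1), nn.ReLU())
        self.mlp_gamma = nn.Conv2d(nhidden, norm_nc, 3, padding=1)
        self.mlp_beta = nn.Conv2d(nhidden, norm_nc, 3, padding=1)

    def forward(self, x, segmap):
        normalized = self.param_free_norm(x)
        segmap = F.interpolate(segmap, size=x.size()[2:], mode='nearest')
        actv = self.mlp_shared(segmap)
        gamma = self.mlp_gamma(actv)
        beta = self.mlp_beta(actv)
        # (1 + gamma) keeps the modulation close to identity when gamma starts near zero
        return normalized * (1 + gamma) + beta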
