tohinz / multiple-objects-gan

Implementation for "Generating Multiple Objects at Spatially Distinct Locations" (ICLR 2019)

License: MIT License

Python 99.15% Shell 0.85%
image-generation gan attngan ms-coco multi-mnist clevr stackgan

multiple-objects-gan's Introduction

Generating Multiple Objects at Spatially Distinct Locations

PyTorch implementation for reproducing the results from the paper Generating Multiple Objects at Spatially Distinct Locations by Tobias Hinz, Stefan Heinrich, and Stefan Wermter, accepted for publication at the International Conference on Learning Representations (ICLR) 2019.

For more information and visualizations also see our blog post

Our poster can be found here

Have a look at our follow-up work Semantic Object Accuracy for Generative Text-to-Image Synthesis with available code.

Model-Architecture

Dependencies

  • Python 2.7
  • PyTorch 0.4.1

Please add the project folder to PYTHONPATH and install the required dependencies:

pip install -r requirements.txt

Data

  • Multi-MNIST: adapted from here
    • contains the three data sets used in the paper: normal (three digits per image), split_digits (0-4 in top half of image, 5-9 in bottom half), and bottom_half_empty (no digits in bottom half of the image)
    • download our data, save it to data/ and extract
  • CLEVR: adapted from here
    • Main: download our data, save it to data/ and extract
    • CoGenT: download our data, save it to data/ and extract
  • MS-COCO:
    • download our preprocessed data (bounding boxes and bounding box labels), save it to data/ and extract
    • obtain the train and validation images from the 2014 split here, extract and save them in data/MS-COCO/train/ and data/MS-COCO/test/
    • for the StackGAN architecture: obtain the preprocessed char-CNN-RNN text embeddings from here and put the files in data/MS-COCO/train/ and data/MS-COCO/test/
    • for the AttnGAN architecture: obtain the preprocessed metadata and the pre-trained DAMSM model from here
      • extract the preprocessed metadata, then add the files downloaded in the first step (bounding boxes and bounding box labels) to the data/coco/coco/train/ and data/coco/coco/test/ folder
      • put the downloaded DAMSM model into code/coco/attngan/DAMSMencoders/ and extract
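Before training on MS-COCO it can help to confirm that the folders named above are in place. The following is a minimal sketch, not part of the repository: the helper name missing_dirs and the EXPECTED_DIRS table are hypothetical, and the paths are simply copied from the instructions above. It only checks that the directories exist, not that their contents are complete.

```python
import os

# Directory names copied from the data instructions above (assumption:
# the script is run with the project folder as `project_root`).
EXPECTED_DIRS = {
    "coco-stackgan": ["data/MS-COCO/train", "data/MS-COCO/test"],
    "coco-attngan": [
        "data/coco/coco/train",
        "data/coco/coco/test",
        "code/coco/attngan/DAMSMencoders",
    ],
}


def missing_dirs(project_root, setup):
    """Return the expected directories for `setup` that do not exist yet."""
    return [
        rel_path
        for rel_path in EXPECTED_DIRS[setup]
        if not os.path.isdir(os.path.join(project_root, rel_path))
    ]


if __name__ == "__main__":
    # Report every missing folder, relative to the current directory.
    for setup in sorted(EXPECTED_DIRS):
        for rel_path in missing_dirs(".", setup):
            print("%s: missing %s" % (setup, rel_path))
```

Running the script from the project folder prints one line per missing directory and nothing if everything is in place.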

Training

  • to start training, run sh train.sh data gpu-ids, where data is the desired data set and architecture (mnist/clevr/coco-stackgan-1/coco-stackgan-2/coco-attngan) and gpu-ids is a comma-separated list of the GPUs to train on
  • e.g. to train on the Multi-MNIST data set on one GPU: sh train.sh mnist 0
  • e.g. to train the AttnGAN architecture on the MS-COCO data set on three GPUs: sh train.sh coco-attngan 0,1,2
  • training parameters can be adapted via code/dataset/cfg/dataset_train.yml
  • make sure the DATA_DIR in the respective code/dataset/cfg/dataset_train.yml points to the correct path
  • results are stored in output/
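When launching several runs from a script, the train.sh call described above can be assembled programmatically. This is a small sketch under the assumption that train.sh takes exactly the two arguments shown in the examples; build_train_command and VALID_TARGETS are hypothetical names introduced here for illustration.

```python
# Data set / architecture names accepted by train.sh, as listed above.
VALID_TARGETS = (
    "mnist",
    "clevr",
    "coco-stackgan-1",
    "coco-stackgan-2",
    "coco-attngan",
)


def build_train_command(target, gpu_ids):
    """Return the argument list for `sh train.sh <target> <gpu-ids>`."""
    if target not in VALID_TARGETS:
        raise ValueError(
            "unknown target %r, expected one of: %s"
            % (target, ", ".join(VALID_TARGETS))
        )
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # GPU ids are joined with commas, e.g. [0, 1, 2] -> "0,1,2".
    return ["sh", "train.sh", target, ",".join(str(i) for i in gpu_ids)]


if __name__ == "__main__":
    # Only prints the command; it does not start training.
    print(" ".join(build_train_command("coco-attngan", [0, 1, 2])))
```

The returned list can be passed to subprocess.call from the project folder to actually start a run.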

Evaluating

  • update the eval cfg file in code/dataset/cfg/dataset_eval.yml and adapt the path of NET_G to point to the model you want to use (default path is to the pretrained models linked below)
  • run sh sample.sh mnist/clevr/coco-stackgan-2/coco-attngan to generate images using the specified model

Pretrained Models

  • pretrained model for Multi-MNIST: download, save to models and extract
  • pretrained model for CLEVR: download, save to models and extract
  • pretrained model for MS-COCO:
    • StackGAN architecture: download, save to models and extract
    • AttnGAN architecture: download, save to models and extract

Examples Generated by the Pretrained Models

Multi-MNIST

Multi-Mnist Examples

CLEVR

CLEVR Examples

MS-COCO

StackGAN Architecture

COCO-StackGAN Examples

AttnGAN Architecture

COCO-AttnGAN Examples

Acknowledgement

  • Code for the experiments on Multi-MNIST and CLEVR data sets is adapted from StackGAN-Pytorch.
  • Code for the experiments on MS-COCO with the StackGAN architecture is adapted from StackGAN-Pytorch, while the code with the AttnGAN architecture is adapted from AttnGAN.

Citing

If you find our model useful in your research please consider citing:

@inproceedings{hinz2019generating,
title     = {Generating Multiple Objects at Spatially Distinct Locations},
author    = {Tobias Hinz and Stefan Heinrich and Stefan Wermter},
booktitle = {International Conference on Learning Representations},
year      = {2019},
url       = {https://openreview.net/forum?id=H1edIiA9KQ},
}

multiple-objects-gan's People

Contributors

heinrichst, tohinz

multiple-objects-gan's Issues

Datasets

Hi,

Could someone please check the links provided for downloading the datasets? They seem to be broken.

Thank you!

generate images from arbitrary strings?

Hey, I'm not deeply familiar with text-to-image GANs or PyTorch, and I'm interested in trying your code out, but from what I can tell it uses preprocessed text embeddings rather than accepting arbitrary string input. Am I missing something? What would be the best way to generate new images from arbitrary strings?

It looks like https://github.com/reedscot/icml2016, which you linked to, accepts arbitrary input. Can I use that to write the embeddings of those text queries to disk and then pass them to your code base?

Any guidance you have would be greatly appreciated.

How to solve dataset AssertionError?

After deciding to run the code with Python 2, I created a new Python 2 virtual environment in PyCharm and ran:

git clone https://github.com/tohinz/multiple-objects-gan
cd multiple-objects-gan
vim requirements.txt  # delete the line pkg-resources==0.0.0 to prevent errors
pip install -r requirements.txt
cd models/
wget -c https://www2.informatik.uni-hamburg.de/wtm/software/multiple-objects-gan/model-ms-coco-attngan.zip
unzip model-ms-coco-attngan.zip
cd ../code/coco/attngan/
# edit coco_eval.yml and set:
DATA_DIR: '/home/sam/code/python/pytorch/image_caption/dataset/coco2014'
IMG_DIR: "/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014"
mkdir -p DAMSMencoders/coco/
wget -c https://www.dropbox.com/s/zj3z0lvkfd8vaga/image_encoder100.pth?dl=0 -O DAMSMencoders/coco/image_encoder100.pth
wget -c https://www.dropbox.com/s/jo325z064a7x07k/text_encoder100.pth?dl=0 -O DAMSMencoders/coco/text_encoder100.pth
python2 main.py --cfg cfg/coco_eval.yml

After running the commands above, I got this error:

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': 'DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '../../../models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Save to:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
Traceback (most recent call last):
  File "main.py", line 134, in <module>
    assert dataset
AssertionError
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$

If I comment out line 134 of main.py in the code/coco/attngan directory and run the same command again, I get:

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
[same configuration as printed in the first run above]
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Load from:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    algo.sample(split_dir, num_samples=25, draw_bbox=True)
  File "/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/trainer.py", line 489, in sample
    text_encoder.load_state_dict(state_dict)
  File "/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
        size mismatch for encoder.weight: copying a param of torch.Size([1, 300]) from checkpoint, where the shape is torch.Size([27297, 300]) in current model.
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ 

How could I solve these problems?
Thank you~

How to solve 'AttributeError: 'EasyDict' object has no attribute 'iteritems''?

I downloaded this package just to test the pretrained model.
I run this package as follows:

git clone https://github.com/tohinz/multiple-objects-gan
cd multiple-objects-gan
vim requirements.txt  # delete pkg-resources==0.0.0 and change tensorboard==1.0.0a4 to tensorboard==1.6.0rc0 to prevent errors
pip install -r requirements.txt
cd models/
wget -c https://www2.informatik.uni-hamburg.de/wtm/software/multiple-objects-gan/model-ms-coco-attngan.zip
unzip model-ms-coco-attngan.zip
cd ../code/coco/attngan/
python3 main.py --cfg cfg/coco_eval.yml

I got the error:

(t3) sam@sam-ub1804:~/PycharmProjects/t3/multiple-objects-gan/code/coco/attngan$ python3 main.py --cfg cfg/coco_eval.yml
Traceback (most recent call last):
  File "main.py", line 92, in <module>
    cfg_from_file(args.cfg_file)
  File "/home/sam/PycharmProjects/t3/multiple-objects-gan/code/coco/attngan/miscc/config.py", line 106, in cfg_from_file
    _merge_a_into_b(yaml_cfg, __C)
  File "/home/sam/PycharmProjects/t3/multiple-objects-gan/code/coco/attngan/miscc/config.py", line 74, in _merge_a_into_b
    for k, v in a.iteritems():
AttributeError: 'EasyDict' object has no attribute 'iteritems'
(t3) sam@sam-ub1804:~/PycharmProjects/t3/multiple-objects-gan/code/coco/attngan$

How to solve it? Am I missing something?
Thank you~
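
For context on this traceback: the Dependencies section lists Python 2.7, while the command above uses python3. dict.iteritems() exists only in Python 2, whereas dict.items() is available in both versions. The sketch below is a simplified stand-in for a recursive config merge, not the repository's _merge_a_into_b; the name merge_config and the sample keys are used for illustration only. Other Python 2 idioms in the code base may need the same treatment, so running under Python 2.7 as documented is likely the simpler route.

```python
def merge_config(overrides, defaults):
    """Recursively copy values from `overrides` into `defaults` in place."""
    # items() works in Python 2 and 3; iteritems() is Python 2 only.
    for key, value in overrides.items():
        if key not in defaults:
            raise KeyError("%s is not a valid config key" % key)
        if isinstance(value, dict) and isinstance(defaults[key], dict):
            # Nested section: merge its entries one level down.
            merge_config(value, defaults[key])
        else:
            defaults[key] = value
    return defaults


if __name__ == "__main__":
    defaults = {"GPU_ID": "0", "TRAIN": {"BATCH_SIZE": 50, "FLAG": True}}
    merge_config({"TRAIN": {"FLAG": False}}, defaults)
    print(defaults["TRAIN"]["FLAG"])
```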

AttnGAN evaluation error

Hi,

I get an error while trying to generate images using the pretrained AttnGAN model:
sh sample.sh coco-attngan
...

  File "main.py", line 86, in gen_example
    algo.gen_example(data_dic)
  File ".../multiple-objects-gan-master/code/coco/attngan/trainer.py", line 604, in gen_example
    netG.load_state_dict(state_dict)
RuntimeError: Error(s) in loading state_dict for G_NET:
Unexpected key(s) in state_dict: "netG".
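
One possible reading of this message, which the thread does not confirm, is that the loaded checkpoint is a dictionary holding the generator weights under a top-level "netG" entry, one level below what load_state_dict expects. Under that assumption, the weights would need to be unwrapped before loading. The sketch below uses plain dictionaries so it runs without PyTorch; unwrap_state_dict is a hypothetical helper, not a function from the repository.

```python
def unwrap_state_dict(checkpoint, key="netG"):
    """Return the nested state dict if `checkpoint` wraps it under `key`."""
    if (
        isinstance(checkpoint, dict)
        and key in checkpoint
        and isinstance(checkpoint[key], dict)
    ):
        return checkpoint[key]
    # Already a flat mapping of parameter names to values: leave unchanged.
    return checkpoint


if __name__ == "__main__":
    # Stand-in values; a real checkpoint maps parameter names to tensors.
    wrapped = {"netG": {"fc.weight": [0.1, 0.2]}}
    print(sorted(unwrap_state_dict(wrapped)))
```

With PyTorch, the result of torch.load on the checkpoint file would be passed through unwrap_state_dict before calling netG.load_state_dict on it.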

About the download link

It seems the link to the preprocessed data (bounding boxes and bounding box labels) no longer works. Could you upload it again? Thanks.
