Giter Site home page Giter Site logo

attngan's Introduction

AttnGAN

Note

The key to solving the task in understanding GAN architectures described in these whitepapers and basic knowledge of natural language processing.

The first whitepaper describes Attention's approach to text-based image generation based on GAN. AttnGAN can synthesize fine-grained details on different sub-regions of the image, paying attention to the corresponding words in the description of the text in the natural language. However, this approach requires text pre-processing due to high noise levels. The solution described in this whitepaper is based on TF/IDF. The key module in architecture is the Deep Attentive Multimodal Similarity Model (DAMSM) capable of calculating similarity between the generated image and sentence. The model also represents an additional loss function for GAN learning.

The second whitepaper uses a Zero-Shot learning approach to a GAN-based classification task from previously unseen text categories based on Wikipedia raw articles. TF/IDF is also used for text preprocessing. The key feature of this solution is the text encoding module. It is based on an additional FC layer, which reduces dimensionality and suppresses noise. The presence of this layer provides an increase in accuracy of 2-3% compared to its absence.

As a result of whitepaper's analysis, FC layer was added to text encoding module, project ported to Python 3 (Python 2 is no longer supported since January 2020), dependencies were upgraded to stable versions, warnings and visualization errors were fixed and Tensorboard 2.1.0 support was added to DAMSM. As a result, the convergence of the validation loss function in the DAMSM model was 3-5% more successful in terms of sentence loss and word loss on CUB dataset on 200 epochs, which in my opinion is a critical indicator for this architecture.

sentence loss on validation

word loss on validation

Unfortunately, due to architecture specifics (changing DAMSM model affects the possibility of retraining) I didn't have enough resources to run DAMSM and GAN completely and run more tests, so I had to limit myself to 200 epochs. This is a evaluation result on generated images with Attention maps:

example evaluation

example evaluation 2

And some sample images:

sample 1

sample 2

Implementation

Pytorch implementation for reproducing AttnGAN results in the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. (This work was performed when Tao was an intern with Microsoft Research).

Dependencies

Python 3

Pytorch >=1.5

In addition, please add the project folder to PYTHONPATH and pip install the following packages:

  • python-dateutil
  • easydict
  • pandas
  • torchfile
  • nltk
  • scikit-image
  • tensorboard

Data

  1. Download our preprocessed metadata for birds coco and save them to data/
  2. Download the birds image data. Extract them to data/birds/
  3. Download coco dataset and extract the images to data/coco/

Training

  • Pre-train DAMSM models:

    • For bird dataset: python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
    • For coco dataset: python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 1
  • Train AttnGAN models:

    • For bird dataset: python main.py --cfg cfg/bird_attn2.yml --gpu 2
    • For coco dataset: python main.py --cfg cfg/coco_attn2.yml --gpu 3
  • *.yml files are example configuration files for training/evaluation our models.

Pretrained Model

Sampling

  • Run python main.py --cfg cfg/eval_bird.yml --gpu 1 to generate examples from captions in files listed in "./data/birds/example_filenames.txt". Results are saved to DAMSMencoders/.
  • Change the eval_*.yml files to generate images from other pre-trained models.
  • Input your own sentence in "./data/birds/example_captions.txt" if you wannt to generate images from customized sentences.

Validation

  • To generate images for all captions in the validation dataset, change B_VALIDATION to True in the eval_*.yml. and then run python main.py --cfg cfg/eval_bird.yml --gpu 1
  • We compute inception score for models trained on birds using StackGAN-inception-model.
  • We compute inception score for models trained on coco using improved-gan/inception_score.

Examples generated by AttnGAN [Blog]

bird example coco example

Creating an API

Evaluation code embedded into a callable containerized API is included in the eval\ folder.

Citing AttnGAN

If you find AttnGAN useful in your research, please consider citing:

@article{Tao18attngan,
  author    = {Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He},
  title     = {AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks},
  Year = {2018},
  booktitle = {{CVPR}}
}

Reference

attngan's People

Contributors

sethjuarez avatar taoxugit avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.