AttnGAN

Note

The key to solving the task is understanding the GAN architectures described in these whitepapers, together with basic knowledge of natural language processing.

The first whitepaper describes an attention-based approach to text-to-image generation with GANs. AttnGAN can synthesize fine-grained details in different sub-regions of the image by paying attention to the relevant words in the natural language text description. However, this approach requires text pre-processing due to high noise levels. The solution described in this whitepaper is based on TF/IDF. The key module in the architecture is the Deep Attentive Multimodal Similarity Model (DAMSM), which computes the similarity between the generated image and the sentence. The model also provides an additional loss term for GAN training.
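As a rough illustration of how a similarity model can double as a loss, the sketch below computes a sentence-level matching loss in the spirit of DAMSM: cosine similarity between each image vector and each sentence vector in a batch, followed by a softmax that rewards the matching pair. It is pure Python rather than the project's PyTorch code. The function names, the toy vectors and the value of `gamma` are assumptions made for illustration, only the image-to-sentence direction is shown, and the word-level attention part of DAMSM is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def sentence_matching_loss(image_vecs, sentence_vecs, gamma=10.0):
    """Mean negative log-probability that image i is matched to sentence i.

    image_vecs[i] and sentence_vecs[i] are assumed to describe the same
    sample; gamma is a hypothetical smoothing factor.
    """
    n = len(image_vecs)
    total = 0.0
    for i in range(n):
        scores = [gamma * cosine(image_vecs[i], s) for s in sentence_vecs]
        # Numerically stable log-softmax of the matching pair's score.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total -= scores[i] - log_norm
    return total / n


# Toy batch: two images and two sentences in a shared 3-d embedding space.
images = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
sentences = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.0]]

aligned = sentence_matching_loss(images, sentences)
shuffled = sentence_matching_loss(images, sentences[::-1])
print(aligned < shuffled)  # matching pairs should give the lower loss
```

A loss of this shape is low when each image is most similar to its own description and high otherwise, which is what lets it serve as an extra training signal for the generator.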

The second whitepaper applies a zero-shot learning approach to a GAN-based classification task over previously unseen categories described by raw Wikipedia articles. TF/IDF is also used for text preprocessing. The key feature of this solution is the text encoding module, which is based on an additional FC layer that reduces dimensionality and suppresses noise. Adding this layer increases accuracy by 2-3% compared to a model without it.
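To make the TF/IDF step and the dimensionality-reducing FC layer concrete, here is a small pure-Python sketch. It uses one common TF/IDF variant (length-normalised term frequency, smoothed inverse document frequency) and a fixed toy weight matrix in place of a trained layer. Neither is claimed to match the exact formulation or the weights used in the whitepapers, and all names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF/IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = []
        for term in vocab:
            tf = counts[term] / len(doc)
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors


def fc_layer(vec, weights, bias):
    """Fully connected (affine) layer: W @ vec + b."""
    return [
        sum(w * x for w, x in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


docs = [
    "a small bird with a red head",
    "a large bird with white wings",
]
vocab, vectors = tfidf_vectors(docs)

# Toy 2 x len(vocab) weight matrix standing in for a trained FC layer.
weights = [[0.1] * len(vocab), [-0.1] * len(vocab)]
bias = [0.0, 0.0]
reduced = [fc_layer(v, weights, bias) for v in vectors]
print(len(vocab), "->", len(reduced[0]))
```

Terms that occur in every document (such as "bird" here) receive a lower weight than terms specific to one document, and the FC layer then maps the sparse vocabulary-sized vector to a much smaller dense one.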

Based on the analysis of these whitepapers, an FC layer was added to the text encoding module, the project was ported to Python 3 (Python 2 has not been supported since January 2020), dependencies were upgraded to stable versions, warnings and visualization errors were fixed, and TensorBoard 2.1.0 support was added to DAMSM. As a result, the validation loss of the DAMSM model converged 3-5% better in terms of sentence loss and word loss on the CUB dataset over 200 epochs, which in my opinion is a critical indicator for this architecture.

Unfortunately, due to architecture specifics (changing the DAMSM model affects the possibility of retraining), I didn't have enough resources to train DAMSM and the GAN to completion and run more tests, so I had to limit myself to 200 epochs. This is an evaluation result on generated images with attention maps:

And some sample images:

Implementation

PyTorch implementation for reproducing the AttnGAN results in the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. (This work was performed when Tao was an intern with Microsoft Research.)

Dependencies

Python 3

PyTorch >= 1.5

In addition, please add the project folder to PYTHONPATH and pip install the following packages:

python-dateutil
easydict
pandas
torchfile
nltk
scikit-image
tensorboard
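A quick way to confirm the environment before training is to check that each listed package is importable. The helper below is a hypothetical sketch, not part of the repository; note that some import names differ from the pip names (`python-dateutil` is imported as `dateutil`, `scikit-image` as `skimage`).

```python
import importlib.util

# Map pip package names to the module names used in import statements.
REQUIRED_MODULES = {
    "torch": "torch",
    "python-dateutil": "dateutil",
    "easydict": "easydict",
    "pandas": "pandas",
    "torchfile": "torchfile",
    "nltk": "nltk",
    "scikit-image": "skimage",
    "tensorboard": "tensorboard",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only tests whether a module can be located; it does not verify versions such as the PyTorch >= 1.5 requirement.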

Data

Download our preprocessed metadata for birds and coco and save them to data/
Download the birds image data. Extract them to data/birds/
Download coco dataset and extract the images to data/coco/

Training

Pre-train DAMSM models:
- For bird dataset: python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
- For coco dataset: python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 1
Train AttnGAN models:
- For bird dataset: python main.py --cfg cfg/bird_attn2.yml --gpu 2
- For coco dataset: python main.py --cfg cfg/coco_attn2.yml --gpu 3
*.yml files are example configuration files for training/evaluating our models.

Pretrained Model

DAMSM for bird. Download and save it to DAMSMencoders/
DAMSM for coco. Download and save it to DAMSMencoders/
AttnGAN for bird. Download and save it to models/
AttnGAN for coco. Download and save it to models/
AttnDCGAN for bird. Download and save it to models/
- This is a variant of AttnGAN which applies the proposed attention mechanisms to the DCGAN framework.

Sampling

Run python main.py --cfg cfg/eval_bird.yml --gpu 1 to generate examples from captions in files listed in "./data/birds/example_filenames.txt". Results are saved to DAMSMencoders/.
Change the eval_*.yml files to generate images from other pre-trained models.
Input your own sentence in "./data/birds/example_captions.txt" if you want to generate images from customized sentences.

Validation

To generate images for all captions in the validation dataset, change B_VALIDATION to True in the eval_*.yml file, and then run python main.py --cfg cfg/eval_bird.yml --gpu 1
We compute inception score for models trained on birds using StackGAN-inception-model.
We compute inception score for models trained on coco using improved-gan/inception_score.
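For readers unfamiliar with the metric, the inception score is the exponential of the average KL divergence between each image's predicted class distribution p(y|x) and the marginal distribution p(y) over all generated images. The sketch below applies that formula to class probabilities that are assumed to be already available; it is not the code of the tools named above, which obtain the probabilities from an Inception network, and the function name is hypothetical.

```python
import math


def inception_score(probabilities):
    """exp of the mean KL(p(y|x) || p(y)) over all images.

    probabilities[i][k] is the classifier's probability of class k for
    generated image i; each row is assumed to sum to 1.
    """
    n_images = len(probabilities)
    n_classes = len(probabilities[0])
    # Marginal class distribution p(y), averaged over the images.
    marginal = [
        sum(row[k] for row in probabilities) / n_images
        for k in range(n_classes)
    ]
    total_kl = 0.0
    for row in probabilities:
        total_kl += sum(
            p * math.log(p / m) for p, m in zip(row, marginal) if p > 0
        )
    return math.exp(total_kl / n_images)


# Confident and diverse predictions score higher than uniform ones.
sharp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
flat = [[1 / 3, 1 / 3, 1 / 3]] * 3
print(inception_score(sharp), inception_score(flat))
```

The score is bounded below by 1 (every image gets the same distribution) and above by the number of classes (each image is classified with full confidence and the classes are evenly covered).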

Examples generated by AttnGAN [Blog]

bird example	coco example

Creating an API

Evaluation code embedded into a callable containerized API is included in the eval/ folder.

Citing AttnGAN

If you find AttnGAN useful in your research, please consider citing:

@inproceedings{Tao18attngan,
  author    = {Tao Xu and Pengchuan Zhang and Qiuyuan Huang and Han Zhang and Zhe Gan and Xiaolei Huang and Xiaodong He},
  title     = {AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks},
  year      = {2018},
  booktitle = {{CVPR}}
}
