Disclaimer: I am not working on this anymore. I will be happy to answer questions and review & merge PRs though.

Image Captioning with Spatial Attention in Keras

This is a Keras & TensorFlow implementation of an image captioning model. In particular, it uses the attention models described in this paper, which are depicted below:

Here, V denotes the K local features from the last convolutional layer of a ConvNet (e.g. ResNet-50), and x_t is the input, composed of the embedding of the previous word and the average image feature. h_t is the hidden state of the LSTM at time t; it is used to compute the attention weights that are applied to V to obtain the context vector c_t. c_t and h_t are then combined to predict the current word y_t. In (b), an additional gate is incorporated into the LSTM to produce the extra output s_t, which is combined with V when computing the attention weights. s_t acts as an alternative feature that the model can attend to instead of the image features in V.
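As a rough illustration of the attention step described above, the NumPy sketch below scores each of the K local features against h_t, normalizes the scores with a softmax, and forms c_t as the weighted sum of the rows of V. The additive scoring function, the weight names (`W_v`, `W_h`, `w_a`) and all dimensions are assumptions chosen for illustration; they are not taken from this repository's code, and the sentinel s_t from (b) is left out.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def spatial_attention(V, h_t, W_v, W_h, w_a):
    """Hypothetical helper: additive attention over K local features.

    V:   (K, d) local image features
    h_t: (m,)   LSTM hidden state at time t
    W_v: (d, a), W_h: (m, a), w_a: (a,)  illustrative projection weights
    """
    # One score per location; h_t is broadcast across the K rows of V.
    scores = np.dot(np.tanh(np.dot(V, W_v) + np.dot(h_t, W_h)), w_a)
    alpha = softmax(scores)      # (K,) attention weights, sum to 1
    c_t = np.dot(alpha, V)       # (d,) context vector
    return alpha, c_t


# Toy dimensions (assumed): a 7x7 feature map gives K = 49 locations.
rng = np.random.RandomState(0)
K, d, m, a = 49, 8, 6, 5
V = rng.randn(K, d)
h_t = rng.randn(m)
alpha, c_t = spatial_attention(
    V, h_t, rng.randn(d, a), rng.randn(m, a), rng.randn(a)
)
```

In the model, c_t would then be combined with h_t to predict y_t; that step depends on the trained output layer and is not shown here.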

Installation

Clone this repository

```
# Make sure to clone with --recursive
git clone --recursive https://github.com/amaiasalvador/sat_keras.git
```

Install Python 2.7.
Install TensorFlow 0.12.
Install the Python dependencies with pip install -r requirements.txt.
(Optional) Install this Keras PR with support for layer-wise learning rate multipliers:

```
git clone https://github.com/amaiasalvador/keras.git
cd keras
git checkout lr_mult
python setup.py install
```

This option is disabled by default, so you can use "regular" Keras 1.2.2 if you don't need a different learning rate for the base model.

Set TensorFlow as the Keras backend in ~/.keras/keras.json:

```
{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```
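To confirm that the file on disk matches the settings above, a small standard-library check along these lines may help. The helper name `check_keras_config` is hypothetical; only the path and the two keys come from the configuration shown above.

```python
import json
import os

# Values this README asks for in ~/.keras/keras.json.
EXPECTED = {"backend": "tensorflow", "image_dim_ordering": "tf"}


def check_keras_config(path=os.path.expanduser("~/.keras/keras.json")):
    """Hypothetical helper: return (key, expected, found) for each mismatch."""
    with open(path) as f:
        config = json.load(f)
    return [
        (key, value, config.get(key))
        for key, value in sorted(EXPECTED.items())
        if config.get(key) != value
    ]
```

Calling `check_keras_config()` with no arguments reads the default location; an empty list means both settings match.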

Data & Pretrained model

Download the MS COCO Caption Challenge 2015 dataset. Note that the test images are not required for this code to work.
After extraction, the dataset folder must have the following structure:

```
$coco/                                    # dataset dir
$coco/annotations/                        # annotations directory
$coco/annotations/captions_train2014.json # caption anns for training set
$coco/annotations/captions_val2014.json   # ...
$coco/images/                             # image dir
$coco/images/train2014                    # train image dir
$coco/images/val2014                      # ...
```
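Before running the preprocessing step, it can be useful to verify that the extracted dataset matches the layout above. The sketch below is one way to do that with the standard library; the function name `missing_coco_paths` is hypothetical and is not part of this repository.

```python
import os

# Paths relative to the dataset root ($coco), as listed in the layout above.
REQUIRED = [
    "annotations/captions_train2014.json",
    "annotations/captions_val2014.json",
    "images/train2014",
    "images/val2014",
]


def missing_coco_paths(coco_root):
    """Hypothetical helper: list the required paths absent under coco_root."""
    return [
        rel for rel in REQUIRED
        if not os.path.exists(os.path.join(coco_root, rel))
    ]
```

An empty list from `missing_coco_paths("/path/to/coco")` means every expected file and directory is present.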

Navigate to imcap/utils and run:

```
python prepro_coco.py --output_json path_to_json --output_h5 path_to_h5 --images_root path_to_coco_images
```

This creates the vocabulary and an HDF5 file with the data.

[Coming soon] Download pretrained model here.

Usage

Unless stated otherwise, run all commands from ./imcap:

Demo

Run sample_captions.ipynb to test the trained network on some images and visualize attention maps.

Training

Run python train.py. For a list of the available arguments, run python args.py --help.

Testing

Run python test.py to forward all validation images through a trained network and create a JSON file with the results. Use the --cnntrain flag when evaluating a model with a fine-tuned ConvNet.
Navigate to ./imcap/coco_caption/.
From there run:
```
python eval_caps.py -results_file results.json -ann_file gt_file.json
```
to get METEOR, BLEU, ROUGE_L & CIDEr scores for the JSON file of generated captions created in the previous step.
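If the evaluation fails, a quick sanity check of the results file can help narrow things down. The sketch below assumes the file follows the standard COCO caption results format, a list of objects with `image_id` and `caption` keys and one caption per image; whether test.py writes exactly this layout is an assumption, and both function names are hypothetical.

```python
import json


def validate_results(entries):
    """Hypothetical helper: return a list of problems found in caption results."""
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        if (not isinstance(entry, dict)
                or "image_id" not in entry or "caption" not in entry):
            problems.append("entry %d: missing image_id or caption" % i)
            continue
        if entry["image_id"] in seen:
            problems.append(
                "entry %d: duplicate image_id %s" % (i, entry["image_id"])
            )
        seen.add(entry["image_id"])
    return problems


def validate_results_file(path):
    """Load a results JSON file and validate its entries."""
    with open(path) as f:
        return validate_results(json.load(f))
```

For example, `validate_results_file("results.json")` returns an empty list when every entry has both keys and no image appears twice.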

Note on used train/val/test splits

For the sake of comparison, the data processing script follows the one in NeuralTalk2 and AdaptiveAttention.

References

Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015.
Lu et al. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. CVPR 2017 (original code here).
Caption evaluation code from this repository.

Contact

For questions and suggestions either use the issues section or send an e-mail to [email protected].
