Disclaimer: I am not working on this anymore. I will be happy to answer questions and review & merge PRs though.

Image Captioning with Spatial Attention in Keras

This is a Keras & TensorFlow implementation of an image captioning model. In particular, it uses the attention models described in this paper, which are depicted below:

Here, V is the set of K local features from the last convolutional layer of a ConvNet (e.g. ResNet-50), and x_t is the LSTM input at time t, composed of the embedding of the previous word and the average image feature. h_t, the hidden state of the LSTM at time t, is used to compute the attention weights over V, which produce the context vector c_t. c_t and h_t are then combined to predict the current word y_t. In (b), an additional gate in the LSTM produces an extra output s_t, which is combined with V to compute the attention weights; s_t serves as an alternative feature the model can attend to instead of the image features in V.
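As a rough illustration of the two variants, the NumPy sketch below scores each local feature in V against h_t, turns the scores into attention weights with a softmax, and forms c_t as the weighted sum of the features. The adaptive variant (b) appends one extra score for s_t; the softmax weight on that extra entry, beta, decides how much of the final context comes from s_t versus c_t. This follows the adaptive-attention formulation as commonly described for this kind of model, not this repository's Keras layers: the weight names (W_v, W_g, W_s, w_h), beta and the shapes are assumptions, and batching, masking and the LSTM itself are omitted.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def location_scores(V, h_t, W_v, W_g, w_h):
    """One score per location i: w_h . tanh(W_v v_i + W_g h_t).

    V: (K, d) local features; h_t: (d,) LSTM hidden state at time t.
    W_v, W_g: (k, d) projections; w_h: (k,) scoring vector (hypothetical names).
    """
    return np.dot(np.tanh(np.dot(V, W_v.T) + np.dot(W_g, h_t)), w_h)


def spatial_attention(V, h_t, W_v, W_g, w_h):
    """Model (a): returns attention weights alpha (K,) and context c_t (d,)."""
    alpha = softmax(location_scores(V, h_t, W_v, W_g, w_h))
    c_t = np.dot(alpha, V)  # attention-weighted sum of the local features
    return alpha, c_t


def adaptive_attention(V, h_t, s_t, W_v, W_g, W_s, w_h):
    """Model (b): the extra LSTM output s_t (d,) competes with the K features.

    W_s: (k, d) projection of s_t (hypothetical name).
    Returns beta, the weight given to s_t, and the mixed context c_hat (d,).
    """
    z = location_scores(V, h_t, W_v, W_g, w_h)
    c_t = np.dot(softmax(z), V)
    # Score for s_t, computed the same way as the image-location scores.
    z_s = np.dot(np.tanh(np.dot(W_s, s_t) + np.dot(W_g, h_t)), w_h)
    beta = softmax(np.append(z, z_s))[-1]  # share of attention on s_t
    c_hat = beta * s_t + (1.0 - beta) * c_t
    return beta, c_hat
```

np.dot is used instead of the @ operator so the sketch also runs under the Python 2.7 setup listed in the installation steps.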

Installation

  • Clone this repository
# Make sure to clone with --recursive
git clone --recursive https://github.com/amaiasalvador/sat_keras.git
  • Install Python 2.7.
  • Install TensorFlow 0.12.
  • pip install -r requirements.txt
  • (Optional) Install this Keras PR with support for layer-wise learning rate multipliers:
git clone https://github.com/amaiasalvador/keras.git
cd keras
git checkout lr_mult
python setup.py install

Layer-wise learning rate multipliers are disabled by default, so you can use regular Keras 1.2.2 if you don't need a different learning rate for the base model.

  • Set tensorflow as the keras backend in ~/.keras/keras.json:
{
    "image_dim_ordering": "tf", 
    "epsilon": 1e-07, 
    "floatx": "float32", 
    "backend": "tensorflow"
}
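If you prefer to script this step, the hypothetical helper below (write_keras_config is not part of this repository) writes the same settings to keras.json. It overwrites any existing file, and the key names, image_dim_ordering in particular, are the Keras 1.x ones matching the Keras 1.2.2 version mentioned above.

```python
import json
import os

# The settings from the keras.json snippet above (Keras 1.x key names).
KERAS_CONFIG = {
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}


def write_keras_config(keras_dir=None):
    """Write KERAS_CONFIG to <keras_dir>/keras.json (default: ~/.keras).

    Any existing keras.json in that directory is overwritten.
    Returns the path of the written file.
    """
    if keras_dir is None:
        keras_dir = os.path.join(os.path.expanduser("~"), ".keras")
    if not os.path.isdir(keras_dir):
        os.makedirs(keras_dir)
    path = os.path.join(keras_dir, "keras.json")
    with open(path, "w") as f:
        json.dump(KERAS_CONFIG, f, indent=4)
    return path
```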

Data & Pretrained model

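  • Download the MS COCO 2014 training and validation images and caption annotations, and arrange them as follows:
$coco/                                    # dataset dir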
$coco/annotations/                        # annotations directory
$coco/annotations/captions_train2014.json # caption anns for training set
$coco/annotations/captions_val2014.json   # ...
$coco/images/                             # image dir
$coco/images/train2014                    # train image dir
$coco/images/val2014                      # ...
  • Navigate to imcap/utils and run:
python prepro_coco.py --output_json path_to_json --output_h5 path_to_h5 --images_root path_to_coco_images
This creates the vocabulary and the HDF5 file containing the data.
  • [Coming soon] Download pretrained model here.
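Before running prepro_coco.py, it can help to confirm that $coco matches the directory layout listed above. The hypothetical helper below (missing_coco_paths is not part of this repository) returns the expected files and directories that are missing; it only checks that they exist, not their contents.

```python
import os

# Paths relative to $coco, taken from the layout listed above.
EXPECTED_COCO_PATHS = [
    "annotations/captions_train2014.json",
    "annotations/captions_val2014.json",
    "images/train2014",
    "images/val2014",
]


def missing_coco_paths(coco_root):
    """Return the expected paths that do not exist under coco_root."""
    return [p for p in EXPECTED_COCO_PATHS
            if not os.path.exists(os.path.join(coco_root, p))]
```

An empty list means every expected path is present.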

Usage

Unless stated otherwise, run all commands from ./imcap:

Demo

Run sample_captions.ipynb to test a trained network on some images and visualize its attention maps.

Training

Run python train.py to train a model. Run python args.py --help for a list of the available arguments.

Testing

  • Run python test.py to forward all validation images through a trained network and create a JSON file with the results. Use the --cnntrain flag when evaluating a model with a fine-tuned ConvNet.
  • Navigate to ./imcap/coco_caption/.
  • From there run:
    python eval_caps.py -results_file results.json -ann_file gt_file.json
    
    to compute BLEU, METEOR, ROUGE_L and CIDEr scores for the JSON file of generated captions from the previous step.
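eval_caps.py compares the generated captions against the ground-truth annotations. The sketch below assumes the results file uses the standard COCO captioning results format, a JSON list of records with an image_id and a caption, which is what the COCO caption evaluation toolkit loads. write_coco_results is a hypothetical helper meant only to show that structure; it is not the code test.py uses to write its output.

```python
import json


def write_coco_results(captions, path):
    """Write {image_id: caption} pairs as a COCO-style results file.

    Each entry is {"image_id": int, "caption": str}, one per image,
    sorted by image id. Returns the list that was written.
    """
    results = [{"image_id": int(image_id), "caption": caption}
               for image_id, caption in sorted(captions.items())]
    with open(path, "w") as f:
        json.dump(results, f)
    return results
```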

Note on used train/val/test splits

To allow comparison with prior work, the data processing script follows the ones in NeuralTalk2 and AdaptiveAttention.

References

Contact

For questions and suggestions, either use the issues section or contact the author by e-mail.

Contributors

amaiasalvador, mbaradad