anubhavshrimal / attention-beam-image-captioning

Image captioning using beam search heuristic on top of the encoder-decoder based architecture


Attention-Beam-Image-Captioning

We present a beam search heuristic on top of an encoder-decoder architecture that yields better-quality captions on three benchmark datasets: Flickr8k, Flickr30k, and MS COCO.

Beam search helps the model find a higher-scoring caption overall, instead of greedily choosing the word with the best score at each decoding step. The following figure shows how a beam width (k) of 3 helps generate better captions:

[Figure: beam search]
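
The idea can be illustrated with a minimal, self-contained sketch. This is not the repository's implementation: `beam_search`, `toy_model`, and the probability table are hypothetical names and values made up for illustration, and a real captioning decoder would score words with the trained network rather than a lookup table. The sketch keeps the `beam_size` best partial captions at each step and sets aside any hypothesis that emits the end token.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Sequence], Dict[str, float]],
    start: str = "<start>",
    end: str = "<end>",
    beam_size: int = 3,
    max_len: int = 20,
) -> Tuple[Sequence, float]:
    """Return the best (sequence, log-probability) found with the given beam width."""
    beams: List[Tuple[Sequence, float]] = [((start,), 0.0)]
    completed: List[Tuple[Sequence, float]] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next word.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        if not candidates:
            break
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end:
                completed.append((seq, score))  # finished caption
            else:
                beams.append((seq, score))      # still being decoded
        if not beams:
            break

    # Fall back to unfinished hypotheses only if nothing reached the end token.
    pool = completed or beams
    return max(pool, key=lambda c: c[1])


# Hypothetical toy distribution: "a" looks best at step 1,
# but "the" leads to a better caption overall.
TABLE = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.55, "cat": 0.45},
    ("<start>", "the"): {"dog": 0.95, "cat": 0.05},
}


def toy_model(seq: Sequence) -> Dict[str, float]:
    """Return log-probabilities of the next word for a partial caption."""
    if seq[-1] in ("dog", "cat"):
        return {"<end>": 0.0}  # probability 1 of ending
    return {tok: math.log(p) for tok, p in TABLE[seq].items()}


greedy_caption, greedy_score = beam_search(toy_model, beam_size=1)
beam_caption, beam_score = beam_search(toy_model, beam_size=3)
print(" ".join(greedy_caption), math.exp(greedy_score))
print(" ".join(beam_caption), math.exp(beam_score))
```

With `beam_size=1` the search is greedy and commits to "a" at the first step; with `beam_size=3` the lower-ranked first word "the" survives long enough for its better continuation to win.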

Dependencies

The project's dependencies are listed in the provided environment.yml and requirements.txt files.

To install the dependencies using conda:

conda env create -f environment.yml
conda env list

Training

Set the paths to the data folder and the annotations JSON file for the downloaded dataset (MSCOCO, Flickr8k, or Flickr30k) in create_input_files.py, then run that script to create the required dataset files.

To train a model, run python train.py. All training hyperparameters are set in train.py.

Note: Pretrained models for MSCOCO, Flickr8k, Flickr30k can be downloaded from here.

The downloaded zip file needs to be extracted in the models/ directory.
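
If you prefer to do the extraction from Python, a minimal sketch using the standard library is shown below. The helper name `extract_models` and the archive file name in the usage comment are assumptions for illustration; substitute the name of the file you actually downloaded.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_models(archive_path: str, target_dir: str = "models") -> List[str]:
    """Extract a downloaded zip archive into target_dir and list its contents."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create models/ if it is missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Usage (hypothetical archive name):
# extract_models("pretrained_models.zip")
```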

Testing / Inference

  • Use caption.py to generate a caption and an attention map for an image.

    python caption.py --img='path/to/image.jpeg' --model='path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json' --beam_size=5
    
  • The Jupyter Notebook Caption-Sample-Images.ipynb can be used to caption specified images using the trained model.

  • Generate-Testset-Predictions.ipynb is used for generating predictions in the required format for the testing dataset.

Results

[Figure: results table]

[Figures: comparing captions (image1/image1a, image2/image2a, image3/image3a)]

Interactive User Interface

To use the UI-based image captioner module, run the following commands:

cd ui/
python MainWindowUI.py 

This opens the following user interface:

[Figures: ui-view1, ui-view3]

Project UI Demo

The demo video is available here on YouTube.
