
cvpr18-caption-eval's Introduction

Learning to Evaluate Image Captioning

TensorFlow implementation for the paper:

Learning to Evaluate Image Captioning
Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie
CVPR 2018

This repository contains a discriminator that can be trained to evaluate image captioning systems. The discriminator is trained to distinguish machine-generated captions from human-written ones. At test time, the trained discriminator takes the candidate caption, the reference caption, and optionally the image being captioned as input. Its output, the probability that the candidate caption is human-written, serves as the evaluation score for that caption. Please refer to our paper [link] for more detail.

Dependencies

  • Python (2.7)
  • TensorFlow (>1.4)
  • PyTorch (for extracting ResNet image features.)
  • ProgressBar
  • NLTK

Preparation

  1. Clone the repository recursively (to include the bilinear pooling code).
git clone --recursive https://github.com/richardaecn/cvpr18-caption-eval.git
  2. Install dependencies. Please refer to the official TensorFlow, PyTorch, and NLTK websites for installation guides. For the other dependencies, run:
pip install -r requirements.txt
  3. Download data. This script downloads the required data. A detailed description of the data can be found in "./download.sh".
./download.sh
  4. Generate the vocabulary.
python scripts/preparation/prep_vocab.py
  5. Extract image features. The following scripts download the COCO dataset and a ResNet checkpoint, then extract image features from the COCO dataset using ResNet. This might take a few hours.
./download_coco_dataset.sh
cd scripts/features/
./download.sh
python feature_extraction_coco.py --data-dir ../../data/ --coco-img-dir ../../data

Alternatively, we provide a [link] to download features extracted from ResNet152. Please put all *.npy files under "./data/resnet152/".
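Whichever route you take, it can help to confirm that feature files actually landed in the expected folder before moving on. The sketch below only checks for the presence of *.npy files. The helper name list_feature_files is hypothetical and not part of this repository, and the source does not state how many files to expect or what they are called.

```python
import glob
import os


def list_feature_files(feature_dir="./data/resnet152"):
    """Return the sorted paths of all *.npy files directly under feature_dir.

    Hypothetical helper: it checks presence only, since the expected
    file names and count are not documented here.
    """
    return sorted(glob.glob(os.path.join(feature_dir, "*.npy")))


if __name__ == "__main__":
    files = list_feature_files()
    if not files:
        print("No .npy feature files found under ./data/resnet152/")
    else:
        print("Found %d feature file(s)" % len(files))
```

Run it from the repository root so that the relative path resolves correctly.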

Evaluation

To evaluate the results of an image captioning method, first put the captions the model produces on the COCO dataset into the following JSON format:

{
    "<file-name-1>" : "<caption-1>",
    "<file-name-2>" : "<caption-2>",
    ...
    "<file-name-n>" : "<caption-n>"
}

Note that each <caption-i> is the caption as plain text, and each <file-name-i> is the file name of the corresponding image. Captions should be all lowercase and have no \n at the end. Example files, produced by running the open-source NeuralTalk, Show and Tell, and Show, Attend and Tell implementations, can be found in the examples folder: examples/neuraltalk_all_captions.json, examples/showandtell_all_captions.json, examples/showattendandtell_all_captions.json, and examples/human_all_captions.json.
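As a minimal sketch of producing such a file, the snippet below lowercases each caption, strips any trailing newline, and writes a single JSON object keyed by image file name. The function names and the sample file name are illustrative assumptions, not part of this repository. Only the standard json module is used.

```python
import json


def normalize_caption(caption):
    """Lowercase a caption and strip surrounding whitespace, including a trailing newline."""
    return caption.strip().lower()


def build_submission(captions_by_file):
    """Map each image file name to its normalized caption."""
    return {name: normalize_caption(text) for name, text in captions_by_file.items()}


def write_submission(captions_by_file, path):
    """Write the submission as one JSON object: {"<file-name>": "<caption>", ...}."""
    submission = build_submission(captions_by_file)
    with open(path, "w") as f:
        json.dump(submission, f, indent=4)
    return submission


if __name__ == "__main__":
    # Made-up example input; real keys should be your COCO image file names.
    raw = {"example_image.jpg": "A dog runs across the grass.\n"}
    print(json.dumps(build_submission(raw), indent=4))
```

Because json.dump emits no trailing comma after the last entry, the result is valid JSON that any standard parser can load.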

Make sure you have the NLTK Punkt sentence tokenizer installed in Python:

import nltk
nltk.download('punkt')

The following command prepares the data so that it can be used for training:

python scripts/preparation/prep_submission.py --submission examples/neuraltalk_all_captions.json  --name neuraltalk

Note that we assume you've followed through the steps in the Preparation section before running this command. This script will create a folder data/neuraltalk and three .npy files that contain data needed for training the metric. Please use the following command to train the metric:

python score.py --name neuraltalk

The results will be logged in the model/neuraltalk_scoring directory. If you use the default model architecture, the results will be in model/neuraltalk_scoring/mlp_1_img_1_512_0.txt.
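To score several submissions, the two commands above can be assembled per submission. The sketch below only builds the argument lists and the expected results path. The helper names are hypothetical, and the results path assumes that the output directory follows the same <name>_scoring pattern shown for neuraltalk, which the source confirms only for that one name.

```python
import os


def build_commands(name, submission_path):
    """Return the preparation and scoring commands as argument lists."""
    prep = ["python", "scripts/preparation/prep_submission.py",
            "--submission", submission_path, "--name", name]
    score = ["python", "score.py", "--name", name]
    return prep, score


def default_results_path(name, architecture="mlp_1_img_1_512_0"):
    """Results file for the default architecture.

    Assumption: the directory is model/<name>_scoring for any name.
    """
    return os.path.join("model", name + "_scoring", architecture + ".txt")


if __name__ == "__main__":
    prep, score = build_commands("neuraltalk", "examples/neuraltalk_all_captions.json")
    print(" ".join(prep))
    print(" ".join(score))
    print(default_results_path("neuraltalk"))
```

Each argument list can be passed to subprocess.check_call from the repository root, running the preparation command before the scoring command.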

The following are the scores for three submissions (computed as the average score over the last 10 epochs). Note that scores may differ slightly between runs due to randomness in training.

Architecture       Epochs  NeuralTalk  Show and Tell  Show, Attend and Tell
mlp_1_img_1_512_0  30      0.038       0.056          0.077
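The averaging over the last 10 epochs can be sketched as follows. The format of the results file is not described here, so parsing is left out and the function takes a plain list of per-epoch scores. The function name and the sample values are illustrative assumptions, not output from this repository.

```python
def average_last_epochs(scores, last_n=10):
    """Average the final last_n entries of a per-epoch score list.

    If fewer than last_n scores are available, all of them are averaged.
    """
    if last_n <= 0:
        raise ValueError("last_n must be positive")
    if not scores:
        raise ValueError("scores must not be empty")
    tail = scores[-last_n:]
    return sum(tail) / float(len(tail))


if __name__ == "__main__":
    # Made-up per-epoch scores for a 30-epoch run, for illustration only.
    per_epoch = [0.10] * 20 + [0.04] * 10
    print(average_last_epochs(per_epoch))
```

Averaging a window of final epochs rather than reading only the last one smooths out part of the epoch-to-epoch variation.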

Citation

If you find our work helpful in your research, please cite it as:

@inproceedings{Cui2018CaptionEval,
  title = {Learning to Evaluate Image Captioning},
  author = {Yin Cui and Guandao Yang and Andreas Veit and Xun Huang and Serge Belongie},
  booktitle={CVPR},
  year={2018}
}
