
NIC Wrapper

The NIC Wrapper is a framework that augments the training set of Google NIC, the deep-learning-based Neural Image Caption Generator. The wrapper is built to validate the hypothesis that extra training data from Google Images may help NIC learn the MS COCO dataset better. During the training of NIC, we keep inserting image-caption pairs sourced from Google as extra training samples, expecting that mistakes made by NIC get corrected.

Overview

Suppose NIC sees an image of a cat and describes it as a "dog". If we use the term "dog" to source extra images from Google and let NIC see those new images, NIC might realize what a dog actually looks like. The NIC Wrapper automates this process: in each training epoch, the wrapper sources extra training samples from Google, using the captions predicted by the latest model weights as textual queries. The model is expected to become more accurate, since it now learns how Google binds images to captions beyond the initial training set.
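The per-epoch loop can be sketched as below. The callables (`predict_caption`, `search_google`, `train_one_epoch`) are hypothetical placeholders for the repository's actual inference, crawler, and training components, not its real API:

```python
def run_wrapper_epoch(predict_caption, search_google, train_one_epoch,
                      coco_images, coco_pairs, top_k=5):
    """One epoch of NIC-Wrapper training (illustrative sketch).

    predict_caption(image) runs inference with the latest weights,
    search_google(query, top_k) returns images Google suggests for the
    query, and train_one_epoch(pairs) updates the model on the data.
    """
    extra_pairs = []
    for image in coco_images:
        caption = predict_caption(image)            # latest checkpoint
        for new_image in search_google(caption, top_k):
            extra_pairs.append((new_image, caption))
    # The extra pairs live for one epoch only; they are re-sourced
    # from scratch at the start of the next epoch.
    train_one_epoch(coco_pairs + extra_pairs)
    return extra_pairs
```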

The following diagram illustrates the architecture of the NIC Wrapper (figure: Show and Tell architecture).

We see an improvement of a few tenths of a BLEU-4 point when evaluating the NIC Wrapper on the MS COCO dataset. That is to say, the model trained with Google images performs slightly better on the COCO validation set. A few pairs of captions generated by the models trained with and without Google images are shown below:

Requirements

Installation

Follow the steps at im2txt to get a whole picture of NIC.

Clone the repository:

git clone git@github.com:LEAAN/Source-new-samples-for-NIC.git

Prepare the COCO Data. This may take a few hours.

# Location to save the MSCOCO data.
MSCOCO_DIR="${HOME}/im2txt/data/mscoco"

# Run the preprocessing script.
sh /im2txt/data/download_and_preprocess_mscoco.sh "${MSCOCO_DIR}"

Download the Inception v3 Checkpoint.

# Location to save the Inception v3 checkpoint.
INCEPTION_DIR="${HOME}/im2txt/data"
mkdir -p ${INCEPTION_DIR}

wget "http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz"
tar -xvf "inception_v3_2016_08_28.tar.gz" -C ${INCEPTION_DIR}
rm "inception_v3_2016_08_28.tar.gz"

Train from scratch, on the COCO training set only, until the LSTM generates sentences that read like natural language. This takes around 1 million steps, nearly one week on a TITAN X (Pascal) with 12 GB of GPU RAM.

# Directory containing preprocessed MSCOCO data.
MSCOCO_DIR="${HOME}/im2txt/data/mscoco"

# Inception v3 checkpoint file.
INCEPTION_CHECKPOINT="${HOME}/im2txt/data/inception_v3.ckpt"

# Directory to save the model.
MODEL_DIR="${HOME}/im2txt/model"

# Run the training script.
python /im2txt/train.py \
  --input_file_pattern="${MSCOCO_DIR}/train-?????-of-00256" \
  --inception_checkpoint_file="${INCEPTION_CHECKPOINT}" \
  --train_dir="${MODEL_DIR}/train" \
  --train_inception=false \
  --number_of_steps=1000000

Now that the captions generated by NIC read like natural language, we can feed the predicted captions of COCO training images to Google. Images suggested by Google, together with the textual queries used to source them, are added to the COCO training set. We renew the image-caption pairs from Google every epoch and allow the latest model weights to see only the extra training data obtained at the beginning of that epoch.

# save a backup of the model checkpoint at step=1,000,000
mkdir ${MODEL_DIR}/train_COCO
mv  ${MODEL_DIR}/train/* ${MODEL_DIR}/train_COCO/

# Train NIC with samples from Google.
python /im2txt/train_wrapper.py \
  --input_file_pattern="${MSCOCO_DIR}/train-?????-of-?????" \
  --train_dir="${MODEL_DIR}/train" \
  --train_inception=true \
  --number_of_steps=3000000
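Note the broadened input pattern: `train-?????-of-?????` (rather than `train-?????-of-00256`) also matches any extra TFRecord shards written from the Google data alongside the original 256 COCO shards. A small illustration, using hypothetical shard names:

```python
import fnmatch

# Hypothetical shard names: an original COCO shard plus an extra shard
# written from the Google image-caption pairs with a different shard count.
files = ["train-00000-of-00256", "train-00000-of-00008"]

coco_only = fnmatch.filter(files, "train-?????-of-00256")
everything = fnmatch.filter(files, "train-?????-of-?????")
print(len(coco_only), len(everything))  # 1 2
```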

Run the image crawler in a separate process.

# Ignore GPU devices 
export CUDA_VISIBLE_DEVICES=""

# Source image caption pairs from Google.
python /im2txt/data/build_google_data.py

We compare our performance to that of the model fine-tuned on COCO only.

# Move the checkpoints of the model trained with Google images to another
# directory. The !() pattern requires bash's extglob option, and the
# step-1,000,000 checkpoint files (model.ckpt-1000000.*) must be excluded.
shopt -s extglob
mkdir ${MODEL_DIR}/train_Google
mv ${MODEL_DIR}/train/!(*1000000*) ${MODEL_DIR}/train_Google/

# Restart the training script with --train_inception=true.
python /im2txt/train.py \
  --input_file_pattern="${MSCOCO_DIR}/train-?????-of-00256" \
  --train_dir="${MODEL_DIR}/train" \
  --train_inception=true \
  --number_of_steps=3000000

Calculate perplexity values while train_wrapper.py or train.py is running. We evaluate the model by its perplexity during training. Since perplexity is a monotonic function of the loss, we expect the perplexity on the validation set to decrease, whether the model is trained with or without extra samples from Google.

# Ignore GPU devices.
export CUDA_VISIBLE_DEVICES=""

# Run the evaluation script. This will run in a loop, periodically loading the
# latest model checkpoint file and computing evaluation metrics.
python /im2txt/evaluate.py \
  --input_file_pattern="${MSCOCO_DIR}/val-?????-of-00004" \
  --checkpoint_dir="${MODEL_DIR}/train" \
  --eval_dir="${MODEL_DIR}/eval"
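The link between the reported loss and perplexity is simple: perplexity is the exponential of the average per-word cross-entropy loss, so the two always move together. A minimal illustration:

```python
import math

def perplexity(avg_cross_entropy_nats):
    # Perplexity is the exponential of the average per-word
    # cross-entropy loss, so lower loss means lower perplexity.
    return math.exp(avg_cross_entropy_nats)

# A drop in loss from 2.5 to 2.0 nats per word lowers perplexity
# from about 12.18 to about 7.39.
print(round(perplexity(2.5), 2), round(perplexity(2.0), 2))
```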

Evaluation metrics, including BLEU-4, are calculated after training is done. An example of using the COCO evaluation API is available at https://github.com/tylin/coco-caption/blob/master/cocoEvalCapDemo.ipynb

References

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

Microsoft COCO Captions: Data Collection and Evaluation Server
