CS224n_project

Neural Image Captioning in TensorFlow.

Home Page: http://www.fregu856.com/


Demo: http://www.fregu856.com/image_captioning
Poster: https://goo.gl/1DMQVE
Report: https://goo.gl/PzgRf5


Required Python packages (all installed with pip on Linux):
numpy
tensorflow
Cython (for the COCO PythonAPI)
matplotlib
scikit-image
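
For example, a single pip command should cover all of the above on most setups (this is a suggestion, not the project's pinned requirements; the TensorFlow version you need depends on your system):
$ pip install numpy tensorflow Cython matplotlib scikit-image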


Clone the TensorFlow models repo: https://github.com/tensorflow/models

Download the Inception-V3 model to where you want it (in my case to ~/CS224n/Project/CS224n_project/inception):
$ cd models/tutorials/image/imagenet
$ python classify_image.py --model_dir ~/CS224n/project/CS224n_project/inception

How to extract features from the second-to-last layer of the pretrained CNN:
https://www.kernix.com/blog/image-classification-with-a-pre-trained-deep-neural-network_p11
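
As a rough sketch of what that extraction looks like (assumptions: the TensorFlow 1.x API, the classify_image_graph_def.pb file downloaded by classify_image.py above, and the standard Inception-V3 tensor names pool_3:0 and DecodeJpeg/contents:0; the image path is just an example):

import numpy as np
import tensorflow as tf

# load the pretrained Inception-V3 graph downloaded by classify_image.py:
with tf.gfile.FastGFile("inception/classify_image_graph_def.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session() as sess:
    # pool_3:0 is the 2048-dim second-to-last layer of the network:
    pool_3 = sess.graph.get_tensor_by_name("pool_3:0")

    # feed the raw JPEG bytes and read out the feature vector:
    image_data = tf.gfile.FastGFile("coco/images/val/example_img.jpg", "rb").read()
    feature_vector = sess.run(pool_3,
            feed_dict={"DecodeJpeg/contents:0": image_data})
    feature_vector = np.squeeze(feature_vector)  # shape: (2048,)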


Dataset: Microsoft COCO:
http://mscoco.org/dataset/#download

Clone/download and place the "coco" folder in your project directory:
https://github.com/pdollar/coco

Download the training images and place in coco/images/train:
$ wget "http://msvocds.blob.core.windows.net/coco2014/train2014.zip"
$ unzip train2014.zip

Download the validation images:
$ wget "http://msvocds.blob.core.windows.net/coco2014/val2014.zip"
$ unzip val2014.zip
Place 5000 of the validation images in coco/images/val, 5000 in coco/images/test and the rest in coco/images/train.
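
One possible way to do that split (a sketch only; it assumes the unzipped validation images ended up in coco/images/val2014 and simply moves the first 5000 to val, the next 5000 to test and the rest to train):

import os
import shutil

src_dir = "coco/images/val2014"  # where unzip val2014.zip put the images
img_file_names = sorted(os.listdir(src_dir))

for i, file_name in enumerate(img_file_names):
    if i < 5000:
        dst_dir = "coco/images/val"
    elif i < 10000:
        dst_dir = "coco/images/test"
    else:
        dst_dir = "coco/images/train"
    shutil.move(os.path.join(src_dir, file_name),
            os.path.join(dst_dir, file_name))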

Download the captions (captions_train2014.json and captions_val2014.json) and place in:
coco/annotations

To install the Python API:
$ cd coco/PythonAPI
$ make

Demo of the PythonAPI:
https://github.com/pdollar/coco/blob/master/PythonAPI/pycocoDemo.ipynb
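
A minimal usage sketch of the captions API (run with coco/PythonAPI on the Python path; annotation files placed as described above):

from pycocotools.coco import COCO

# load the training captions:
coco = COCO("coco/annotations/captions_train2014.json")

# print all captions for the first image in the annotation file:
img_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=img_id)
for ann in coco.loadAnns(ann_ids):
    print(ann["caption"])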


For evaluation of captions:

Clone coco-caption and place in the coco folder in the project directory:
https://github.com/tylin/coco-caption
Make sure Java is installed (the caption tokenizer and the METEOR scorer in coco-caption are Java programs):
$ sudo apt-get install default-jdk
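
A sketch of how coco-caption is typically used for scoring (module and class names as in the tylin/coco-caption repo; results.json is a hypothetical file of generated captions in the standard [{"image_id": ..., "caption": ...}] format):

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# ground-truth captions and generated captions (results.json is hypothetical):
coco = COCO("coco/annotations/captions_val2014.json")
coco_res = coco.loadRes("results.json")

coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params["image_id"] = coco_res.getImgIds()  # only score captioned imgs
coco_eval.evaluate()

# coco_eval.eval maps metric name (Bleu_1, ..., CIDEr, METEOR, ROUGE_L) to score:
for metric, score in coco_eval.eval.items():
    print("%s: %.3f" % (metric, score))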


For initialization of the embedding matrix with GloVe vectors:

Download glove.6B.zip from https://nlp.stanford.edu/projects/glove/ and place glove.6B.300d.txt in coco/annotations.
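
A sketch of what that initialization involves (the vocabulary list below is a made-up example standing in for the model's word list; words missing from GloVe get small random vectors):

import numpy as np

# load the pretrained 300-dim GloVe vectors:
glove_vectors = {}
with open("coco/annotations/glove.6B.300d.txt") as f:
    for line in f:
        fields = line.split()
        glove_vectors[fields[0]] = np.array(fields[1:], dtype=np.float32)

vocabulary = ["a", "man", "riding", "horse"]  # hypothetical model vocabulary

embeddings_matrix = np.zeros((len(vocabulary), 300), dtype=np.float32)
for i, word in enumerate(vocabulary):
    if word in glove_vectors:
        embeddings_matrix[i] = glove_vectors[word]
    else:
        # word not in GloVe: small random initialization
        embeddings_matrix[i] = np.random.uniform(-0.1, 0.1, 300)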

Documentation

GRU_attention_model.py:

  • ASSUMES: that preprocess_captions.py, extract_img_features_attention.py and create_initial_embeddings.py have already been run.
  • DOES: defines the GRU_attention model and contains a script for training the model (basically identical to LSTM_attention_model.py).

GRU_model.py:

  • ASSUMES: that preprocess_captions.py, extract_img_features.py and create_initial_embeddings.py have already been run.
  • DOES: defines the GRU model and contains a script for training the model (basically identical to LSTM_model.py).

LSTM_attention_model.py:

  • ASSUMES: that preprocess_captions.py, extract_img_features_attention.py and create_initial_embeddings.py have already been run.
  • DOES: defines the LSTM_attention model and contains a script for training the model.

LSTM_model.py:

  • ASSUMES: that preprocess_captions.py, extract_img_features.py and create_initial_embeddings.py have already been run.
  • DOES: defines the LSTM model and contains a script for training the model.

caption_img.py:

  • Must be called in one of the following ways:
    $ caption_img.py LSTM (for using the best LSTM model)
    $ caption_img.py LSTM_attention (for using the best LSTM_attention model)
    $ caption_img.py GRU (for using the best GRU model)
    $ caption_img.py GRU_attention (for using the best GRU_attention model)
  • ASSUMES: that preprocess_captions.py has already been run. That the image one would like to generate a caption for is called "img.jpg" and is placed in the directory "img_to_caption". That the weights for the best LSTM/GRU/LSTM_attention/GRU_attention model have been placed in models/model_type/best_model with names model.filetype.
  • DOES: generates a caption for "img.jpg" using the best model of the specified model type and displays the img and its caption. For attention models, it also displays a figure visualizing the img attention at the time of prediction for each word in the caption.

caption_random_test_img.py:

  • Must be called in one of the following ways:
    $ caption_random_test_img.py LSTM [img_id] (for using the best LSTM model)
    $ caption_random_test_img.py LSTM_attention [img_id] (for using the best LSTM_attention model)
    $ caption_random_test_img.py GRU [img_id] (for using the best GRU model)
    $ caption_random_test_img.py GRU_attention [img_id] (for using the best GRU_attention model)
  • ASSUMES: that preprocess_captions.py and extract_img_features.py have already been run. That the weights for the best LSTM/GRU/LSTM_attention/GRU_attention model have been placed in models/model_type/best_model with names model.filetype.
  • DOES: generates a caption for the test img with img id img_id if specified, otherwise for a random test img. It also displays the img and its caption. For attention models, it also displays a figure visualizing the img attention at the time of prediction for each word in the caption.

create_initial_embeddings.py:

  • ASSUMES: that "preprocess_captions.py" already has been run.
  • DOES: creates a word embedding matrix (embeddings_matrix) using GloVe vectors.

evaluate_best_models_on_test.py:

  • ASSUMES: that preprocess_captions.py, extract_img_features.py and extract_img_features_attention.py have already been run. That the weights for the best LSTM/GRU/LSTM_attention/GRU_attention models have been placed in models/model_type/best_model with names model.filetype.
  • DOES: generates captions for all 5000 test imgs using the best LSTM/GRU/LSTM_attention/GRU_attention models, evaluates the captions and returns the metric scores (BLEU-1, BLEU-2, BLEU-3, BLEU-4, CIDEr, METEOR and ROUGE_L).

extract_img_features.py:

  • ASSUMES: that the image dataset has been manually split such that all train images are stored in "coco/images/train/", all test images are stored in "coco/images/test/" and all val images are stored in "coco/images/val". That the Inception-V3 model has been downloaded and placed in inception.
  • DOES: extracts a 2048-dimensional feature vector for each train/val/test img and creates dicts mapping from img id to feature vector (train/val/test_img_id_2_feature_vector).

extract_img_features_attention.py:

  • ASSUMES: that the image dataset has been manually split such that all train images are stored in "coco/images/train/", all test images are stored in "coco/images/test/" and all val images are stored in "coco/images/val". That the Inception-V3 model has been downloaded and placed in inception. That the dict numpy_params (containing W_img and b_img taken from the img_transform step in a well-performing non-attention model) is placed in coco/data/img_features_attention/transform_params.
  • DOES: extracts a 64x300 feature array (64 300-dimensional feature vectors, one for each of the 8x8 img regions) for each train/val/test img and saves each individual feature array to disk (to coco/data/img_features_attention). Is used in the attention models. See the sketch below.
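
A sketch of the transform step described above (array shapes follow the description; the 8x8x2048 feature map would come from the last convolutional layer of Inception-V3, and the random arrays below only stand in for real data and parameters):

import numpy as np

# stand-in for the 8x8x2048 output of Inception-V3's last conv layer:
img_features = np.random.rand(8, 8, 2048).astype(np.float32)

# stand-ins for W_img (2048 x 300) and b_img (300,), which in the project are
# taken from the img_transform step of a trained non-attention model:
W_img = np.random.rand(2048, 300).astype(np.float32)
b_img = np.random.rand(300).astype(np.float32)

# flatten the 8x8 grid into 64 region vectors and project each to 300 dim:
feature_array = img_features.reshape(64, 2048).dot(W_img) + b_img
print(feature_array.shape)  # (64, 300)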

preprocess_captions.py:

  • ASSUMES: that "split_img_ids.py" already has been run. That the COCO Python API has been installed. That the files captions_train2014.json, captions_val2014.json and glove.6B.300d.txt is placed in coco/annotations. That the folder coco/data exists.
  • DOES: all necessary pre-processing of the captions. Creates a number of files, see all "cPickle.dump" below.

split_img_ids.py:

  • ASSUMES: that the image dataset has been manually split such that all test images are stored in "coco/images/test/" and all val images are stored in "coco/images/val/".
  • DOES: creates two files (val_img_ids, test_img_ids) containing the img ids for all val and test imgs, respectively. These are later used to sort an img as either train, val or test. See the sketch below.
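
A sketch of the idea (it assumes the img id can be parsed from the COCO file name, e.g. COCO_val2014_000000123456.jpg, and that the output files go in coco/data; both are assumptions about the approach, not taken from the script):

import os
import cPickle

def get_img_ids(dir_path):
    # the digits just before ".jpg" in a COCO file name are the img id:
    return [int(f.split("_")[-1].split(".")[0]) for f in os.listdir(dir_path)]

val_img_ids = get_img_ids("coco/images/val")
test_img_ids = get_img_ids("coco/images/test")

cPickle.dump(val_img_ids, open("coco/data/val_img_ids", "wb"))
cPickle.dump(test_img_ids, open("coco/data/test_img_ids", "wb"))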

test.py:

  • DOES: contains a bunch of code snippets that have been tested or used at some point. Probably nothing interesting to see here.

utilities.py:

  • DOES: contains a number of functions used in different parts of the project.

web/app.py:

  • DOES: contains backend code for local live demo webpage.

web/templates/index.html:

  • DOES: contains frontend code for local live demo webpage.
