
yoghourtcover / image-captioning


This project was forked from yuanxiaosc/image-captioning.


CNN-Encoder and RNN-Decoder (Bahdanau attention) for image captioning (image-to-text) on the MS-COCO dataset.



Image-Captioning


Task Description

Given an image like the example below, our goal is to generate a caption such as "a surfer riding on a wave".

Figure: Man Surfing

To accomplish this, we use an attention-based model, which lets us see which parts of the image the model focuses on as it generates a caption.

Figure: Prediction

The model architecture is similar to the one in "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention".

Main principle

The model consists of a CNN-Encoder and an RNN-Decoder. The CNN-Encoder extracts information from the input image to produce an intermediate representation H; the RNN-Decoder then decodes H step by step (using Bahdanau attention) to generate a text description of the image.

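The step-by-step decoding described above can be sketched as a greedy loop: at each step the decoder receives the previous word, its state, and the encoded features H, and the highest-scoring word is kept until an end token appears. This is a minimal sketch under stated assumptions, not the repository's code: `greedy_decode`, the `decoder_step` interface, and the `<start>`/`<end>` tokens are hypothetical, and the toy decoder only stands in for the real RNN-Decoder.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: (previous word id, state, features H) -> (vocab scores, new state)
DecoderStep = Callable[[int, object, object], Tuple[Sequence[float], object]]


def greedy_decode(decoder_step: DecoderStep, features: object, init_state: object,
                  start_id: int, end_id: int, max_len: int = 20) -> List[int]:
    """Generate word ids one at a time, always taking the highest-scoring word."""
    word, state, output = start_id, init_state, []
    for _ in range(max_len):
        scores, state = decoder_step(word, state, features)
        # Greedy choice: argmax over the vocabulary scores.
        word = max(range(len(scores)), key=lambda i: scores[i])
        if word == end_id:
            break
        output.append(word)
    return output


if __name__ == "__main__":
    vocab = ["<start>", "<end>", "a", "surfer", "riding", "on", "wave"]
    script = [2, 3, 4, 5, 2, 6, 1]  # "a surfer riding on a wave <end>"

    def toy_step(word, state, features):
        # Toy stand-in: the state is a step counter and the "model" emits the scripted word.
        scores = [0.0] * len(vocab)
        scores[script[state]] = 1.0
        return scores, state + 1

    ids = greedy_decode(toy_step, features=None, init_state=0, start_id=0, end_id=1)
    print(" ".join(vocab[i] for i in ids))  # a surfer riding on a wave
```

Greedy search is only the simplest choice here; beam search is a common alternative when caption quality matters more than speed.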

Input: image_features.shape (16, 64, 2048)
---------------Pass by cnn_encoder---------------
Output: image_features_encoder.shape (16, 64, 256)

Input: batch_words.shape (16, 1)
Input: rnn state shape (16, 512)
---------------Pass by rnn_decoder---------------
Output: out_batch_words.shape (16, 5031)
Output: out_state.shape (16, 512)
Output: attention_weights.shape (16, 64, 1)
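The attention step behind `attention_weights.shape (16, 64, 1)` can be sketched in NumPy. This assumes the additive scoring of Bahdanau attention, score_i = V^T tanh(W1 h_i + W2 s), where h_i is one of the 64 encoded image regions and s is the RNN state. The attention size `units = 512` and the random weights are hypothetical; the sketch reproduces the shapes in the log above and is not the repository's TensorFlow implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def bahdanau_attention(features, hidden, W1, W2, V):
    """Additive (Bahdanau) attention.

    features: (batch, 64, 256)  encoded image regions (cnn_encoder output)
    hidden:   (batch, 512)      previous RNN state
    Returns the context vector (batch, 256) and attention weights (batch, 64, 1).
    """
    # Add a region axis so the state broadcasts over all 64 regions: (batch, 1, 512).
    hidden_t = hidden[:, np.newaxis, :]
    # score_i = V^T tanh(W1 h_i + W2 s): (batch, 64, 1)
    score = np.tanh(features @ W1 + hidden_t @ W2) @ V
    weights = softmax(score, axis=1)             # normalise over the 64 regions
    context = (weights * features).sum(axis=1)   # weighted sum of regions: (batch, 256)
    return context, weights


rng = np.random.default_rng(0)
batch, regions, feat_dim, state_dim, units = 16, 64, 256, 512, 512
features = rng.standard_normal((batch, regions, feat_dim))
hidden = np.zeros((batch, state_dim))
W1 = rng.standard_normal((feat_dim, units)) * 0.01
W2 = rng.standard_normal((state_dim, units)) * 0.01
V = rng.standard_normal((units, 1)) * 0.01

context, weights = bahdanau_attention(features, hidden, W1, W2, V)
print(context.shape, weights.shape)  # (16, 256) (16, 64, 1)
```

In decoders of this kind, the context vector is typically combined with the embedded input word before the RNN step, which then produces the (16, 5031) vocabulary scores.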

Tested with

  • Python 3.6
  • TensorFlow 2

Usage

1. Preparing data

python data_utils.py

Manual download of data: if data_utils.py cannot download the data automatically (for example, because of network problems), you can download it manually:

  1. Download the caption annotations from http://images.cocodataset.org/annotations/annotations_trainval2014.zip
  2. Unzip annotations_trainval2014.zip and move the annotations folder into the project directory.
  3. Download the images from http://images.cocodataset.org/zips/train2014.zip
  4. Unzip train2014.zip and move the train2014 folder into the project directory.
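The manual download steps can also be scripted with the Python standard library. This is a minimal sketch: `download` and `extract` are hypothetical helpers, not functions from data_utils.py, and it assumes the script is run from the project root and that each archive unpacks into its own top-level folder (annotations/ and train2014/). The image archive is large, so the download can take a long time.

```python
import os
import urllib.request
import zipfile

# URLs taken from the manual download steps above.
ANNOTATIONS_URL = "http://images.cocodataset.org/annotations/annotations_trainval2014.zip"
IMAGES_URL = "http://images.cocodataset.org/zips/train2014.zip"


def download(url: str, dest: str) -> str:
    """Download url to dest, skipping the download if dest already exists."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(zip_path: str, project_dir: str) -> None:
    """Unzip an archive into the project directory."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(project_dir)


if __name__ == "__main__":
    project_dir = "."  # assumption: run from the project root
    for url in (ANNOTATIONS_URL, IMAGES_URL):
        archive = os.path.join(project_dir, os.path.basename(url))
        extract(download(url, archive), project_dir)
```

Skipping existing archives makes the script safe to re-run after an interrupted extraction, though a partially downloaded archive has to be deleted by hand.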

2. Train model

python train_image_caption_model.py

3. Model inference

python inference_image_caption.py

Experimental result

Figure: training loss

Figure: sample outputs of inference_image_caption.py

Reference Code

image_captioning.ipynb

This notebook is an end-to-end example. When you run the notebook, it downloads the MS-COCO dataset, preprocesses and caches a subset of images using Inception V3, trains an encoder-decoder model, and generates captions on new images using the trained model.
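The cached Inception V3 features explain the (16, 64, 2048) input shape shown earlier: with 299x299 inputs and the classification head removed (in TensorFlow, `tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")`), the last convolutional feature map is 8x8x2048, and flattening the 8x8 grid gives 64 region vectors of size 2048. The sketch below shows only that reshape, using random NumPy data as a stand-in for real Inception V3 output; `flatten_feature_map` is a hypothetical helper.

```python
import numpy as np


def flatten_feature_map(feature_map: np.ndarray) -> np.ndarray:
    """Turn a (batch, height, width, channels) CNN feature map into
    (batch, height * width, channels): one row per image region."""
    batch, height, width, channels = feature_map.shape
    return feature_map.reshape(batch, height * width, channels)


# Random stand-in for Inception V3 output on a batch of 16 images at 299x299.
fake_inception_output = np.random.default_rng(0).random((16, 8, 8, 2048), dtype=np.float32)
image_features = flatten_feature_map(fake_inception_output)
print(image_features.shape)  # (16, 64, 2048)
```

Each of the 64 rows corresponds to one cell of the 8x8 grid, which is what lets the attention weights be mapped back onto image locations.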

In this example, you will train a model on a relatively small amount of data: the first 30,000 captions, which cover about 20,000 images (there are fewer images than captions because the dataset has multiple captions per image).
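Selecting the caption subset can be sketched as grouping the first 30,000 entries of the MS-COCO caption annotations by image. This assumes the standard annotations/captions_train2014.json layout (an "annotations" list of objects with "image_id" and "caption") and the COCO 2014 file naming COCO_train2014_<12-digit id>.jpg. The `select_captions` helper and the `<start>`/`<end>` markers are hypothetical, not taken from the repository.

```python
import json
from collections import defaultdict
from typing import Dict, List


def select_captions(annotations: dict, num_captions: int = 30000) -> Dict[str, List[str]]:
    """Keep the first num_captions captions, grouped by image file name."""
    by_image: Dict[str, List[str]] = defaultdict(list)
    for ann in annotations["annotations"][:num_captions]:
        # MS-COCO 2014 training images are named COCO_train2014_<12-digit id>.jpg
        name = "COCO_train2014_%012d.jpg" % ann["image_id"]
        # Assumption: wrap each caption in start/end markers for the decoder.
        by_image[name].append("<start> " + ann["caption"].strip() + " <end>")
    return dict(by_image)


if __name__ == "__main__":
    with open("annotations/captions_train2014.json") as f:
        data = json.load(f)
    captions = select_captions(data)
    print(len(captions), "images,", sum(map(len, captions.values())), "captions")
```

Because several captions share an image, the number of keys in the result is smaller than the number of captions kept, which matches the "30,000 captions for about 20,000 images" figure above.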

Learn more

awesome-image-captioning: a curated list of resources on image captioning and related areas.

Contributors: yuanxiaosc
