Giter Site home page Giter Site logo

vinayaksharmagh / imcap Goto Github PK

View Code? Open in Web Editor NEW
12.0 2.0 4.0 317.94 MB

Inspired from the paper "Show Attend and Tell". This project's aim was to train a neural network which can provide descriptive text for a given image.

Jupyter Notebook 99.95% Python 0.05%
attention-model attention-decoder image-captioning image-caption-generator encoder-decoder computer-vision

imcap's Introduction

IMcap

Inspired from the paper "Show Attend and Tell". This project's aim was to train a neural network which can provide descriptive text for a given image.

Overview

Architecture basically consists of encoder-decoder Neural Network, where encoder extracts feature from the image and decoder interprets those features to produce sentence.

Encoder consists of a pre-trained Inception-V3 model (minus the last Fully Connected layers) appended by a custom Fully Connected layer. Decoder consists of LSTM along with visual-attention. Visual attention helps LSTM in focusing on relevant image features for the prediction of a particular word in word sequence (sentence)

For word embedding, a custom word2vec model was trained and it was then intersected with Google's pre-trained word2vec model.

Google-colab and google drive combo was used for training the model. Flickr30 Dataset was used for creating train and test sets.

A number of models were trained (each trained on different no of images). Of these, version 5 model performs the best. You can access these models from "models" sub-directory inside "ImageCap" directory

Technologies used

InceptionV3, LSTM, visual-attention, Word2Vec

Libraries used

Tensorflow, Keras, Gensim, Numpy, Pandas, Sklearn, Pickle, Pillow, Matplotlib, ast, os

Using pretrained models

In order to use these pretrained models, clone/download this repository. Make sure that you have Tensorflow (>=1.15), Numpy, Matplotlib and Pillow installed on your system. Your system should also have Python 3. Now, unzip the repository and execute the "run.py" file in it with arguments <image_path> (1st argument) and <model_version> (2nd argument). Use model_version==5 to get best results.

Training your own model

In order to train your own model, clone/download this repository. Install all libraries mentioned above. Download content mentioned in /ImageCap/misc/Download links.txt. (1) Extract images using Extract.ipynb. (2) Run Image_Features.ipynb to obtain features of the images (3) Run Text_processing_and_embedding.ipynb to perform text preprocessing and word embedding. (4) Run Training.ipynb to train the model. (5) Run Evaluate.ipynb to get results on test images. (Note that, you will need to create some additional empty sub-directories as per paths in code to contain things like Image features, Images etc.)

Results

Following are some of the results. You can look at some more results in the Evaluate.ipynb

sunset

massan1

gully_boy

massan2

interstellar

old_couple

dogs

men sitting

imcap's People

Contributors

vinayaksharmagh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.