captionbot-model-for-assisted-vission-people's Introduction

ICG Model By Nitin Kumar
Roll No. - 2019UCO1622

Captionbot - Image Caption Generator Model for Assisted vision people

Description : Image caption generation combines computer vision with natural language processing (NLP) to recognize the context of an image and describe it in a natural language such as English. This Python project is a caption bot for blind or low-vision people, built as an image caption generator with a CNN and an LSTM.

Libraries Used :

  • TensorFlow : An open-source library for building and training deep learning models, used here through its Python API.
  • Keras : An open-source Python library for building, training, and evaluating deep learning models.
  • Pillow : A fork of the Python Imaging Library (PIL) that adds support for opening, manipulating, and saving images.
  • NumPy : Library for working with arrays.
  • Matplotlib : Library for creating static and animated visualizations in Python.
  • Tqdm : Python library for creating progress meters or progress bars.

Type of Dataset :

  • For the image caption generator, we use the Flickr_8K dataset. There are larger datasets such as Flickr_30K and MSCOCO, but training the network on them can take weeks, so we use the smaller Flickr8k dataset. The advantage of a larger dataset is that it lets us build better models.

About Dataset used for this model

  • Name of dataset : Flickr_8K dataset
  • Source of dataset : Kaggle
  • Description of dataset : Contains 8091 photographs in JPEG format.

Direct link to download the Dataset

  • Flickr_8K dataset
    • Contains 8091 photographs in JPEG format.
  • Flickr_8K Text
    • The Flickr_8k_text folder contains the file Flickr8k.token, the main file of our dataset. It lists each image name with its captions, one entry per line (separated by "\n").
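
The caption file described above can be read into a dictionary that groups captions by image. The sketch below is a minimal illustration, not the project's own loader. It assumes each line has the form `<image name>#<caption number>`, a tab, then the caption; the function name `parse_token_lines` and the sample lines are made up for the example.

```python
from collections import defaultdict


def parse_token_lines(lines):
    """Group captions by image name.

    Assumes each line looks like '<image>.jpg#<n>\t<caption>'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, _, caption = line.partition("\t")
        if not caption:
            continue  # skip lines without a caption field
        image_name = image_id.split("#")[0]  # drop the '#<n>' caption index
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines in the assumed format (not real dataset entries).
sample = [
    "example_1.jpg#0\tA child in a pink dress climbs the stairs .",
    "example_1.jpg#1\tA girl goes into a wooden building .",
    "example_2.jpg#0\tA black dog runs on the grass .",
]
captions = parse_token_lines(sample)
print(captions["example_1.jpg"])
```

With a real file, the same function could be called on `open(path, encoding="utf-8")`, since a file object yields its lines one at a time.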

Complete Demo Video

[Demo video thumbnail; link not preserved]

GIF Tutorial For Working Of This Model

[GIF: Image Caption Generator Model]

The steps shown in the GIF and video above are described below :

1. Open Anaconda Prompt.
2. Go to the folder that contains all your project files and the dataset (both training and testing).
3. Run the Python file testing_caption_generator.py.
4. When running it, pass the path to the input image.
5. To perform steps 3 and 4 together, type the following command in Anaconda Prompt.
                python testing_caption_generator.py -i "Path to the input image"
6. After typing the command, press Enter.
7. After a few seconds, you will see the markers START and END in the output.
8. The generated image caption appears between START and END.
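
For readers curious how a script might accept the `-i` flag and isolate the caption between the START and END markers, here is a minimal sketch using the standard `argparse` module. It is an assumption about how such a script could be written, not the actual contents of testing_caption_generator.py. The long option `--image` and the helper `strip_markers` are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Generate a caption for one image.")
    # '-i' matches the command shown above; '--image' is an assumed long form.
    parser.add_argument("-i", "--image", required=True, help="path to the input image")
    return parser


def strip_markers(caption, start="start", end="end"):
    """Return only the words between the start and end markers."""
    words = caption.split()
    if words and words[0].lower() == start:
        words = words[1:]
    if words and words[-1].lower() == end:
        words = words[:-1]
    return " ".join(words)


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that the values come from the command line.
args = build_parser().parse_args(["-i", "example.jpg"])
print(args.image)
print(strip_markers("start a dog runs on the grass end"))
```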

Demonstration Via Some Screenshots :

[Screenshots: img1, img5, img6, img3, img2, img4, img7]

Folder Structure :

  • Downloaded from dataset :

    • Flicker8k_Dataset – Dataset folder which contains 8091 images.
    • Flickr_8k_text – Dataset folder which contains text files and captions of images.
  • The files below are created while building the project :

    • Models – Contains our trained models.
    • Descriptions.txt – Text file that contains all image names and their captions after preprocessing.
    • Features.p – Pickle object that maps each image to its feature vector extracted from the Xception pre-trained CNN model.
    • Tokenizer.p – Contains tokens mapped to index values.
    • Model.png – Visual representation of the layer dimensions of our model.
    • Testing_caption_generator.py – Python file for generating a caption for any image.
    • Training_caption_generator.ipynb – Jupyter notebook in which we build and train our image caption generator.
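
To illustrate what "tokens mapped to index values" means for Tokenizer.p, the sketch below builds a word-to-index dictionary and round-trips it through `pickle`. It is a simplified stand-in written in plain Python, not the Keras tokenizer the project presumably uses. The function names `build_word_index` and `encode` are hypothetical, and indices are assigned in order of first appearance, which is an assumption made for simplicity.

```python
import pickle


def build_word_index(captions):
    """Map each distinct word to an integer index, starting at 1 (0 is left for padding)."""
    word_index = {}
    for caption in captions:
        for word in caption.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def encode(caption, word_index):
    """Convert a caption to indices, dropping words that are not in the vocabulary."""
    return [word_index[w] for w in caption.lower().split() if w in word_index]


word_index = build_word_index(["a dog runs", "a girl runs fast"])
restored = pickle.loads(pickle.dumps(word_index))  # in-memory stand-in for Tokenizer.p
print(encode("a cat runs", restored))  # prints [1, 3]; "cat" is not in the vocabulary
```

The dropped word shows why a caption model of this kind can only produce words it saw during training.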

Working of CNN

  • Convolutional neural networks (CNNs) are specialized deep neural networks that process data shaped like a 2D matrix. Images are easily represented as 2D matrices, so CNNs are very useful for working with images.
  • CNNs are mainly used for image classification, for example identifying whether an image shows a bird, a plane, or Superman.
  • A CNN scans an image from left to right and top to bottom to pull out important features, then combines those features to classify the image. It can cope, to a degree, with images that have been translated, rotated, scaled, or changed in perspective.
  • In simple words, a CNN extracts the features of an image and converts it into a lower-dimensional representation without losing its characteristics.
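
The scanning step described above can be sketched in a few lines of plain Python. This is only an illustration of a single sliding-window filter with no padding, not the Xception network. The function name `convolve2d`, the 4x4 image, and the edge-detecting kernel are all made up for the example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding, as a CNN layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for top in range(out_h):          # top to bottom
        row = []
        for left in range(out_w):     # left to right
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # prints [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The 4x4 input becomes a 3x3 feature map whose middle column marks where the edge is, which is the "lower dimension, same characteristics" idea in miniature.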

Working of LSTM

  • LSTM stands for long short-term memory. LSTMs are a type of recurrent neural network (RNN) well suited to sequence prediction problems.
  • Based on the previous text, an LSTM can predict what the next word will be.
  • It has proven more effective than the traditional RNN by overcoming the RNN's limitation of short-term memory.
  • An LSTM carries relevant information forward while processing its inputs and uses a forget gate to discard non-relevant information.
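
To make the role of the forget gate concrete, here is a minimal sketch of one time step of an LSTM cell with a single unit, written with the standard LSTM gate equations in plain Python. Real layers use weight matrices rather than scalars. The function `lstm_step` and the hand-picked weights are illustrative assumptions, not values from the trained model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # the new information itself
    o = gate("output", sigmoid)       # how much of the cell state to expose

    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state
    return h, c


# Hand-picked weights: the forget gate is nearly closed, so the old cell
# state (3.0) is almost entirely discarded and replaced by the new input.
weights = {
    "forget": (0.0, 0.0, -10.0),
    "input": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 10.0),
}
h, c = lstm_step(x=0.5, h_prev=0.0, c_prev=3.0, w=weights)
print(h, c)
```

Setting the forget bias to a large positive value instead would keep the old cell state almost unchanged, which is how an LSTM retains information over many steps.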

To build our image caption generator, we merge these two architectures. The result is also called a CNN-RNN model.

  • CNN is used for extracting features from the image. We will use the pre-trained model Xception.
  • LSTM will use the information from CNN to help generate a description of the image.
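
One common way to turn such a model into captions is greedy decoding: begin with a start marker, repeatedly ask the model for the most likely next word, and stop at the end marker. The sketch below assumes that approach. The source does not state which decoding method the project uses. `toy_predict_next` is a hypothetical stand-in for the trained CNN-LSTM model so that the loop can run on its own.

```python
def generate_caption(predict_next, photo_features, max_length=10):
    """Greedy decoding: repeatedly append the most likely next word."""
    words = ["start"]
    for _ in range(max_length):
        scores = predict_next(photo_features, words)  # dict of word -> score
        next_word = max(scores, key=scores.get)
        words.append(next_word)
        if next_word == "end":
            break
    return " ".join(words)


def toy_predict_next(photo_features, words):
    """Hypothetical stand-in for the trained model: follows a fixed script."""
    script = {"start": "dog", "dog": "runs", "runs": "end"}
    best = script.get(words[-1], "end")
    return {word: (1.0 if word == best else 0.0) for word in ["dog", "runs", "end"]}


print(generate_caption(toy_predict_next, photo_features=[0.1, 0.2]))
# prints: start dog runs end
```

In the real pipeline, `photo_features` would be the Xception feature vector and the scores would come from the LSTM's output layer.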

Summary

In this Python project, we built an image caption generator using a CNN-RNN model. Note that the model is data-driven, so it cannot predict words that are not in its vocabulary. We worked with a small dataset of about 8,000 photos. Production-level models need to be trained on datasets of more than 100,000 images to reach better accuracy.

Future Goals

  • Image captioning could be used in self-driving cars to describe the scene around the car.
  • It could aid people who are blind by converting scenes to captions and then to audio.
  • It could be used with CCTV cameras, raising an alarm if malicious activity is observed while describing the scene. Other uses include editing recommendations, social media posts, and many more.

Integration with an Android application and camera. The steps I follow with this model are listed below :
Steps :

  • First, train the ML model on a large dataset (so that it predicts better) and create a tflite model using TensorFlow Lite.
  • After creating the tflite model, integrate model.tflite into an Android Studio project.
  • Front-end implementation -- ImageView, TextView, select (from internal storage), capture (camera), and predict buttons in Android Studio.
  • Back-end implementation -- show the output caption in the TextView and convert the text to speech for blind or low-vision people.
  • Finally, improve the app's front end for better visuals (colors, a splash screen, and the headline "Caption Bot for Assisted Vision").
  • The app is ready to predict.
