Computer Vision
Description
DLTK Computer Vision enables you to find meaning in visual content! Analyze images for scenes, objects, faces, and other content. Choose a default model off the shelf, or create your own custom classifier. Develop smart applications that analyze the visual content of images or video frames to understand what is happening in a scene.
Features provided
The DLTK Computer Vision module currently provides the following APIs:
- Face Detection Image/JSON: Uses a Haar cascade algorithm to detect faces and record their locations. Depending on the endpoint requested, it serves either the coordinates in JSON format or a base64-encoded image.
- Object Detection Image/JSON: Detects multiple objects in the same image using RetinaNet-50. It also tags the objects and shows their location within the image.
- Image Classification: Uses a pre-trained InceptionV3 model, a convolutional neural network with 48 hidden layers. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. The network has an image input size of 299-by-299.
- License Plate Detection Image/JSON: Uses a Haar cascade classifier trained on automobile license plates. Depending on the endpoint requested, it returns either JSON containing the coordinates of the license plate or an image containing the license plate, encoded as a base64 string.
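Several of the APIs above return a base64-encoded image. As a minimal sketch of turning such a string back into an image file, the Python helper below decodes the payload and writes it to disk. It assumes the caller already holds the base64 string (the exact response envelope is not documented here), and the name save_base64_image is hypothetical.

```python
import base64
from pathlib import Path


def save_base64_image(encoded, out_path):
    """Decode a base64 string, write the bytes to out_path, return the byte count.

    Assumption: `encoded` is either the bare base64 payload or a data URI
    such as "data:image/png;base64,<payload>".
    """
    # Drop an optional data-URI prefix, keeping only the payload.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    raw = base64.b64decode(encoded)
    Path(out_path).write_bytes(raw)
    return len(raw)
```

The returned byte count is only a quick sanity check that something non-empty was written.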
Demo
License plate detection
Motivation
This repository shows how the DLTK Computer Vision API uses deep learning algorithms to analyze images and videos for scenes, objects, faces, license plates, and other content. For example, when you upload a photograph, the service detects the different objects in it. You can use the default model from DLTK.AI or create your own custom classifier.
Frameworks/ Tech Stack used
- Django: Python-based open-source web framework that follows the model-view-template (MVT) architectural pattern.
- OpenCV: Library of programming functions mainly aimed at real-time computer vision.
- InceptionV3: Convolutional neural network with 48 hidden layers. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals.
- RetinaNet-50: A popular single-stage detector that is both accurate and fast. RetinaNet uses a feature pyramid network to detect objects efficiently at multiple scales and introduces a new loss, the focal loss function, to alleviate the extreme foreground-background class imbalance.
- Haar Cascade: Machine-learning-based approach in which a cascade function is trained on many positive and negative images and then used to detect objects in other images.
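To make the focal loss from the RetinaNet-50 entry concrete, here is a small sketch of its binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), in plain Python. The defaults gamma = 2 and alpha = 0.25 are the values commonly quoted for RetinaNet, not settings confirmed for the model shipped with this repository, and the function names are illustrative.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction with label y (1 or 0)."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the foreground class, 0 < p < 1.
    y: ground-truth label, 1 for foreground, 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background example (p = 0.05) contributes far less than a
# hard one (p = 0.9), which is how the class imbalance is damped.
easy = focal_loss(0.05, 0)
hard = focal_loss(0.9, 0)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.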
How to use?
Before running this project, download the models used for the computer vision tasks. Because of their size, the models are stored in an Amazon S3 bucket rather than on GitHub.
To download the models:
wget https://dltk-ai-prod.s3.ap-south-1.amazonaws.com/computer_vision_models/resources.zip
Then unzip 'resources.zip' inside the Computer-Vision repository. The resulting 'resources' directory contains all the models used in this repository.
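If wget or an unzip tool is not available, the same download-and-extract step can be sketched with the Python standard library. The URL is the one given above; the function names and the default paths are illustrative assumptions, and the sketch assumes the archive unpacks to a top-level 'resources' directory as described.

```python
import urllib.request
import zipfile
from pathlib import Path

MODELS_URL = (
    "https://dltk-ai-prod.s3.ap-south-1.amazonaws.com/"
    "computer_vision_models/resources.zip"
)


def download_models(url=MODELS_URL, zip_path="resources.zip"):
    """Download the model archive to zip_path and return its path."""
    urllib.request.urlretrieve(url, zip_path)
    return Path(zip_path)


def extract_models(zip_path, repo_dir="."):
    """Unzip the archive into repo_dir and return the extracted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(repo_dir)
        return archive.namelist()
```

Calling extract_models(download_models(), "path/to/Computer-Vision") from a script would perform both steps; the repository path shown is a placeholder.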
Option-1: Executing dltk-vision-core as a service.
- Clone the repository
- Install all the required dependencies.
pip install -r requirements.txt
- Open a command prompt/terminal and run the Django server
python manage.py runserver 0.0.0.0:8187
- Start using the APIs listed below. Replace image_path with the path to an image file. If your client cannot connect to 0.0.0.0, use 127.0.0.1 instead.
Face detection API (image response):
curl --location --request POST 'http://0.0.0.0:8187/dltk-vision/face-detection/image' --form 'image=@image_path'
Face detection API (JSON response):
curl --location --request POST 'http://0.0.0.0:8187/dltk-vision/face-detection/json' --form 'image=@image_path'
Object detection API (image response):
curl --location --request POST 'http://0.0.0.0:8187/dltk-vision/object-detection/image' --form 'image=@image_path'
Object detection API (JSON response):
curl --location --request POST 'http://0.0.0.0:8187/dltk-vision/object-detection/json' --form 'image=@image_path'
Image classification API:
curl --location --request POST 'http://0.0.0.0:8187/dltk-vision/image-classification' --form 'image=@image_path'
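The same requests can be issued from Python. The sketch below builds the multipart/form-data body by hand with the standard library, mirroring curl's --form 'image=@image_path'. It assumes the server from Option-1 is reachable at 127.0.0.1:8187; the helper names build_multipart and post_image are hypothetical, and the response is returned as raw bytes because the response schema is not described here.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: the Django server from Option-1 is running locally.
BASE_URL = "http://127.0.0.1:8187/dltk-vision"


def build_multipart(field, filename, data):
    """Return (body, content_type) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_image(endpoint, image_path):
    """POST an image to an endpoint such as 'face-detection/json'; return raw bytes."""
    path = Path(image_path)
    body, content_type = build_multipart("image", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, post_image("object-detection/json", "street.jpg") would send the file street.jpg (a placeholder name) to the object detection JSON endpoint.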
Option-2: Executing dltk-vision-core as a docker container.
Docker: Docker is an OS-level virtualization platform that makes it easier to create, deploy, and run applications in containers.
Install Docker by following this link.
Docker Compose: Docker Compose lets users start all the services (containers) with a single command.
Install Docker Compose by following this link
Steps:
- Clone the repository.
- Go to the path where docker-compose.yml is placed.
- Run the command to start the container
sudo docker-compose up -d
- Now check the containers
sudo docker ps
- Execute the curl commands listed in Option-1
- Run the command to stop the container
sudo docker-compose down
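After starting the container, the API may need a moment before it accepts requests. A small sketch for waiting until the service is reachable is shown below. It assumes the compose file publishes port 8187 on the host, as in Option-1, which is not confirmed here; the function name wait_for_port is illustrative.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8187, timeout=60.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)  # not up yet; retry shortly
    return False
```

This only checks that the port is open, not that the models have finished loading, so a first request may still be slow.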
Founding Member
Lead Maintainer
Core Maintainer
Core Contributors
License
The content of this project itself is licensed under GNU LGPL, Version 2.1 (LGPL-2.1)