
sanjeev309 / posebox


A machine learning approach for pose estimation of hand-drawn marker

License: MIT License

Python 59.48% Jupyter Notebook 40.52%
machine-learning computer-vision pose-estimation marker-detection


Posebox


Details

Pose estimation is a costly operation and often requires additional hardware, such as a depth sensor, for accurate plane detection. This project is an attempt to build computationally cheap, real-time pose estimation of a fixed hand-drawn marker without any additional hardware.

(because, let's be honest, not everyone owns a printer)

Prerequisites

Check out requirements.txt for specifics.

The project is structured per the following layout.

This means you will have to create the data folder and its subfolders accordingly.

Data

The training data consists of images of a specific marker hand-drawn on paper, with its points always annotated in the same order.

For example:

The data is created from a video of the hand-drawn marker on paper, captured on a mobile device.

The data flow pipeline is as follows:

Tool used:

VGG Image Annotator (VIA) is used for point annotations.
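VIA can export point annotations as JSON. The sketch below turns such an export into per-image regression targets. It assumes the VIA 2.x layout (entries keyed by filename plus file size, each with a `regions` list whose `shape_attributes` hold `"name": "point"`, `cx` and `cy`) and also tolerates VIA 1.x, where `regions` is a dict. The helper names `parse_via_points` and `to_target`, and the normalization by image width and height, are hypothetical choices made to match the model's 8 outputs in the 0 to 1 range; they are not taken from the project. Note that VIA's `size` field is the file size in bytes, so the image dimensions must come from the image itself.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def parse_via_points(via_export: dict) -> Dict[str, List[Point]]:
    """Map each image filename to its point annotations, in annotation order."""
    # VIA 2.x project files nest per-image data under "_via_img_metadata";
    # plain annotation exports are already keyed per image.
    images = via_export.get("_via_img_metadata", via_export)
    annotations: Dict[str, List[Point]] = {}
    for entry in images.values():
        regions = entry.get("regions", [])
        if isinstance(regions, dict):  # VIA 1.x: {"0": {...}, "1": {...}}
            regions = [regions[key] for key in sorted(regions, key=int)]
        # Keep only point regions; list order is the order they were annotated in.
        points = [
            (float(r["shape_attributes"]["cx"]), float(r["shape_attributes"]["cy"]))
            for r in regions
            if r["shape_attributes"].get("name") == "point"
        ]
        annotations[entry["filename"]] = points
    return annotations


def to_target(points: List[Point], width: int, height: int, expected: int = 4) -> List[float]:
    """Flatten points to [x1, y1, ..., xN, yN], normalized to the 0..1 range."""
    if len(points) != expected:
        raise ValueError(f"expected {expected} points, got {len(points)}")
    target: List[float] = []
    for x, y in points:
        target.extend([x / width, y / height])
    return target
```

Calling `to_target` on each parsed entry with that image's width and height yields one 8-value row per image, in the same point order that was used during annotation.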

Model

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 511, 511, 3)       39        
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 255, 255, 3)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 254, 254, 3)       39        
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 127, 127, 3)       0         
_________________________________________________________________
mobilenetv2_1.00_224 (Model) (None, 4, 4, 1280)        2257984   
_________________________________________________________________
flatten (Flatten)            (None, 20480)             0         
_________________________________________________________________
dense (Dense)                (None, 64)                1310784   
_________________________________________________________________
dense_1 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_2 (Dense)              (None, 16)                528       
_________________________________________________________________
dense_3 (Dense)              (None, 8)                 136       
=================================================================
Total params: 3,571,590
Trainable params: 1,313,606
Non-trainable params: 2,257,984
_________________________________________________________________
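The summary above can be reproduced with a short Keras sketch. The layer sizes follow from the parameter counts: each 39-parameter Conv2D is 3 filters with a 2 x 2 kernel over 3 input channels (2·2·3·3 + 3 = 39), which turns a 512 x 512 x 3 input into 511 x 511 x 3, and the non-trainable count equals the whole MobileNetV2 base (alpha 1.0, no top), which suggests the base was frozen. The activations (ReLU, plus a sigmoid on the 8 outputs to keep them between 0 and 1) and the base weights are assumptions, since the summary does not show them, and `build_posebox_model` is a hypothetical name. `base_weights=None` avoids a download; passing `"imagenet"` loads pretrained weights instead.

```python
import tensorflow as tf


def build_posebox_model(base_weights=None) -> tf.keras.Model:
    """Rebuild the layer stack from the summary (activations are assumed)."""
    # Two 2x2 convs + 2x2 pools shrink 512 -> 511 -> 255 -> 254 -> 127.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(127, 127, 3), include_top=False, weights=base_weights
    )
    base.trainable = False  # all 2,257,984 base params are non-trainable
    return tf.keras.Sequential([
        tf.keras.Input(shape=(512, 512, 3)),
        tf.keras.layers.Conv2D(3, kernel_size=2, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Conv2D(3, kernel_size=2, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        base,                                   # -> (4, 4, 1280)
        tf.keras.layers.Flatten(),              # -> 20480
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        # 8 outputs, read here as (x, y) for each of the 4 marker points.
        tf.keras.layers.Dense(8, activation="sigmoid"),
    ])
```

Shrinking the input to 127 x 127 with the cheap conv/pool front end before MobileNetV2 is what keeps the frozen backbone inexpensive to run on 512 x 512 images.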

Training

Training was done on a manually captured and annotated dataset of 325 images. The model was trained on Google Colab, with checkpoints saved to Google Drive.

Accuracy and Loss:

Inference

Images are resized to 512 x 512 before being fed to the model. The model outputs regression coordinates as floats between 0 and 1, which are then scaled back to the original dimensions of the image.
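A minimal sketch of the scaling step, assuming the 8 outputs are interleaved as x1, y1, ..., x4, y4 for the four marker points and that the 512 x 512 resize is a plain stretch (no letterboxing). `scale_to_image` is a hypothetical helper, and the resize itself is not shown.

```python
import numpy as np


def scale_to_image(pred, width: int, height: int) -> np.ndarray:
    """Convert 8 normalized outputs into a (4, 2) array of pixel coordinates."""
    # Assumes the outputs are interleaved as x1, y1, x2, y2, ..., x4, y4.
    points = np.asarray(pred, dtype=np.float64).reshape(-1, 2)
    # Normalized coordinates do not depend on the 512 x 512 resize, so they
    # map straight back onto the original width and height.
    return points * np.array([width, height], dtype=np.float64)
```

Because the targets are normalized, the model never needs to know the original resolution; only this final multiplication does.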

Result from the model:

The marker's points are numbered in the same order as in the annotation.

Note that the order of the points is crucial, so it must also be kept consistent during annotation.
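Because a wrong order silently corrupts the regression targets, one cheap sanity check is to compare the winding direction of each annotated quadrilateral using the signed area from the shoelace formula. This is a hypothetical helper, not part of the project. It takes a mapping from filename to a list of (x, y) points and flags point sets entered in the opposite direction from the majority, or degenerate ones. It cannot detect a consistent direction that starts from the wrong corner, so it complements careful annotation rather than replacing it.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def signed_area(points: List[Point]) -> float:
    """Shoelace formula; the sign encodes the direction the points were traversed."""
    area = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def winding_outliers(annotations: Dict[str, List[Point]]) -> List[str]:
    """Return files whose winding differs from the majority (or is degenerate)."""
    signs = {}
    for name, pts in annotations.items():
        area = signed_area(pts)
        signs[name] = (area > 0) - (area < 0)  # 1, -1, or 0 for degenerate
    positives = sum(1 for s in signs.values() if s > 0)
    negatives = sum(1 for s in signs.values() if s < 0)
    majority = 1 if positives >= negatives else -1
    return sorted(name for name, s in signs.items() if s != majority)
```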

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Sanjeev Tripathi

Harshini Gudipally

License

This project is licensed under the MIT License - see the LICENSE.md file for details
