Giter Site home page Giter Site logo

blazeface's Introduction

BlazeFace in Python

BlazeFace is a fast, light-weight face detector from Google Research. Read more, Paper on arXiv

A pretrained model is available as part of Google's MediaPipe framework.

Besides a bounding box, BlazeFace also predicts 6 keypoints for face landmarks (2x eyes, 2x ears, nose, mouth).

Because BlazeFace is designed for use on mobile devices, the pretrained model is in TFLite format. However, I wanted to use it from PyTorch and so I converted it.

NOTE: The MediaPipe model is slightly different from the model described in the BlazeFace paper. It uses depthwise convolutions with a 3x3 kernel, not 5x5. And it only uses "single" BlazeBlocks, not "double" ones.

The BlazePaper paper mentions that there are two versions of the model, one for the front-facing camera and one for the back-facing camera. This repo includes only the frontal camera model, as that is the only one I was able to find an official trained version for. The difference between the two models is the dataset they were trained on. As the paper says,

For the frontal camera model, only faces that occupy more than 20% of the image area were considered due to the intended use case (the threshold for the rear-facing camera model was 5%).

This means the included model will not be able to detect faces that are relatively small. It's really intended for selfies, not for general-purpose face detection.

Inside this repo

Essential files:

  • blazeface.py: defines the BlazeFace class that does all the work

  • blazeface.pth: the weights for the trained model

  • anchors.npy: lookup table with anchor boxes

Notebooks:

  • Anchors.ipynb: creates anchor boxes and saves them as a binary file (anchors.npy)

  • Convert.ipynb: loads the weights from the TFLite model and converts them to PyTorch format (blazeface.pth)

  • Inference.ipynb: shows how to use the BlazeFace class to make face detections

Detections

Each face detection is a PyTorch Tensor consisting of 17 numbers:

  • The first 4 numbers describe the bounding box corners:

    • ymin, xmin, ymax, xmax
    • These are normalized coordinates (between 0 and 1).
  • The next 12 numbers are the x,y-coordinates of the 6 facial landmark keypoints:

    • right_eye_x, right_eye_y
    • left_eye_x, left_eye_y
    • nose_x, nose_y
    • mouth_x, mouth_y
    • right_ear_x, right_ear_y
    • left_ear_x, left_ear_y
    • Tip: these labeled as seen from the perspective of the person, so their right is your left.
  • The final number is the confidence score that this detection really is a face.

Image credits

Included for testing are the following images:

  • 1face.png. Fei Fei Li by ITU Pictures, CC BY 2.0

  • 3faces.png. Geoffrey Hinton, Yoshua Bengio, Yann Lecun. Found at AIBuilders

  • 4faces.png from Andrew Ng’s Facebook page / KDnuggets

These images were scaled down to 128x128 pixels as that is the expected input size of the model.

blazeface's People

Contributors

longnguyen2 avatar hollance avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.