This project forked from quic/sense

20bn-realtimenet: Enhance your application with the ability to see and interact with humans using any RGB camera.

Home Page: https://20bn.com/products/datasets

License: MIT License

20bn-realtimenet

20bn-realtimenet is an inference engine for two lightweight neural networks pre-trained on millions of videos: a gesture recognition model and a fitness activity tracking model. Both networks are small and efficient, and run smoothly in real time on a CPU.

Getting Started

The following steps are confirmed to work on Linux (Ubuntu 18.04 LTS and 20.04 LTS) and macOS (Catalina 10.15.7).

1. Clone the Repository

To begin, clone this repository to a local directory of your choice:

git clone https://github.com/TwentyBN/20bn-realtimenet.git
cd 20bn-realtimenet

2. Install Dependencies

Create a new virtual environment. The following instructions use conda (recommended), but virtualenv works as well.

conda create -y -n realtimenet python=3.6
conda activate realtimenet

Install Python dependencies:

pip install -r requirements.txt

Note: pip install -r requirements.txt installs only the CPU version of PyTorch. To run inference on a GPU, install a CUDA-enabled build of PyTorch instead. For instance:

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

See all available options here.
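Before running the scripts on a GPU, it can help to confirm which device PyTorch will actually see. The following is a minimal sketch and not part of this repository: the helper name `describe_torch_device` is hypothetical, and the check relies only on `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report which device PyTorch would use (hypothetical helper, not part of this repo)."""
    # Avoid an ImportError if the dependencies have not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    # True only when a CUDA-enabled PyTorch build and a usable GPU are both present.
    return "cuda" if torch.cuda.is_available() else "cpu"
```

If this returns "cpu" after installing the GPU build, the CUDA toolkit version probably does not match the installed driver.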

3. Download Pre-trained Weights

Pre-trained weights can be downloaded from here. Follow the link, register for an account, and you will be redirected to the download page. After downloading the weights, unzip the archive and place the contents of the extracted directory into 20bn-realtimenet/resources.
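The unzip step can also be scripted. The sketch below is an assumption-laden illustration, not a tool shipped with this repository: `extract_weights` is a hypothetical helper, and the archive path is whatever file name the download page gives you.

```python
import zipfile
from pathlib import Path


def extract_weights(archive_path, resources_dir="resources"):
    """Unzip a downloaded weights archive into the resources directory.

    Hypothetical helper: the archive name depends on the download and
    is not specified by this README.
    """
    target = Path(resources_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Return the extracted file paths so the result can be inspected.
    return sorted(p for p in target.rglob("*") if p.is_file())
```

If the archive wraps everything in a single top-level folder, move that folder's contents up one level so that they sit directly inside resources.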

Available scripts

Inside the 20bn-realtimenet/scripts directory, you will find three Python scripts: gesture_recognition.py, fitness_tracker.py, and calorie_estimation.py.

1. Gesture Recognition

scripts/gesture_recognition.py applies our pre-trained models to hand gesture recognition. 30 gestures are supported (see full list here):

PYTHONPATH=./ python scripts/gesture_recognition.py

(full video can be found here)

2. Fitness Activity Tracking

scripts/fitness_tracker.py applies our pre-trained models to real-time fitness activity recognition and calorie estimation. In total, 80 different fitness exercises are recognized (see full list here).

(full video can be found here)

Usage:

PYTHONPATH=./ python scripts/fitness_tracker.py --weight=65 --age=30 --height=170 --gender=female

Weight, age, and height should be given in kilograms, years, and centimeters, respectively. If any of them is not provided, a default value is used.

Some additional arguments can be used to change the streaming source:

  --camera_id=CAMERA_ID           ID of the camera to stream from
  --path_in=FILENAME              Video file to stream from. This assumes that the video was encoded at 16 fps.

It is also possible to save the display window to a video file using:

  --path_out=FILENAME             Video file to stream to
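Because --path_in assumes a video encoded at 16 fps, footage recorded at another frame rate may need to be re-encoded first. One way to do this, assuming ffmpeg is installed, is sketched below. The helper `build_reencode_command` is hypothetical and only assembles the command, and the file names are placeholders.

```python
def build_reencode_command(src, dst, fps=16):
    """Build an ffmpeg command that re-encodes `src` at a fixed frame rate.

    Hypothetical helper, not part of this repo; it only builds the argument list.
    """
    # "-r" placed after the input sets the output frame rate;
    # "-y" overwrites the destination file if it already exists.
    return ["ffmpeg", "-y", "-i", str(src), "-r", str(fps), str(dst)]
```

Passing the returned list to `subprocess.run(..., check=True)` performs the re-encode, and the resulting file can then be given to --path_in.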

Ideal Setup:

For the best performance, the following is recommended:

  • Camera on the floor
  • Body fully visible (head-to-toe)
  • Clean background

3. Calorie Estimation

In order to estimate burned calories, we trained a neural net to convert activity features to the corresponding MET value. We then post-process these MET values (see correction and aggregation steps performed here) and convert them to calories using the user's weight.
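The final conversion from MET values to calories can be illustrated with the conventional definition of 1 MET as 1 kcal per kilogram of body weight per hour. The sketch below assumes that definition and leaves out the correction and aggregation steps mentioned above, so it is not the repository's exact implementation. The function name and parameters are hypothetical.

```python
def calories_from_met(met_values, weight_kg, seconds_per_value):
    """Convert a sequence of MET values to kilocalories burned.

    Assumes the conventional definition 1 MET = 1 kcal per kg per hour;
    each MET value is taken to cover `seconds_per_value` seconds.
    """
    hours_per_value = seconds_per_value / 3600.0
    return sum(met * weight_kg * hours_per_value for met in met_values)
```

Under this definition, a 65 kg person sustaining 6 MET for one hour burns about 6 × 65 × 1 = 390 kcal, which is what `calories_from_met([6.0] * 60, 65, 60)` computes up to floating-point rounding.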

If you're only interested in the calorie estimation part, you might want to use scripts/calorie_estimation.py which has a slightly more detailed display (see video here which compares two videos produced by that script).

Usage:

PYTHONPATH=./ python scripts/calorie_estimation.py --weight=65 --age=30 --height=170 --gender=female

The calorie estimates are roughly in the range produced by wearable devices, though their accuracy has not been verified. In our experiments, the estimates correlate well with workout intensity (intense workouts burn more calories), so regardless of absolute accuracy, it should be fair to use this metric to compare one workout to another.

Running on an iOS Device and CoreML Conversion

If you're interested in mobile app development and want to run our models on iOS devices, please check out 20bn-realtimenet-ios for step-by-step instructions on how to get our gesture demo running on an iOS device. One of the steps involves converting our PyTorch models to the CoreML format, which can be done from this repo using the following script:

python scripts/conversion/convert_to_coreml.py --backbone=efficientnet --classifier=efficient_net_gesture_control --output_name=realtimenet

Citation

We now have a blogpost you can cite:

@misc{realtimenet2020blogpost,
    author = {Guillaume Berger and Antoine Mercier and Florian Letsch and Cornelius Boehm and Sunny Panchal and Nahua Kang and Mark Todorovich and Ingo Bax and Roland Memisevic},
    title = {Towards situated visual AI via end-to-end learning on video clips},
    howpublished = {\url{https://medium.com/twentybn/towards-situated-visual-ai-via-end-to-end-learning-on-video-clips-2832bd9d519f}},
    note = {online; accessed 23 October 2020},
    year=2020,
}

License

The code is copyright (c) 2020 Twenty Billion Neurons GmbH under an MIT License. See the file LICENSE for details. Note that this license only covers the source code of this repo. Pretrained weights come with a separate license available here.

This repo uses PyTorch, which is licensed under a 3-clause BSD License. See the file LICENSE_PYTORCH for details.
