Giter Site home page Giter Site logo

action-recognition-tutorial's Introduction

Action Recognition on Realtime Application

This project is RGB base action recognition system that focus on real time application. We use KARD dataset to train this model. You can check the full demo version in this youtube link

KARD dataset
alt text

New dataset
alt text

Overview

Requirement

  • keras
  • tensorflow 1.14 (for using CuDNNLSTM)
  • CUDA 10.0

Dataset

You can download dataset from here. Skeleton joints and depth data is not used in this project. Only RGB part is needed. For preparing dataset, you should make the structure of your folder to be in this form

KARD-split
├── a01                   
│   ├── a01_s01_e01.mp4             
│   ├── a01_s01_e02.mp4            
│   ├── ...           
│   ├── a01_s10_e03     
├── a02                   
│   ├── a02_s01_e01.mp4             
│   ├── a02_s01_e02.mp4            
│   ├── ...           
│   ├── a02_s10_e03      
├── ....
├── ....
├── a18                   
│   ├── a18_s01_e01.mp4             
│   ├── a18_s01_e02.mp4            
│   ├── ...           
│   ├── a18_s10_e03   
└── ...

You can see more detail in dataset_list/trainlist.txt and dataset_list/testlist.txt

Demo

Webcam or any camera is required for using demo. You can run demo by using the following command

python webcam.py

Evaluation

If you want to see accuracy and confusion matrix of pretrain model, you can run the evaluation part by using below command.

python evaluate_model.py

Training

You can try to train model by run this command.

python train.py

If you want to change any parameter, you can find it in train.py file

Performance

Accuracy: around 87-89% (depend on which part of test set that is random)

Confusion Matrix: alt text

Method

Input: 8 RGB frames

Output: 18 action classes

  • This project use RGB Difference as input. The idea is from this paper and this project.
  • This project just use only simple model to solve it. I use just only LSTM as core of model and use MobileNetV2 as feature extraction part. You can see the detail of architecture in model_ML.py
  • while testing and evaluation, I will random n_sequence frames from each video file. So, n_sequence frames is "1 sample". while testing, if we random only 1 sample per 1 video file is not good because the accuracy will be unstable. So, we need to random more sample per one file. For example in evaluate_model.py, I set 'n_mul_test' to 2. It mean I will random 2 sample per one video file. You can change n_mul_test to be any value. If value is high, the accuracy will be stable but it need more testing time.

Note

  • The hyperparameters of train.py, evaluate_model.py and webcam.py is located in header of file. You can adjust it.
  • If you face the out of memory problem when you try to evaluate or train, you can decrease n_batch and n_sequence to reduce memory consumption. I suggest you should not use n_batch = 1 because the accuracy will be very swing and cannot converge

Acknowledgement

Beijing University of Post and Telecommunication(BUPT)

My supervisor: Assoc Prof Dr. Hui Gao
My mentor: ChaiXinXin

Tsinghua University

Co-supervisor: Prof Dr.Xin Su

Reference

Example code

https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly Data generator on keras
https://github.com/eriklindernoren/Action-Recognition Sampling Idea
https://github.com/AhmedGamal1496/online-action-recognition#Introduction RGB Difference Example

Paper

Temporal Segment Networks for Action Recognition in Videos, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, TPAMI, 2018. Arxiv Preprint

Dataset

https://data.mendeley.com/datasets/k28dtm7tr6/1

action-recognition-tutorial's People

Contributors

peachman05 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.