iq-scm / gast-net-3dposeestimation

This project forked from fabro66/gast-net-3dposeestimation

A Graph Attention Spatio-temporal Convolutional Networks for 3D Human Pose Estimation in Video (GAST-Net)

License: MIT License

Python 99.96% Makefile 0.04%

gast-net-3dposeestimation's Introduction

News

  • [2021/01/28] We updated GAST-Net so that it can generate 19-joint human poses, including body and foot joints. [DEMO]
  • [2020/11/17] We provide a tutorial on how to generate 3D poses/animation from a custom video. [INFERENCE_EN.md]
  • [2020/10/15] We achieve online 3D skeleton-based action recognition with a single RGB camera. [video][code]
  • [2020/08/14] We achieve real-time 3D pose estimation. [video]

Introduction

Spatio-temporal information is key to resolving occlusion and depth ambiguity in 3D pose estimation. Previous methods have focused on either temporal contexts or local-to-global architectures that embed fixed-length spatio-temporal information. To date, no effective proposal simultaneously and flexibly captures varying spatio-temporal sequences while achieving real-time 3D pose estimation. In this work, we improve the learning of kinematic constraints in the human skeleton (posture, local kinematic connections, and symmetry) by modeling local and global spatial information via attention mechanisms. To adapt to single- and multi-frame estimation, a dilated temporal model is employed to process varying skeleton sequences. Importantly, we carefully design the interleaving of spatial semantics with temporal dependencies to achieve a synergistic effect. To this end, we propose a simple yet effective graph attention spatio-temporal convolutional network (GAST-Net) that comprises interleaved temporal convolutional and graph attention blocks. Building on the proposed method, we introduce a real-time strategy for online 3D skeleton-based action recognition with a simple RGB camera. Experiments on two challenging benchmark datasets (Human3.6M and HumanEva-I) and on YouTube videos demonstrate that our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
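To make the "local kinematic connections" and "symmetry" constraints concrete, here is a minimal sketch of the two joint-level adjacency matrices a graph attention block could draw on. It is an illustration only, not the repository's implementation. The joint ordering, parent list, and left/right index lists are assumptions taken from the 17-joint Human3.6M skeleton commonly used with VideoPose3D, and the helper names are hypothetical.

```python
# Assumption: 17-joint Human3.6M ordering as commonly used with VideoPose3D.
# PARENTS[j] is the parent joint of joint j; -1 marks the root (hip).
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]
JOINTS_LEFT = [4, 5, 6, 11, 12, 13]
JOINTS_RIGHT = [1, 2, 3, 14, 15, 16]


def kinematic_adjacency(parents):
    """Undirected adjacency of first-order (bone) connections."""
    n = len(parents)
    adj = [[0] * n for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child][parent] = 1
            adj[parent][child] = 1
    return adj


def symmetry_adjacency(n, left, right):
    """Undirected adjacency linking each left joint to its right counterpart."""
    adj = [[0] * n for _ in range(n)]
    for l, r in zip(left, right):
        adj[l][r] = 1
        adj[r][l] = 1
    return adj


bone_adj = kinematic_adjacency(PARENTS)
sym_adj = symmetry_adjacency(len(PARENTS), JOINTS_LEFT, JOINTS_RIGHT)
print(sum(map(sum, bone_adj)) // 2)  # number of bones: 16
print(sum(map(sum, sym_adj)) // 2)   # number of symmetric pairs: 6
```

In this sketch the two matrices are kept separate so that a model could weight bone connections and symmetric pairs differently; whether GAST-Net does so is not stated here.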

Framework

Two-person 3D human pose estimation

Dependencies

Make sure you have the following dependencies installed before proceeding:

  • Python >= 3.6
  • PyTorch >= 1.0.1
  • matplotlib
  • numpy
  • ffmpeg
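A quick way to confirm the list above before training is a small check script. The sketch below is not part of the repository. It assumes that PyTorch is importable as `torch` and that `ffmpeg` is on the `PATH`, and the function name `check_dependencies` is hypothetical. It only tests for presence and does not verify the PyTorch version.

```python
import importlib.util
import shutil
import sys


def check_dependencies(modules=("torch", "matplotlib", "numpy"), tools=("ffmpeg",)):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {"python>=3.6": sys.version_info >= (3, 6)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print("{}: {}".format(name, "found" if ok else "MISSING"))
```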

Data preparation

  • Download the raw data from Human3.6M and HumanEva-I

  • Preprocess the dataset in the same way as VideoPose3D

  • Then put the preprocessed dataset under the data directory

     -data\
          data_2d_h36m_gt.npz
          data_3d_h36m.npz
          data_2d_h36m_cpn_ft_h36m_dbb.npz
          data_2d_h36m_sh_ft_h36m.npz
      
          data_2d_humaneva15_gt.npz
          data_3d_humaneva15.npz
          data_2d_humaneva15_detectron_pt_coco.npz
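
Before launching a long training run, it can help to confirm that the preprocessed files are where the scripts expect them. The following is a minimal sketch, not part of the repository. The helper name `missing_files` is hypothetical, and the caller passes in the list of expected file names.

```python
from pathlib import Path


def missing_files(data_dir, expected):
    """Return the names in `expected` that are not regular files under `data_dir`."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


# Example: check two of the Human3.6M 2D keypoint files listed above.
expected = ["data_2d_h36m_gt.npz", "data_2d_h36m_cpn_ft_h36m_dbb.npz"]
print(missing_files("data", expected))
```

An empty list means every requested file was found.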
    

Training & Testing

If you want to reproduce the results of our paper, run the following commands.

For Human3.6M:

python trainval.py -e 80 -k cpn_ft_h36m_dbb -arc 3,3,3 -drop 0.05 -b 128

For HumanEva:

python trainval.py -d humaneva15 -e 200 -k detectron_pt_coco -arc 3,3,3 -drop 0.5 -b 32 -lrd 0.98 -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -a Walk,Jog,Box --by-subject

To test on Human3.6M, run:

python trainval.py -k cpn_ft_h36m_dbb -arc 3,3,3 -c checkpoint --evaluate epoch_60.bin

To test on HumanEva, run:

python trainval.py -d humaneva15 -k detectron_pt_coco -arc 3,3,3 -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -a Walk,Jog,Box --by-subject -c checkpoint --evaluate epoch_200.bin

Download our pretrained models from the model zoo (GoogleDrive or BaiduDrive (ietc)) and place them as follows:

cd root_path
mkdir checkpoint output
cd checkpoint
mkdir gastnet
-checkpoint\gastnet\
            27_frame_model.bin
            27_frame_model_toe.bin
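
The name 27_frame_model.bin matches the `-arc 3,3,3` setting used in the commands above if the temporal model follows the VideoPose3D-style dilation schedule, in which each layer's dilation equals the product of the preceding filter widths. That schedule is an assumption here, since this README does not spell it out, and the helper names below are hypothetical. Under that assumption the receptive field can be computed as follows:

```python
def parse_arc(arc):
    """Parse a comma-separated filter-width string such as '3,3,3'."""
    return [int(width) for width in arc.split(",")]


def receptive_field(filter_widths):
    """Number of input frames seen by one output frame.

    Assumes the dilation of each layer is the product of all earlier
    filter widths (VideoPose3D-style schedule).
    """
    frames = 1
    dilation = 1
    for width in filter_widths:
        frames += (width - 1) * dilation
        dilation *= width
    return frames


print(receptive_field(parse_arc("3,3,3")))  # 27
```

With this schedule the receptive field equals the product of the filter widths, so adding another width-3 layer would triple it.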

Reconstruct 3D poses from 2D keypoints

Reconstruct 3D poses from 2D keypoints estimated by a 2D detector (Mask R-CNN, HRNet, OpenPose, etc.) and visualize them.

To reproduce the baseball example with 17 joints (body joints only), run:

python reconstruction.py

To reproduce the baseball example with 19 joints (body and toe joints), run:

python reconstruction.py -w 27_frame_model_toe.bin -n 19 -k ./data/keypoints/baseball_wholebody.json -kf wholebody

  • Reconstructed from YouTube video

17-joint 3D human pose estimation
19-joint 3D human pose estimation

How to generate 3D human poses from a custom video

We provide a tutorial on how to run our model on custom videos. See INFERENCE_EN.md for more details.

Acknowledgements

This repo is based on

Thanks to the original authors for their work!

Reference

If you find our paper and repo useful, please cite our paper. Thanks!

@article{liu2020a,
  title={A Graph Attention Spatio-temporal Convolutional Networks for 3D Human Pose Estimation in Video},
  author={Liu, Junfa and Rojas, Juan and Liang, Zhijun and Li, Yihui and Guan, Yisheng},
  journal={arXiv preprint arXiv:2003.14179},
  year={2020}
}

Contact

Contributors

  • fabro66
