Giter Site home page Giter Site logo

pandinosaurus / v2v-posenet_release Goto Github PK

View Code? Open in Web Editor NEW

This project forked from mks0601/v2v-posenet_release

0.0 2.0 0.0 11.62 MB

Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018

Home Page: https://arxiv.org/abs/1711.07399

License: MIT License

MATLAB 53.11% Lua 41.55% Python 5.33%

v2v-posenet_release's Introduction

PWC PWC PWC PWC PWC PWC

V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

PyTorch re-implementation of the V2V-PoseNet is available at dragonbook's repo.

Thanks dragonbook for re-implementation. Other versions of the V2V-PoseNet are also welcome!

Introduction

This is our project repository for the paper, V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map (CVPR 2018).

We, Team SNU CVLAB, (Gyeongsik Moon, Juyong Chang, and Kyoung Mu Lee of Computer Vision Lab, Seoul National University) are winners of HANDS2017 Challenge on frame-based 3D hand pose estimation.

Please refer to our paper for details.

If you find our work useful in your research or publication, please cite our work:

[1] Moon, Gyeongsik, Ju Yong Chang, and Kyoung Mu Lee. "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map." CVPR 2018. [arXiv]

@InProceedings{Moon_2018_CVPR_V2V-PoseNet,
author = {Moon, Gyeongsik and Chang, Juyong and Lee, Kyoung Mu},
title = {V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2018}
}

In this repository, we provide

  • Our model architecture description (V2V-PoseNet)
  • HANDS2017 frame-based 3D hand pose estimation Challenge Results
  • Comparison with the previous state-of-the-art methods
  • Training code
  • Datasets we used (ICVL, NYU, MSRA, ITOP)
  • Trained models and estimated results
  • 3D hand and human pose estimation examples

Model Architecture

V2V-PoseNet

HANDS2017 frame-based 3D hand pose estimation Challenge Results

Challenge_result

Comparison with the previous state-of-the-art methods

Paper_result_hand_graph

Paper_result_hand_table

Paper_result_human_table

About our code

Dependencies

Our code is tested under Ubuntu 14.04 and 16.04 environment with Titan X GPUs (12GB VRAM).

Code

Clone this repository into any place you want. You may follow the example below.

makeReposit = [/the/directory/as/you/wish]
mkdir -p $makeReposit/; cd $makeReposit/
git clone https://github.com/mks0601/V2V-PoseNet_RELEASE.git
  • src folder contains lua script files for data loader, trainer, tester and other utilities.
  • data folder contains data converter which converts image files to the binary files.

To train our model, please run the following command in the src directory:

th rum_me.lua
  • There are some optional configurations you can adjust in the config.lua.
  • You have to convert the .png images of the ICVL, NYU, and HANDS 2017 dataset to the .bin files by running the code from data folder.
  • The directory where you have to put the dataset files and computed centers of each frame is defined in src/data/dataset_name/data.lua
  • Visualization code is finally uploaded! You have to prepare 'result_pixel.txt' for each dataset. Each row of the result file has to contain the pixel coordinates of x, y and depth of all joints (i.e, x1 y1 z1 x2 y2 z2 ...). Then run pixel2world script and run draw_DB.m

Dataset

We trained and tested our model on the four 3D hand pose estimation and one 3D human pose estimation datasets.

Results

Here we provide the precomputed centers, estimated 3D coordinates and pre-trained models of ICVL, NYU, MSRA, HANDS2017, and ITOP datasets. You can download precomputed centers and 3D hand pose results in here and pre-trained models in here

The precomputed centers are obtained by training the hand center estimation network from DeepPrior++ . Each line represents 3D world coordinate of each frame. In case of ICVL, NYU, MSRA, and HANDS2017 dataset, if depth map not exist or not contain hand, that frame is considered as invalid. In case of ITOP dataset, if 'valid' variable of a certain frame is false, that frame is considered as invalid. All test images are considered as valid.

The 3D coordinates estimated on the ICVL, NYU and MSRA datasets are pixel coordinates and the 3D coordinates estimated on the HANDS2017 and ITOP datasets are world coordinates. The estimated results are from ensembled model. You can make the results from a single model by downloading the pre-trained model and testing it.

We used awesome-hand-pose-estimation to evaluate the accuracy of the V2V-PoseNet on the ICVL, NYU and MSRA dataset.

Belows are qualitative results. result_1 result_2 result_3 result_4 result_5 result_6

v2v-posenet_release's People

Contributors

mks0601 avatar

Watchers

James Cloos avatar paper2code - bot avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.