facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

License: Other

Python 100.00%

InterHand2.6M's Introduction

InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Our new Re:InterHand dataset has been released; it has much more diverse image appearances and more stable 3D GT. Check it out here!

Introduction

The demo videos above have low-quality frames because of compression for the README upload.

News

  • 2021.06.10. Boxes in the RootNet results have been corrected.
  • 2021.03.22. Finally, InterHand2.6M v1.0, which includes all images of the 5 fps and 30 fps versions, is released! 🎉 This is the dataset used in the InterHand2.6M paper.
  • 2020.11.26. Demo code for a random image has been added! Check out the instructions below.
  • 2020.11.26. Fitted MANO parameters have been updated to better ones (fitting error is about 5 mm). The files are also much smaller because the parameters are fitted in world coordinates (independent of the camera view).
  • 2020.10.7. Fitted MANO parameters are available! They are obtained with NeuralAnnot.

InterHand2.6M dataset

  • For the InterHand2.6M dataset download and instructions, go to [HOMEPAGE].
  • Below are instructions for our baseline model, InterNet, for 3D interacting hand pose estimation from a single RGB image.

Demo on a random image

  1. Download the pre-trained InterNet from here
  2. Put the model in the demo folder
  3. Go to the demo folder and edit bbox in here (see the sketch after this list)
  4. Run python demo.py --gpu 0 --test_epoch 20
  5. You can see result_2D.jpg and a 3D viewer.
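
For step 3, a minimal illustration of the edit, assuming the bounding box is given as [xmin, ymin, width, height] in pixel coordinates of the input image (the variable name and values below are placeholders, not taken from the repo):

# demo.py (sketch): hand bounding box in the input image, in pixels.
# Assumed format: [xmin, ymin, width, height]; replace with the box around the hands
# in your own image before running `python demo.py --gpu 0 --test_epoch 20`.
bbox = [100, 80, 220, 220]  # placeholder values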

MANO mesh rendering demo

  1. Install SMPLX
  2. cd tool/MANO_render
  3. Set smplx_path in render.py (see the sketch after this list)
  4. Run python render.py
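
A minimal sketch of step 3, assuming the standard smplx API for creating MANO layers (this is not the repo's render.py; smplx_path below is a placeholder pointing to the folder that contains the MANO model files):

import torch
import smplx

smplx_path = '/path/to/mano/models'  # placeholder: folder containing MANO_RIGHT.pkl / MANO_LEFT.pkl
# Create left- and right-hand MANO layers; use_pca=False exposes the full 45-dim hand pose.
mano_layer = {
    'right': smplx.create(smplx_path, 'mano', use_pca=False, is_rhand=True),
    'left': smplx.create(smplx_path, 'mano', use_pca=False, is_rhand=False),
}
# Quick sanity check: a forward pass with zero pose/shape should return a 778-vertex mesh.
out = mano_layer['right'](global_orient=torch.zeros(1, 3), hand_pose=torch.zeros(1, 45),
                          betas=torch.zeros(1, 10), transl=torch.zeros(1, 3))
print(out.vertices.shape)  # torch.Size([1, 778, 3])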

MANO parameter conversion from the world coordinate to the camera coordinate system

  1. Install SMPLX
  2. cd tool/MANO_world_to_camera/
  3. Set smplx_path in convert.py
  4. Run python convert.py (see the sketch after this list)
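
A sketch of the main idea behind the conversion, stated as an assumption rather than a description of convert.py: the MANO global orientation (axis-angle) is composed with the world-to-camera rotation, while the translation needs extra care because MANO applies transl after rotating about the root joint:

import numpy as np
import cv2  # Rodrigues: convert between axis-angle and rotation matrices

def world_orient_to_cam(global_orient_world, R):
    # global_orient_world: MANO root orientation in axis-angle, shape (3,), world coordinates
    # R: (3, 3) world-to-camera rotation matrix
    R_root, _ = cv2.Rodrigues(global_orient_world.reshape(3, 1))  # axis-angle -> 3x3
    R_root_cam = np.dot(R, R_root)                                # rotate the root into the camera frame
    global_orient_cam, _ = cv2.Rodrigues(R_root_cam)              # 3x3 -> axis-angle
    return global_orient_cam.reshape(3)

The translation cannot simply be rotated the same way: because the root joint (not the origin) is the rotation pivot, the camera-frame translation has to be recovered from the transformed root-joint position rather than by rotating transl directly.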

Camera positions visualization demo

  1. cd tool/camera_visualize
  2. Run python camera_visualize.py
  • As there are many cameras, you should set subset and split in lines 9 and 10, respectively, yourself (see the sketch below for the general idea).
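
A minimal sketch of the same idea (not the repo's camera_visualize.py), assuming the camera annotation file stores campos as a per-capture dictionary of 3D camera centers, as in the snippets elsewhere on this page:

import json
import numpy as np
import matplotlib.pyplot as plt

with open('InterHand2.6M_train_camera.json') as f:  # placeholder path
    cams = json.load(f)

capture_id = '0'  # hypothetical capture to visualize
campos = cams[capture_id]['campos']
pos = np.array([campos[cam_id] for cam_id in campos], dtype=np.float32)  # (num_cams, 3), millimeters

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(pos[:, 0], pos[:, 1], pos[:, 2], s=5)
ax.set_title('Camera centers for capture ' + capture_id)
plt.show()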

Directory

Root

${ROOT} is described below.

${ROOT}
|-- data
|-- common
|-- main
|-- output
  • data contains data loading code and soft links to the images and annotations directories.
  • common contains kernel code for 3D interacting hand pose estimation.
  • main contains high-level code for training or testing the network.
  • output contains logs, trained models, visualized outputs, and test results.

Data

You need to follow the directory structure of the data below.

${ROOT}
|-- data
|   |-- STB
|   |   |-- data
|   |   |-- rootnet_output
|   |   |   |-- rootnet_stb_output.json
|   |-- RHD
|   |   |-- data
|   |   |-- rootnet_output
|   |   |   |-- rootnet_rhd_output.json
|   |-- InterHand2.6M
|   |   |-- annotations
|   |   |   |-- train
|   |   |   |-- test
|   |   |   |-- val
|   |   |-- images
|   |   |   |-- train
|   |   |   |-- test
|   |   |   |-- val
|   |   |-- rootnet_output
|   |   |   |-- rootnet_interhand2.6m_output_test.json
|   |   |   |-- rootnet_interhand2.6m_output_test_30fps.json
|   |   |   |-- rootnet_interhand2.6m_output_val.json
|   |   |   |-- rootnet_interhand2.6m_output_val_30fps.json

Output

You need to follow the directory structure of the output folder below.

${ROOT}
|-- output
|   |-- log
|   |-- model_dump
|   |-- result
|   |-- vis
  • log folder contains the training log file.
  • model_dump folder contains saved checkpoints for each epoch.
  • result folder contains final estimation files generated in the testing stage.
  • vis folder contains visualized results.

Running InterNet

Start

  • In main/config.py, you can change model settings, including which dataset to use and whether the root joint translation vector comes from the ground truth (gt) or from RootNet.

Train

In the main folder, run

python train.py --gpu 0-3

to train the network on GPUs 0,1,2,3. --gpu 0,1,2,3 can be used instead of --gpu 0-3. If you want to continue an experiment, use --continue.

Test

Place trained model at the output/model_dump/.

In the main folder, run

python test.py --gpu 0-3 --test_epoch 20 --test_set $DB_SPLIT

to test the network on GPUs 0,1,2,3 with snapshot_20.pth.tar. --gpu 0,1,2,3 can be used instead of --gpu 0-3.

$DB_SPLIT is one of [val,test].

  • val: The validation set. Val in the paper.
  • test: The test set. Test in the paper.

Results

Here I provide the performance and pre-trained snapshots of InterNet, as well as the output of RootNet.

Pre-trained InterNet

RootNet output

RootNet codes

Reference

@InProceedings{Moon_2020_ECCV_InterHand2.6M,  
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},  
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},  
booktitle = {European Conference on Computer Vision (ECCV)},  
year = {2020}  
}  

License

InterHand2.6M is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

[Terms of Use] [Privacy Policy]

InterHand2.6M's People

Contributors

endoplasmic1357, mks0601


InterHand2.6M's Issues

Something wrong when reading the dataset

Hi, thanks for your work and your dataset. When I use your demo I get this error:

joint_world = np.array(joints[str(capture_id)][str(frame_idx)]['world_coord'], dtype=np.float32)
TypeError: list indices must be integers or slices, not str

I have already downloaded the annotation file "InterHand2.6M.annotations.5.fps.zip" and the image files. Do you know how to solve this? Is it possible that I downloaded the wrong files?

Results about non-interacting hands

I have a question: if both hands are at some distance from each other, in other words there is no interaction between them, will this model handle that situation? As far as I checked, there is only one bbox for interacting hands. Is it necessary for this model that both hands be in interaction?

Question about joints order

@mks0601 what is the order of joints if we talk about only the right hand?
Can you confirm that the order of joints is the same as below?

(attached image: proposed joint order list)

In the above list, 00 is the joint closest to the wrist and 03 is the fingertip.

If the order is not the same as the above list, then what is the order of joints?

About the MANO parameters

Hi, I noticed that your updated MANO parameters are fitted in the world coordinate system, but I think fitting in the camera coordinate system, as most hand datasets do, is better.
Can you tell me the benefit of fitting in the world coordinate system?

A problem happened when I used render.py

Could you help me address this problem?

Traceback (most recent call last):
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyrender/platforms/pyglet_platform.py", line 39, in init_context
    width=1, height=1)
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyglet/window/xlib/__init__.py", line 173, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyglet/window/__init__.py", line 603, in __init__
    config = screen.get_best_config(config)
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyglet/canvas/base.py", line 194, in get_best_config
    raise window.NoSuchConfigException()
pyglet.window.NoSuchConfigException

Mapping INTERHand joint to MANO joint order

Hi, this is really nice work.
I have a question about this dataset. Based on my understanding, the InterHand2.6M joint order is different from the MANO joint order.
Do you have a plan to map the joints to the MANO order?
Thank you!

Best regards,

Joshua

Overlapping train and test set

Thanks for sharing the dataset and the code. I am trying to figure out how the train, val and test sets are split. I ran the following code and found that some images in the train, val and test sets overlap. I am wondering if I did something wrong, or whether it is a mistake in the provided json files.

import pickle as pkl
import json

with open('./data/InterHand2.6M/annotations/all/InterHand2.6M_train_data.json') as f:
    data = json.load(f)
    
sequences = []
for idx in range(len(data['images'])):
    seq_name = data['images'][idx]['file_name']
    sequences.append(seq_name)
sequences_train = set(sequences)


with open('./data/InterHand2.6M/annotations/machine_annot/InterHand2.6M_val_data.json') as f:
    data = json.load(f)
    
sequences = []
for idx in range(len(data['images'])):
    seq_name = data['images'][idx]['file_name']
    sequences.append(seq_name)    
sequences_val = set(sequences)

with open('./data/InterHand2.6M/annotations/all/InterHand2.6M_test_data.json') as f:
    data = json.load(f)
    
sequences = []
for idx in range(len(data['images'])):
    seq_name = data['images'][idx]['file_name']
    sequences.append(seq_name)
    
sequences_test = set(sequences)

Results

The output for sequences_train.intersection(sequences_val) is set().

The output for len(sequences_test.intersection(sequences_val)) is 13507

The output for list(sequences_test.intersection(sequences_val))[:10] is:

['Capture0/ROM02_Interaction_2_Hand/cam400428/image22223.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400364/image16814.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400269/image16454.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400372/image17444.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400299/image16604.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400500/image16772.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400281/image16550.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400375/image17372.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400364/image17348.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400367/image16472.jpg']

The output for sequences_train.intersection(sequences_test) is:

{'Capture5/0007_thumbup_normal/cam400006/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400010/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400042/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400053/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400067/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410003/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410067/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410209/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410210/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410218/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410236/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410238/image2564.jpg',
 'Capture7/0001_neutral_rigid/cam400006/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400008/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400012/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400013/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400016/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400017/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400041/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400053/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400059/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400067/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410001/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410003/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410004/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410007/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410028/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410053/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410062/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410063/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410208/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410209/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410210/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410213/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410218/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410219/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410233/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410236/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410238/image0372.jpg'}

When can we expect the code for using the model on our own images?

Hi,
Congratulations on such great work. I am building an application where I need a robust hand pose estimation model like yours. I tried to figure out how to use my own images but couldn't manage it. Parameters like the focal length, principal point, and absolute depths are confusing me. Can you give me some directions on this and share any potential dates for the release of code to run on our own images? Thank you.
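
For context on those parameters (this is the general pinhole back-projection idea, stated here as an assumption about how a 2.5D prediction is lifted to 3D, not the repo's exact code): the network outputs pixel coordinates plus a root-relative depth, and the focal lengths, principal point, and an absolute root depth (e.g. from RootNet) are needed to recover 3D camera coordinates:

import numpy as np

def pixel2cam(pixel_coord, f, c):
    # pixel_coord: (J, 3) array of (u, v, z) with depth z in mm; f = (fx, fy), c = (cx, cy) in pixels.
    x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
    y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
    z = pixel_coord[:, 2]
    return np.stack((x, y, z), axis=1)

# Hypothetical usage: joint_uv is (J, 2) predicted pixel coordinates, rel_depth is (J,)
# root-relative depth in mm, abs_root_depth is the absolute root depth in mm (e.g. from RootNet).
# joint_img = np.concatenate([joint_uv, (rel_depth + abs_root_depth)[:, None]], axis=1)
# joint_cam = pixel2cam(joint_img, focal, princpt)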

Cannot reproduce paper numbers following instructions

Hi, I followed the instructions in the repo to train and reproduce the InterHand performance. While the reported interacting hand pose validation error is 18.58 mm (Table 4), my reproduced number is 20 mm. Do you know why there is a discrepancy? I didn't modify anything in the repo for this training.

I saw a bug report earlier that the image sizes were swapped. Could that be the reason? Thanks.

Here are the commands I used for training and validation (I guess I should use epoch 19 for testing, as the numbering starts from 0):

python train.py --gpu 2 --annot_subset all
python test.py --gpu 0 --test_epoch 19 --test_set val --annot_subset machine_annot

I use the default config.py but with a batch size of 32 and gradient accumulation of 2, which should be equivalent to batch size 64.
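
For reference, a generic gradient-accumulation sketch (not the repo's train.py): accumulating gradients over two mini-batches of 32 before each optimizer step approximates one step with batch size 64, as long as the loss is a mean over the batch:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in for InterNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(4)]  # dummy mini-batches
accum_steps = 2

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so the accumulated gradient matches the larger batch
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()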

Evaluation start...
Handedness accuracy: 0.9831676136363636
MRRPE: 35.12182411345029

MPJPE for each joint: 
r_thumb4: 21.78, r_thumb3: 16.52, r_thumb2: 12.93, r_thumb1: 8.33, r_index4: 25.84, r_index3: 21.78, r_index2: 18.68, r_index1: 14.23, r_middle4: 25.96, r_middle3: 22.06, r_middle2: 19.23, r_middle1: 14.23, r_ring4: 24.27, r_ring3: 20.41, r_ring2: 17.57, r_ring1: 13.15, r_pinky4: 22.74, r_pinky3: 19.37, r_pinky2: 17.12, r_pinky1: 12.72, r_wrist: 0.00, l_thumb4: 22.68, l_thumb3: 17.41, l_thumb2: 13.32, l_thumb1: 8.35, l_index4: 25.08, l_index3: 20.58, l_index2: 17.80, l_index1: 13.81, l_middle4: 25.59, l_middle3: 21.60, l_middle2: 18.89, l_middle1: 14.08, l_ring4: 23.69, l_ring3: 19.87, l_ring2: 17.31, l_ring1: 13.56, l_pinky4: 23.56, l_pinky3: 20.01, l_pinky2: 17.20, l_pinky1: 13.07, l_wrist: 0.00, 
MPJPE for all hand sequences: 17.53

MPJPE for each joint: 
r_thumb4: 17.88, r_thumb3: 13.76, r_thumb2: 10.29, r_thumb1: 6.97, r_index4: 19.81, r_index3: 17.42, r_index2: 15.27, r_index1: 12.04, r_middle4: 22.19, r_middle3: 19.61, r_middle2: 16.82, r_middle1: 11.89, r_ring4: 21.16, r_ring3: 18.37, r_ring2: 15.21, r_ring1: 10.80, r_pinky4: 19.80, r_pinky3: 17.02, r_pinky2: 14.57, r_pinky1: 9.99, r_wrist: 0.00, l_thumb4: 19.56, l_thumb3: 15.51, l_thumb2: 11.46, l_thumb1: 7.24, l_index4: 19.82, l_index3: 16.34, l_index2: 14.42, l_index1: 11.36, l_middle4: 21.45, l_middle3: 18.39, l_middle2: 15.78, l_middle1: 11.76, l_ring4: 20.48, l_ring3: 17.29, l_ring2: 14.55, l_ring1: 11.26, l_pinky4: 20.50, l_pinky3: 17.41, l_pinky2: 14.73, l_pinky1: 10.81, l_wrist: 0.00, 
MPJPE for single hand sequences: 14.79

MPJPE for each joint: 
r_thumb4: 25.54, r_thumb3: 19.17, r_thumb2: 15.46, r_thumb1: 10.48, r_index4: 32.22, r_index3: 26.37, r_index2: 22.16, r_index1: 16.37, r_middle4: 30.13, r_middle3: 24.67, r_middle2: 21.72, r_middle1: 16.50, r_ring4: 27.56, r_ring3: 22.58, r_ring2: 19.97, r_ring1: 15.41, r_pinky4: 25.83, r_pinky3: 21.77, r_pinky2: 19.64, r_pinky1: 15.33, r_wrist: 0.00, l_thumb4: 26.10, l_thumb3: 19.45, l_thumb2: 15.32, l_thumb1: 10.11, l_index4: 31.30, l_index3: 25.39, l_index2: 21.47, l_index1: 16.41, l_middle4: 31.19, l_middle3: 25.41, l_middle2: 22.33, l_middle1: 16.54, l_ring4: 28.02, l_ring3: 22.87, l_ring2: 20.31, l_ring1: 16.01, l_pinky4: 27.30, l_pinky3: 22.88, l_pinky2: 19.85, l_pinky1: 15.46, l_wrist: 0.00, 
MPJPE for interacting hand sequences: 20.54

Visualize

I ran Vis.py but I don't get any output. How can I visualize the results like in the demo video?

The prediction is not good

Hi, I ran the testing script on the InterHand2.6M test set, but the predicted hand keypoints are very bad.
The testing command was: python3.5 test.py --gpu 0 --test_epoch 20 --test_set test --annot_subset all
Is there anything I might have done wrong?

questions about root depth

It seems that this project uses the absolute depth obtained from RootNet for all datasets in the evaluation procedure, so how do you train RootNet for the hand project? As far as I know, RootNet is designed for human pose. Can you release the training code for RootNet for hand pose? Thanks.

download/verify scripts not working for 5fps image download

I had to modify both to make them work; posting the changes here in case this helps someone else.

download 5fps

import os
url = 'https://fb-baas-f32eacb9-8abb-11eb-b2b8-4857dd089e15.s3.amazonaws.com/InterHand2.6M/InterHand2.6M.images.5.fps.v1.0/'
for part1 in ('a', 'b'):
    for part2 in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'):
        if part1 == 'b' and part2 == 's': break
        tar_file = f'InterHand2.6M.images.5.fps.v1.0.tar.part{part1}{part2}'
        # check if tarfile is already downloaded, if not download
        if not os.path.exists(tar_file): os.system(f'wget {url}{tar_file}')
    

os.system(f'wget {url}InterHand2.6M.images.5.fps.v1.0.tar.CHECKSUM')
os.system(f'wget {url}unzip.sh')
os.system(f'wget {url}verify_download.py')

I only had to change a single line in verify_download.py:

for line in tqdm(checksums):
    md5sum, filename = line.split()
    _, filename = filename.split('/')

This is because the MD5 file has an extra InterHand2.6M_5fps_batch1_splits/ prefix in it.

PyTorch -> ONNX -> Unity - "Only tensors of rank 4 or less are supported, but got rank 5"

I'm trying to test this model in the Unity Inference Engine. To do this I had to export it in ONNX format. I managed to export it, but once I imported it into Unity I got this error:

Asset import failed, "Assets/models/interhand.onnx" > OnnxImportException: Unexpected error while parsing layer 561 of type Reshape.
Only tensors of rank 4 or less are supported, but got rank 5

Json: { "input": [ "559", "560" ], "output": [ "561" ], "name": "Reshape_131", "opType": "Reshape" }
  at Unity.Barracuda.ONNXLayout.AxisPermutationsForMappingONNXLayoutToBarracuda (System.Int32 onnxRank, System.String onnxLayout) [0x003ef] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXLayout.cs:152 
  at Unity.Barracuda.ONNXLayout.PermuteToBarracuda (System.Int64[] shape, System.String onnxLayout) [0x00003] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXLayout.cs:158 
  at Unity.Barracuda.ONNXLayout.ConvertSymbolicShapeToBarracuda (System.Int64[] onnxShape, System.String onnxLayout) [0x00000] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXLayout.cs:223 
  at Unity.Barracuda.ONNXModelImporter.<.ctor>b__14_1 (Unity.Barracuda.ModelBuilder net, Unity.Barracuda.ONNXNodeWrapper node) [0x000b9] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXModelImporter.cs:105 
  at Unity.Barracuda.ONNXModelImporter.ConvertOnnxModel (Onnx.ModelProto onnxModel) [0x0032f] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXModelImporter.cs:1088 

We are talking about these two Reshapes:

(attached screenshot, 2020-11-12, showing the two Reshape nodes)

Any idea how I can work around those reshapes so that the model uses tensors of rank 4?

The ONNX model can be downloaded from here: https://github.com/nauutilus/InterHand2.6M/releases/download/0.0.1/interhand.onnx

Why multiply the heatmap by 255?

When I traced the training code in model.py, I saw that the heatmap is multiplied by 255 on line 38.
Could this value be an arbitrary scale factor, or does it have some other physical meaning?

Thanks in advance.

Question about the meaning of camera name

Hi, I see that your ECCV paper reports the result of a model trained under only four different views.
I would like to know the camera ids of these views.
By the way, I would also like to know the meaning of these ids; for example, which id corresponds to the front view?
Thanks,
Jingbo

Issue using transform.py function to convert world coordinates to pixel coordinates

Hello,

I am using the utility functions world2cam and cam2pixel to compute the image-plane coordinates of the joint locations in the 2D hand images. I load the camera parameters R, t just as you do in render.py. However, when I piece everything together, the joint locations do not match those in the images (see example below). Any help is appreciated. How are the R and t quantities used to convert from world to camera coordinates?

def cam2pixel(cam_coord, f, c):
    x = cam_coord[:, 0] / (cam_coord[:, 2] + 1e-8) * f[0] + c[0]
    y = cam_coord[:, 1] / (cam_coord[:, 2] + 1e-8) * f[1] + c[1]
    z = cam_coord[:, 2]
    img_coord = np.concatenate((x[:, None], y[:, None], z[:, None]), 1)
    return img_coord

def world2cam(world_coord, R, T):
    cam_coord = np.dot(world_coord - T, R)
    cam_coord.transpose()
    return cam_coord

def world2image(joints, cam_params, capture_id, frame_idx, cam, hand_type):
    # camera extrinsic parameters (t is the translation vector, R is the rotation matrix)
    t, R = np.array(cam_params[str(capture_id)]['campos'][str(cam)], dtype=np.float32).reshape(3), np.array(cam_params[str(capture_id)]['camrot'][str(cam)], dtype=np.float32).reshape(3, 3)
    t = -np.dot(R, t.reshape(3, 1)).reshape(3)  # -Rt -> t
    focal = cam_params[str(capture_id)]['focal'][str(cam)]
    princpt = cam_params[str(capture_id)]['princpt'][str(cam)]

    # Transform to camera coordinates
    cam_coord = world2cam(joints[str(capture_id)][str(frame_idx)]['world_coord'], R, t)

    # Transform to pixel/image coordinates
    image_coord = cam2pixel(cam_coord, focal, princpt)

    # Split into 2 subarrays: one for the right hand, one for the left hand
    image_coord_right = image_coord[np.arange(0, 21), :]
    image_coord_left = image_coord[np.arange(21, 21 * 2), :]

    # Fill in zeros if one hand does not appear in the frame
    if hand_type == 'right':
        image_coord_left = np.zeros(np.shape(image_coord_left))
    elif hand_type == 'left':
        image_coord_right = np.zeros(np.shape(image_coord_right))

    return [image_coord_right, image_coord_left]

(attached image: PerspectiveProjectionTest)
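
For reference, the convention that usually accompanies campos/camrot annotations (stated as an assumption here, not a confirmed answer to this issue) is X_cam = R (X_world - campos), which is equivalent to using t = -R campos and X_cam = R X_world + t:

import numpy as np

def world2cam_reference(world_coord, camrot, campos):
    # world_coord: (N, 3) points in world coordinates (mm)
    # camrot: (3, 3) world-to-camera rotation; campos: (3,) camera center in world coordinates
    R = np.asarray(camrot, dtype=np.float32).reshape(3, 3)
    pos = np.asarray(campos, dtype=np.float32).reshape(1, 3)
    return np.dot(R, (world_coord - pos).T).T  # X_cam = R (X_world - campos)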

Question about InterNet compared with other work

Hi,

I read the (ECCV) paper and found that you compared your InterNet with other state-of-the-art work in Table 5.

So, I would like to know how I can get the EPE of the model on the STB and RHP datasets.

By the way, is the model you used to get the EPE on STB or RHP trained on the InterHand dataset or on the corresponding dataset? (By corresponding I mean that to get the EPE on STB, we train the model on the STB dataset.)

Error in extracting files from the InterHand2.6M.tar

Hello! Thank you for releasing the great hand dataset. I downloaded all parts of the InterHand2.6M dataset. When I used the command:
cat InterHand2.6M.images.5.fps.v0.0.tar.parta* | tar -xvf - -i
this error occurred:

tar: Skipping to next header
tar: Skipping to next header
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Is there anything wrong with the tar files?

No module named 'nets'

Hello, I'm facing this error. Any help?

File "/Users/Compu/InterHand2.6M/main/model.py", line 11, in
from nets.module import BackboneNet, PoseNet
ModuleNotFoundError: No module named 'nets'

test per batch

hey!

great work

Google Colab (GPU enabled) doesn't have enough space to download all of the images (40 GB). Yet it's tempting to try the whole thing. Could you possibly tell me which of the tar parts contain the test split?

What was the fitting procedure for MANO parameters?

I would love to get a bit more info on how you fit the MANO hand model to the hands. Was it a process similar to what was done for the FreiHAND dataset (a multiview loss over 2D/3D/segmentation) or something different, like the SPIN algorithm for full-body mesh estimation? I didn't see anything about a fitting procedure in the original paper. Thanks!

annotations for 30 fps

Hi @mks0601 ,

Thanks for your great work,
Do all images of the 30 fps version have annotations available, or only some of them?

STB & RHD

Hey! Finally, a clear hands dataset (v0.0) and an impressive baseline. Great work! Looking forward to getting the full version.

What do STB & RHD stand for?

Question about using my own image to test

1. Hello. First, I have a question about the results of testing on my own images: the results don't seem very good. In particular, the little finger is not captured well, as shown in the images. However, when I use your InterHand2.6M dataset, the results look really good, so I'm confused about whether it is the model's problem or not.
(attached images: mine, mine2, mine3)
2. Second, I want to ask how I can test on a particular folder containing my own images, because currently I need to follow the sequence of the annotation json file. Thanks!

Multi-view data

Hi,
Thanks for your great work and for providing us this wonderful dataset!
I have a question about the multi-view data: can I determine the sample pose from the name of an image?
For example, can I presume that the following two images have the same pose, since they only differ in camera id?
Capture1/0287_pointingtowardsfeatures/cam410210/image67650.jpg and
Capture1/0287_pointingtowardsfeatures/cam410220/image67650.jpg
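
A small helper, added here only for illustration (not part of the repo), assuming image paths follow the Capture{N}/{sequence}/cam{camera_id}/image{frame_idx}.jpg pattern visible above:

def parse_image_path(path):
    # Split an InterHand2.6M-style relative image path into its components.
    capture, seq, cam, img = path.split('/')
    return {
        'capture_id': capture.replace('Capture', ''),
        'sequence': seq,
        'cam_id': cam.replace('cam', ''),
        'frame_idx': img.replace('image', '').replace('.jpg', ''),
    }

print(parse_image_path('Capture1/0287_pointingtowardsfeatures/cam410210/image67650.jpg'))
# {'capture_id': '1', 'sequence': '0287_pointingtowardsfeatures', 'cam_id': '410210', 'frame_idx': '67650'}

The two paths above share the capture, sequence, and frame index and differ only in cam_id.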

can not

Hi, thanks for such great work!

I failed to download the dataset, as an error like "Time limit of download is exceeded!" occurred.
Do you know what happened? Is there any solution?

All the best,
Hao Meng

rootnet_output

Hi:
I can't find these two files, rootnet_interhand2.6m_output_machine_annot_val.json and rootnet_interhand2.6m_output_all_test.json, in the dataset. Could you please tell me where I can find them?

train resnet18 with InterHand error

I ran test.py successfully as in the README, but I get errors when I try to train a ResNet-18 model.

I have already changed resnet_type in main/config.py. I think there must be some other configs that still need to be modified, but I couldn't find them. Can you help?

$ python train.py --gpu 0-3 --annot_subset human_annot                              
>>> Using GPU: 0,1,2,3
04-30 03:40:55 Creating train dataset...
Load annotation from  ../data/InterHand2.6M/annotations/human_annot
loading annotations into memory...
Done (t=10.21s)
creating index...
index created!
Get bbox and root depth from groundtruth annotation
Number of annotations in single hand sequences: 76445
Number of annotations in interacting hand sequences: 208271
04-30 03:42:11 Creating graph and optimizer...
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
100%|████████████████████| 44.7M/44.7M [00:00<00:00, 107MB/s]
Initialize resnet from model zoo
Traceback (most recent call last):
  File "train.py", line 90, in <module>
    main()
  File "train.py", line 60, in main
    loss = trainer.model(inputs, targets, meta_info, 'train')
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs2/peichao/code/InterHand2.6M/main/model.py", line 45, in forward
    joint_heatmap_out, rel_root_depth_out, hand_type = self.pose_net(img_feat)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs2/peichao/code/InterHand2.6M/main/../common/nets/module.py", line 48, in forward
    joint_img_feat_1 = self.joint_deconv_1(img_feat)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 929, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Given transposed=1, weight of size [2048, 256, 4, 4], expected input[16, 512, 8, 8] to have 2048 channels, but got 512 channels instead

ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

Hi,

I am trying to run demo.py on the sample image in the demo folder.
Could you please help me address this problem?

File "demo.py", line 25, in
from utils.vis import vis_keypoints, vis_3d_keypoints
File "/InterHand2.6M-master/main/../common/utils/vis.py", line 14, in
import matplotlib.pyplot as plt
File "/anaconda3/envs/interhand/lib/python3.6/site-packages/matplotlib/pyplot.py", line 2336, in
switch_backend(rcParams["backend"])
File "/anaconda3/envs/interhand/lib/python3.6/site-packages/matplotlib/pyplot.py", line 287, in switch_backend
newbackend, required_framework, current_framework))
ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

I have already tried to change 'tkagg' to 'TKAgg' in vis.py as below:

import matplotlib
matplotlib.use('TKAgg')

Then:
import matplotlib.pyplot as plt

Thanks in advance!

Bad results in wild images.

I tested the network on the default input image and got good results, as described. But when I test hand images from the internet, I get bad results.

nearby perspective

Hi,
Thanks for your great work and for providing us this wonderful dataset!
How can I find the data from the nearby perspective?

How to transfer MANO parameters from world coordinate system to camera coordinate system

I need to use the MANO parameters in the camera coordinate system and tried to use the camera extrinsics to convert them. But it failed, and the fitting error reached tens of millimeters.
Part of the code is as follows:

        mano_pose = np.array(mano_param['pose']).reshape(-1,3)
        # mano_pose = np.dot(R, mano_pose.transpose(1,0)).transpose(1,0) + t.reshape(1,3)/1000  # (16,3)
        mano_pose = torch.FloatTensor(mano_pose)
        root_pose = mano_pose[0].view(1, 3)
        root_pose = np.dot(R, root_pose.transpose(1, 0)).transpose(1, 0) + t.reshape(1, 3) / 1000
        root_pose = torch.FloatTensor(root_pose)
        hand_pose = mano_pose[1:, :].contiguous().view(1, -1)
        shape = torch.FloatTensor(mano_param['shape']).view(1, -1)
        trans = np.array(mano_param['trans']).reshape(-1,3)
        trans = np.dot(R, trans.transpose(1,0)).transpose(1,0) + t.reshape(1,3)/1000
        trans = torch.FloatTensor(trans).view(1, -1)
        output = mano_layer[hand_type](global_orient=root_pose, hand_pose=hand_pose, betas=shape, transl=trans)
        mesh = output.vertices[0].numpy() * 1000
        fit_err = get_fitting_error(mesh, ih26m_joint_regressor, cam_params, joints, hand_type, capture_idx,frame_idx, cam_idx)
        print('Fitting error: ' + str(fit_err) + ' mm')

Missing Validation Human Annotation

I don't see the following:

human_annot/InterHand2.6M_val_camera.json  
human_annot/InterHand2.6M_val_data.json  
human_annot/InterHand2.6M_val_joint_3d.json  

Are they provided?

The current archive has:

Archive:  InterHand2.6M.annotations.5.fps.zip
   creating: all/
  inflating: all/InterHand2.6M_test_camera.json  
  inflating: all/InterHand2.6M_test_data.json  
  inflating: all/InterHand2.6M_test_joint_3d.json  
  inflating: all/InterHand2.6M_train_camera.json  
  inflating: all/InterHand2.6M_train_data.json  
  inflating: all/InterHand2.6M_train_joint_3d.json  
   creating: human_annot/
  inflating: human_annot/InterHand2.6M_test_camera.json  
  inflating: human_annot/InterHand2.6M_test_data.json  
  inflating: human_annot/InterHand2.6M_test_joint_3d.json  
  inflating: human_annot/InterHand2.6M_train_camera.json  
  inflating: human_annot/InterHand2.6M_train_data.json  
  inflating: human_annot/InterHand2.6M_train_joint_3d.json  
   creating: machine_annot/
  inflating: machine_annot/InterHand2.6M_test_camera.json  
  inflating: machine_annot/InterHand2.6M_test_data.json  
  inflating: machine_annot/InterHand2.6M_test_joint_3d.json  
  inflating: machine_annot/InterHand2.6M_train_camera.json  
  inflating: machine_annot/InterHand2.6M_train_data.json  
  inflating: machine_annot/InterHand2.6M_train_joint_3d.json  
  inflating: machine_annot/InterHand2.6M_val_camera.json  
  inflating: machine_annot/InterHand2.6M_val_data.json  
  inflating: machine_annot/InterHand2.6M_val_joint_3d.json  
  inflating: skeleton.txt            
  inflating: subject.txt 

Can you keep the results from v0.0?

Thanks for releasing the full dataset. I noticed that the results and the model from the v0.0 dataset are no longer there. Is it possible to include those in the current README again? Some of us might need to run experiments with them during the ICCV rebuttal.
