facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

License: Other

Python 100.00%

InterHand2.6M's Introduction

InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Our new Re:InterHand dataset has been released; it has much more diverse image appearances and more stable 3D GT. Check it out here!

Introduction

The demo videos above have low-quality frames because of compression for the README upload.

News

  • 2021.06.10. Boxes in the RootNet results have been corrected.
  • 2021.03.22. Finally, InterHand2.6M v1.0, which includes all images of the 5 fps and 30 fps versions, is released! 🎉 This is the dataset used in the InterHand2.6M paper.
  • 2020.11.26. Demo code for a random image has been added! Check out the instructions below.
  • 2020.11.26. Fitted MANO parameters have been updated to better ones (fitting error is about 5 mm). The files are also much smaller because the parameters are fitted in world coordinates (independent of the camera view).
  • 2020.10.7. Fitted MANO parameters are available! They are obtained with NeuralAnnot.

InterHand2.6M dataset

  • For the InterHand2.6M dataset download and instructions, go to [HOMEPAGE].
  • Below are instructions for our baseline model, InterNet, for 3D interacting hand pose estimation from a single RGB image.

Demo on a random image

  1. Download the pre-trained InterNet from here
  2. Put the model in the demo folder
  3. Go to the demo folder and edit bbox in here (see the sketch after this list)
  4. Run python demo.py --gpu 0 --test_epoch 20
  5. You can see result_2D.jpg and a 3D viewer.
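
For step 3, a minimal illustration of the edit, assuming the bounding box is given as [xmin, ymin, width, height] in pixel coordinates of the input image (the variable name and values below are placeholders, not taken from the repo):

# demo.py (sketch): hand bounding box in the input image, in pixels.
# Assumed format: [xmin, ymin, width, height]; replace with the box around the hands
# in your own image before running `python demo.py --gpu 0 --test_epoch 20`.
bbox = [100, 80, 220, 220]  # placeholder values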

MANO mesh rendering demo

  1. Install SMPLX
  2. cd tool/MANO_render
  3. Set smplx_path in render.py (see the sketch after this list)
  4. Run python render.py
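
A minimal sketch of step 3, assuming the standard smplx API for creating MANO layers (this is not the repo's render.py; smplx_path below is a placeholder pointing to the folder that contains the MANO model files):

import torch
import smplx

smplx_path = '/path/to/mano/models'  # placeholder: folder containing MANO_RIGHT.pkl / MANO_LEFT.pkl
# Create left- and right-hand MANO layers; use_pca=False exposes the full 45-dim hand pose.
mano_layer = {
    'right': smplx.create(smplx_path, 'mano', use_pca=False, is_rhand=True),
    'left': smplx.create(smplx_path, 'mano', use_pca=False, is_rhand=False),
}
# Quick sanity check: a forward pass with zero pose/shape should return a 778-vertex mesh.
out = mano_layer['right'](global_orient=torch.zeros(1, 3), hand_pose=torch.zeros(1, 45),
                          betas=torch.zeros(1, 10), transl=torch.zeros(1, 3))
print(out.vertices.shape)  # torch.Size([1, 778, 3])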

MANO parameter conversion from the world coordinate to the camera coordinate system

  1. Install SMPLX
  2. cd tool/MANO_world_to_camera/
  3. Set smplx_path in convert.py
  4. Run python convert.py (see the sketch after this list)
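
A sketch of the main idea behind the conversion, stated as an assumption rather than a description of convert.py: the MANO global orientation (axis-angle) is composed with the world-to-camera rotation, while the translation needs extra care because MANO applies transl after rotating about the root joint:

import numpy as np
import cv2  # Rodrigues: convert between axis-angle and rotation matrices

def world_orient_to_cam(global_orient_world, R):
    # global_orient_world: MANO root orientation in axis-angle, shape (3,), world coordinates
    # R: (3, 3) world-to-camera rotation matrix
    R_root, _ = cv2.Rodrigues(global_orient_world.reshape(3, 1))  # axis-angle -> 3x3
    R_root_cam = np.dot(R, R_root)                                # rotate the root into the camera frame
    global_orient_cam, _ = cv2.Rodrigues(R_root_cam)              # 3x3 -> axis-angle
    return global_orient_cam.reshape(3)

The translation cannot simply be rotated the same way: because the root joint (not the origin) is the rotation pivot, the camera-frame translation has to be recovered from the transformed root-joint position rather than by rotating transl directly.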

Camera positions visualization demo

  1. cd tool/camera_visualize
  2. Run python camera_visualize.py
  • As there are many cameras, you should set subset and split in lines 9 and 10, respectively, yourself (see the sketch below for the general idea).
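
A minimal sketch of the same idea (not the repo's camera_visualize.py), assuming the camera annotation file stores campos as a per-capture dictionary of 3D camera centers, as in the snippets elsewhere on this page:

import json
import numpy as np
import matplotlib.pyplot as plt

with open('InterHand2.6M_train_camera.json') as f:  # placeholder path
    cams = json.load(f)

capture_id = '0'  # hypothetical capture to visualize
campos = cams[capture_id]['campos']
pos = np.array([campos[cam_id] for cam_id in campos], dtype=np.float32)  # (num_cams, 3), millimeters

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(pos[:, 0], pos[:, 1], pos[:, 2], s=5)
ax.set_title('Camera centers for capture ' + capture_id)
plt.show()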

Directory

Root

${ROOT} is described below.

${ROOT}
|-- data
|-- common
|-- main
|-- output
  • data contains data loading code and soft links to the images and annotations directories.
  • common contains kernel code for 3D interacting hand pose estimation.
  • main contains high-level code for training or testing the network.
  • output contains logs, trained models, visualized outputs, and test results.

Data

You need to follow the directory structure of the data below.

${ROOT}
|-- data
|   |-- STB
|   |   |-- data
|   |   |-- rootnet_output
|   |   |   |-- rootnet_stb_output.json
|   |-- RHD
|   |   |-- data
|   |   |-- rootnet_output
|   |   |   |-- rootnet_rhd_output.json
|   |-- InterHand2.6M
|   |   |-- annotations
|   |   |   |-- train
|   |   |   |-- test
|   |   |   |-- val
|   |   |-- images
|   |   |   |-- train
|   |   |   |-- test
|   |   |   |-- val
|   |   |-- rootnet_output
|   |   |   |-- rootnet_interhand2.6m_output_test.json
|   |   |   |-- rootnet_interhand2.6m_output_test_30fps.json
|   |   |   |-- rootnet_interhand2.6m_output_val.json
|   |   |   |-- rootnet_interhand2.6m_output_val_30fps.json

Output

You need to follow the directory structure of the output folder below.

${ROOT}
|-- output
|   |-- log
|   |-- model_dump
|   |-- result
|   |-- vis
  • log folder contains the training log file.
  • model_dump folder contains saved checkpoints for each epoch.
  • result folder contains final estimation files generated in the testing stage.
  • vis folder contains visualized results.

Running InterNet

Start

  • In main/config.py, you can change model settings, including which dataset to use and whether the root joint translation vector comes from the ground truth (gt) or from RootNet.

Train

In the main folder, run

python train.py --gpu 0-3

to train the network on GPUs 0,1,2,3. --gpu 0,1,2,3 can be used instead of --gpu 0-3. If you want to continue an experiment, use --continue.

Test

Place trained model at the output/model_dump/.

In the main folder, run

python test.py --gpu 0-3 --test_epoch 20 --test_set $DB_SPLIT

to test the network on GPUs 0,1,2,3 with snapshot_20.pth.tar. --gpu 0,1,2,3 can be used instead of --gpu 0-3.

$DB_SPLIT is one of [val,test].

  • val: The validation set. Val in the paper.
  • test: The test set. Test in the paper.

Results

Here I provide the performance and pre-trained snapshots of InterNet, as well as the output of RootNet.

Pre-trained InterNet

RootNet output

RootNet codes

Reference

@InProceedings{Moon_2020_ECCV_InterHand2.6M,  
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},  
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},  
booktitle = {European Conference on Computer Vision (ECCV)},  
year = {2020}  
}  

License

InterHand2.6M is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

[Terms of Use] [Privacy Policy]

InterHand2.6M's People

Contributors

endoplasmic1357, mks0601


InterHand2.6M's Issues

Something wrong when reading the dataset

Hi, thanks for your work and your dataset. When I use your demo I get this error:

joint_world = np.array(joints[str(capture_id)][str(frame_idx)]['world_coord'], dtype=np.float32)
TypeError: list indices must be integers or slices, not str

I have already downloaded the annotation file "InterHand2.6M.annotations.5.fps.zip" and the image files. Do you know how to solve this? Is it possible that I downloaded the wrong files?

Results about non-interacting hands

I have a question: if both hands are at some distance from each other, in other words there is no interaction between them, will this model handle that situation? As far as I checked, there is only one bbox for interacting hands. Is it necessary for this model that both hands be in interaction?

Question about joints order

@mks0601 what is the order of joints if we talk about only the right hand?
Can you confirm that the order of joints is the same as below?

(attached image: proposed joint order list)

In the above list, 00 is the joint closest to the wrist and 03 is the fingertip.

If the order is not the same as the above list, then what is the order of joints?

About the MANO parameters

Hi, I noticed that your updated MANO parameters are fitted in the world coordinate system, but I think fitting in the camera coordinate system, as most hand datasets do, is better.
Can you tell me the benefit of fitting in the world coordinate system?

A problem happened when I used render.py

Could you help me address this problem?

Traceback (most recent call last):
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyrender/platforms/pyglet_platform.py", line 39, in init_context
    width=1, height=1)
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyglet/window/xlib/__init__.py", line 173, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyglet/window/__init__.py", line 603, in __init__
    config = screen.get_best_config(config)
  File "/home/adminroot/myproject/py_3_6_benchmark/lib/python3.6/site-packages/pyglet/canvas/base.py", line 194, in get_best_config
    raise window.NoSuchConfigException()
pyglet.window.NoSuchConfigException

Mapping INTERHand joint to MANO joint order

Hi, this is really nice work.
I have a question about this dataset. Based on my understanding, the InterHand2.6M joint order is different from the MANO joint order.
Do you have a plan to map the joints to the MANO order?
Thank you!

Best regards,

Joshua

Overlapping train and test set

Thanks for sharing the dataset and the code. I am trying to figure out how the train, val and test sets are split. I ran the following code and found that some images in the train, val and test sets overlap. I am wondering if I did something wrong, or whether it is a mistake in the provided json files.

import pickle as pkl
import json

with open('./data/InterHand2.6M/annotations/all/InterHand2.6M_train_data.json') as f:
    data = json.load(f)
    
sequences = []
for idx in range(len(data['images'])):
    seq_name = data['images'][idx]['file_name']
    sequences.append(seq_name)
sequences_train = set(sequences)


with open('./data/InterHand2.6M/annotations/machine_annot/InterHand2.6M_val_data.json') as f:
    data = json.load(f)
    
sequences = []
for idx in range(len(data['images'])):
    seq_name = data['images'][idx]['file_name']
    sequences.append(seq_name)    
sequences_val = set(sequences)

with open('./data/InterHand2.6M/annotations/all/InterHand2.6M_test_data.json') as f:
    data = json.load(f)
    
sequences = []
for idx in range(len(data['images'])):
    seq_name = data['images'][idx]['file_name']
    sequences.append(seq_name)
    
sequences_test = set(sequences)

Results

The output for sequences_train.intersection(sequences_val) is set().

The output for len(sequences_test.intersection(sequences_val)) is 13507

The output for list(sequences_test.intersection(sequences_val))[:10] is:

['Capture0/ROM02_Interaction_2_Hand/cam400428/image22223.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400364/image16814.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400269/image16454.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400372/image17444.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400299/image16604.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400500/image16772.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400281/image16550.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400375/image17372.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400364/image17348.jpg',
 'Capture0/ROM03_LT_No_Occlusion/cam400367/image16472.jpg']

The output for sequences_train.intersection(sequences_test) is:

{'Capture5/0007_thumbup_normal/cam400006/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400010/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400042/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400053/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam400067/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410003/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410067/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410209/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410210/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410218/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410236/image2564.jpg',
 'Capture5/0007_thumbup_normal/cam410238/image2564.jpg',
 'Capture7/0001_neutral_rigid/cam400006/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400008/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400012/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400013/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400016/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400017/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400041/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400053/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400059/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam400067/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410001/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410003/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410004/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410007/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410028/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410053/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410062/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410063/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410208/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410209/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410210/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410213/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410218/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410219/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410233/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410236/image0372.jpg',
 'Capture7/0001_neutral_rigid/cam410238/image0372.jpg'}

When can we expect the code for using the model on our own images?

Hi,
Congratulations on such great work. I am building an application where I need a robust hand pose estimation model like yours. I tried to figure out how to use my own images but couldn't manage it. Parameters like the focal length, principal point, and absolute depths are confusing me. Can you give me some directions on this and share any potential dates for the release of code to run on our own images? Thank you.
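
For context on those parameters (this is the general pinhole back-projection idea, stated here as an assumption about how a 2.5D prediction is lifted to 3D, not the repo's exact code): the network outputs pixel coordinates plus a root-relative depth, and the focal lengths, principal point, and an absolute root depth (e.g. from RootNet) are needed to recover 3D camera coordinates:

import numpy as np

def pixel2cam(pixel_coord, f, c):
    # pixel_coord: (J, 3) array of (u, v, z) with depth z in mm; f = (fx, fy), c = (cx, cy) in pixels.
    x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
    y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
    z = pixel_coord[:, 2]
    return np.stack((x, y, z), axis=1)

# Hypothetical usage: joint_uv is (J, 2) predicted pixel coordinates, rel_depth is (J,)
# root-relative depth in mm, abs_root_depth is the absolute root depth in mm (e.g. from RootNet).
# joint_img = np.concatenate([joint_uv, (rel_depth + abs_root_depth)[:, None]], axis=1)
# joint_cam = pixel2cam(joint_img, focal, princpt)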

Cannot reproduce paper numbers following instructions

Hi, I followed the instructions in the repo to train and reproduce the InterHand performance. While the reported interacting hand pose validation error is 18.58 mm (Table 4), my reproduced number is 20 mm. Do you know why there is a discrepancy? I didn't modify anything in the repo for this training.

I saw a bug report earlier that the image sizes were swapped. Could that be the reason? Thanks.

Here are the commands I used for training and validation (I guess I should use epoch 19 for testing, as the numbering starts from 0):

python train.py --gpu 2 --annot_subset all
python test.py --gpu 0 --test_epoch 19 --test_set val --annot_subset machine_annot

I use the default config.py but with a batch size of 32 and gradient accumulation of 2, which should be equivalent to batch size 64.
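
For reference, a generic gradient-accumulation sketch (not the repo's train.py): accumulating gradients over two mini-batches of 32 before each optimizer step approximates one step with batch size 64, as long as the loss is a mean over the batch:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in for InterNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(4)]  # dummy mini-batches
accum_steps = 2

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so the accumulated gradient matches the larger batch
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()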

Evaluation start...
Handedness accuracy: 0.9831676136363636
MRRPE: 35.12182411345029

MPJPE for each joint: 
r_thumb4: 21.78, r_thumb3: 16.52, r_thumb2: 12.93, r_thumb1: 8.33, r_index4: 25.84, r_index3: 21.78, r_index2: 18.68, r_index1: 14.23, r_middle4: 25.96, r_middle3: 22.06, r_middle2: 19.23, r_middle1: 14.23, r_ring4: 24.27, r_ring3: 20.41, r_ring2: 17.57, r_ring1: 13.15, r_pinky4: 22.74, r_pinky3: 19.37, r_pinky2: 17.12, r_pinky1: 12.72, r_wrist: 0.00, l_thumb4: 22.68, l_thumb3: 17.41, l_thumb2: 13.32, l_thumb1: 8.35, l_index4: 25.08, l_index3: 20.58, l_index2: 17.80, l_index1: 13.81, l_middle4: 25.59, l_middle3: 21.60, l_middle2: 18.89, l_middle1: 14.08, l_ring4: 23.69, l_ring3: 19.87, l_ring2: 17.31, l_ring1: 13.56, l_pinky4: 23.56, l_pinky3: 20.01, l_pinky2: 17.20, l_pinky1: 13.07, l_wrist: 0.00, 
MPJPE for all hand sequences: 17.53

MPJPE for each joint: 
r_thumb4: 17.88, r_thumb3: 13.76, r_thumb2: 10.29, r_thumb1: 6.97, r_index4: 19.81, r_index3: 17.42, r_index2: 15.27, r_index1: 12.04, r_middle4: 22.19, r_middle3: 19.61, r_middle2: 16.82, r_middle1: 11.89, r_ring4: 21.16, r_ring3: 18.37, r_ring2: 15.21, r_ring1: 10.80, r_pinky4: 19.80, r_pinky3: 17.02, r_pinky2: 14.57, r_pinky1: 9.99, r_wrist: 0.00, l_thumb4: 19.56, l_thumb3: 15.51, l_thumb2: 11.46, l_thumb1: 7.24, l_index4: 19.82, l_index3: 16.34, l_index2: 14.42, l_index1: 11.36, l_middle4: 21.45, l_middle3: 18.39, l_middle2: 15.78, l_middle1: 11.76, l_ring4: 20.48, l_ring3: 17.29, l_ring2: 14.55, l_ring1: 11.26, l_pinky4: 20.50, l_pinky3: 17.41, l_pinky2: 14.73, l_pinky1: 10.81, l_wrist: 0.00, 
MPJPE for single hand sequences: 14.79

MPJPE for each joint: 
r_thumb4: 25.54, r_thumb3: 19.17, r_thumb2: 15.46, r_thumb1: 10.48, r_index4: 32.22, r_index3: 26.37, r_index2: 22.16, r_index1: 16.37, r_middle4: 30.13, r_middle3: 24.67, r_middle2: 21.72, r_middle1: 16.50, r_ring4: 27.56, r_ring3: 22.58, r_ring2: 19.97, r_ring1: 15.41, r_pinky4: 25.83, r_pinky3: 21.77, r_pinky2: 19.64, r_pinky1: 15.33, r_wrist: 0.00, l_thumb4: 26.10, l_thumb3: 19.45, l_thumb2: 15.32, l_thumb1: 10.11, l_index4: 31.30, l_index3: 25.39, l_index2: 21.47, l_index1: 16.41, l_middle4: 31.19, l_middle3: 25.41, l_middle2: 22.33, l_middle1: 16.54, l_ring4: 28.02, l_ring3: 22.87, l_ring2: 20.31, l_ring1: 16.01, l_pinky4: 27.30, l_pinky3: 22.88, l_pinky2: 19.85, l_pinky1: 15.46, l_wrist: 0.00, 
MPJPE for interacting hand sequences: 20.54

Visualize

I ran Vis.py but I don't get any output. How can I visualize the results like in the demo video?

The prediction is not good

Hi, I ran the testing script on the InterHand2.6M test set, but the predicted hand keypoints are very bad.
The testing command was: python3.5 test.py --gpu 0 --test_epoch 20 --test_set test --annot_subset all
Is there anything I might have done wrong?

questions about root depth

It seems that this project uses the absolute depth obtained from RootNet for all datasets in the evaluation procedure, so how do you train RootNet for the hand project? As far as I know, RootNet is designed for human pose. Can you release the training code for RootNet for hand pose? Thanks.

download/verify scripts not working for 5fps image download

I had to modify both to make them work; posting the changes here in case this helps someone else.

download 5fps

import os
url = 'https://fb-baas-f32eacb9-8abb-11eb-b2b8-4857dd089e15.s3.amazonaws.com/InterHand2.6M/InterHand2.6M.images.5.fps.v1.0/'
for part1 in ('a', 'b'):
    for part2 in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'):
        if part1 == 'b' and part2 == 's': break
        tar_file = f'InterHand2.6M.images.5.fps.v1.0.tar.part{part1}{part2}'
        # check if tarfile is already downloaded, if not download
        if not os.path.exists(tar_file): os.system(f'wget {url}{tar_file}')
    

os.system(f'wget {url}InterHand2.6M.images.5.fps.v1.0.tar.CHECKSUM')
os.system(f'wget {url}unzip.sh')
os.system(f'wget {url}verify_download.py')

I only had to change a single line in verify_download.py:

for line in tqdm(checksums):
    md5sum, filename = line.split()
    _, filename = filename.split('/')

This is because the MD5 file has an extra InterHand2.6M_5fps_batch1_splits/ prefix in it.

PyTorch -> ONNX -> Unity - "Only tensors of rank 4 or less are supported, but got rank 5"

I'm trying to test this model in the Unity Inference Engine. To do this I had to export it in ONNX format. I managed to export it, but once I imported it into Unity I got this error:

Asset import failed, "Assets/models/interhand.onnx" > OnnxImportException: Unexpected error while parsing layer 561 of type Reshape.
Only tensors of rank 4 or less are supported, but got rank 5

Json: { "input": [ "559", "560" ], "output": [ "561" ], "name": "Reshape_131", "opType": "Reshape" }
  at Unity.Barracuda.ONNXLayout.AxisPermutationsForMappingONNXLayoutToBarracuda (System.Int32 onnxRank, System.String onnxLayout) [0x003ef] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXLayout.cs:152 
  at Unity.Barracuda.ONNXLayout.PermuteToBarracuda (System.Int64[] shape, System.String onnxLayout) [0x00003] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXLayout.cs:158 
  at Unity.Barracuda.ONNXLayout.ConvertSymbolicShapeToBarracuda (System.Int64[] onnxShape, System.String onnxLayout) [0x00000] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXLayout.cs:223 
  at Unity.Barracuda.ONNXModelImporter.<.ctor>b__14_1 (Unity.Barracuda.ModelBuilder net, Unity.Barracuda.ONNXNodeWrapper node) [0x000b9] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXModelImporter.cs:105 
  at Unity.Barracuda.ONNXModelImporter.ConvertOnnxModel (Onnx.ModelProto onnxModel) [0x0032f] in /Users/nautilus/Pose-Demo/Library/PackageCache/[email protected]/Barracuda/Editor/ONNXModelImporter.cs:1088 

We are talking about these two Reshapes:

(attached screenshot, 2020-11-12, showing the two Reshape nodes)

Any idea how I can work around those reshapes so that the model uses tensors of rank 4?

The ONNX model can be downloaded from here: https://github.com/nauutilus/InterHand2.6M/releases/download/0.0.1/interhand.onnx

Why multiply the heatmap by 255?

When I traced the training code in model.py, I saw that the heatmap is multiplied by 255 on line 38.
Could this value be an arbitrary scale factor, or does it have some other physical meaning?

Thanks in advance.

Question about the meaning of camera name

Hi, I see that your ECCV paper reports the result of a model trained under only four different views.
I would like to know the camera ids of these views.
By the way, I would also like to know the meaning of these ids; for example, which id corresponds to the front view?
Thanks,
Jingbo

Issue using transform.py function to convert world coordinates to pixel coordinates

Hello,

I am using the utility functions world2cam and cam2pixel to compute the image-plane coordinates of the joint locations in the 2D hand images. I load the camera parameters R, t just as you do in render.py. However, when I piece everything together, the joint locations do not match those in the images (see example below). Any help is appreciated. How are the R and t quantities used to convert from world to camera coordinates?

def cam2pixel(cam_coord, f, c):
    x = cam_coord[:, 0] / (cam_coord[:, 2] + 1e-8) * f[0] + c[0]
    y = cam_coord[:, 1] / (cam_coord[:, 2] + 1e-8) * f[1] + c[1]
    z = cam_coord[:, 2]
    img_coord = np.concatenate((x[:, None], y[:, None], z[:, None]), 1)
    return img_coord

def world2cam(world_coord, R, T):
    cam_coord = np.dot(world_coord - T, R)
    cam_coord.transpose()
    return cam_coord

def world2image(joints, cam_params, capture_id, frame_idx, cam, hand_type):
    # camera extrinsic parameters (t is the translation vector, R is the rotation matrix)
    t, R = np.array(cam_params[str(capture_id)]['campos'][str(cam)], dtype=np.float32).reshape(3), np.array(cam_params[str(capture_id)]['camrot'][str(cam)], dtype=np.float32).reshape(3, 3)
    t = -np.dot(R, t.reshape(3, 1)).reshape(3)  # -Rt -> t
    focal = cam_params[str(capture_id)]['focal'][str(cam)]
    princpt = cam_params[str(capture_id)]['princpt'][str(cam)]

    # Transform to camera coordinates
    cam_coord = world2cam(joints[str(capture_id)][str(frame_idx)]['world_coord'], R, t)

    # Transform to pixel/image coordinates
    image_coord = cam2pixel(cam_coord, focal, princpt)

    # Split into 2 subarrays: one for the right hand, one for the left hand
    image_coord_right = image_coord[np.arange(0, 21), :]
    image_coord_left = image_coord[np.arange(21, 21 * 2), :]

    # Fill in zeros if one hand does not appear in the frame
    if hand_type == 'right':
        image_coord_left = np.zeros(np.shape(image_coord_left))
    elif hand_type == 'left':
        image_coord_right = np.zeros(np.shape(image_coord_right))

    return [image_coord_right, image_coord_left]

(attached image: PerspectiveProjectionTest)
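
For reference, the convention that usually accompanies campos/camrot annotations (stated as an assumption here, not a confirmed answer to this issue) is X_cam = R (X_world - campos), which is equivalent to using t = -R campos and X_cam = R X_world + t:

import numpy as np

def world2cam_reference(world_coord, camrot, campos):
    # world_coord: (N, 3) points in world coordinates (mm)
    # camrot: (3, 3) world-to-camera rotation; campos: (3,) camera center in world coordinates
    R = np.asarray(camrot, dtype=np.float32).reshape(3, 3)
    pos = np.asarray(campos, dtype=np.float32).reshape(1, 3)
    return np.dot(R, (world_coord - pos).T).T  # X_cam = R (X_world - campos)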

Question about InterNet compared with other work

Hi,

I read the (ECCV) paper and found that you compared your InterNet with other state-of-the-art work in Table 5.

So, I would like to know how I can get the EPE of the model on the STB and RHP datasets.

By the way, is the model you used to get the EPE on STB or RHP trained on the InterHand dataset or on the corresponding dataset? (By corresponding I mean that to get the EPE on STB, we train the model on the STB dataset.)

Error in extracting files from the InterHand2.6M.tar

Hello! Thank you for releasing the great hand dataset. I downloaded all parts of the InterHand2.6M dataset. When I used the command:
cat InterHand2.6M.images.5.fps.v0.0.tar.parta* | tar -xvf - -i
this error occurred:

tar: Skipping to next header
tar: Skipping to next header
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Is there anything wrong with the tar files?

No module named 'nets'

Hello, I'm facing this error. Any help?

File "/Users/Compu/InterHand2.6M/main/model.py", line 11, in
from nets.module import BackboneNet, PoseNet
ModuleNotFoundError: No module named 'nets'

test per batch

hey!

great work

Google Colab (GPU enabled) doesn't have enough space to download all of the images (40 GB). Yet it's tempting to try the whole thing. Could you possibly tell me which of the tar parts contain the test split?

What was the fitting procedure for MANO parameters?

I would love to get a bit more info on how you fit the MANO hand model to the hands. Was it a process similar to what was done for the FreiHAND dataset (a multiview loss over 2D/3D/segmentation) or something different, like the SPIN algorithm for full-body mesh estimation? I didn't see anything about a fitting procedure in the original paper. Thanks!

annotations for 30 fps

Hi @mks0601 ,

Thanks for your great work,
Do all images of the 30 fps version have annotations available, or only some of them?

STB & RHD

Hey! Finally, a clear hands dataset (v0.0) and an impressive baseline. Great work! Looking forward to getting the full version.

What do STB & RHD stand for?

Question about using my own image to test

1. Hello. First, I have a question about the results of testing on my own images: the results don't seem very good. In particular, the little finger is not captured well, as shown in the images. However, when I use your InterHand2.6M dataset, the results look really good, so I'm confused about whether it is the model's problem or not.
(attached images: mine, mine2, mine3)
2. Second, I want to ask how I can test on a particular folder containing my own images, because currently I need to follow the sequence of the annotation json file. Thanks!

Multi-view data

Hi,
Thanks for your great work and for providing us this wonderful dataset!
I have a question about the multi-view data: can I determine the sample pose from the name of an image?
For example, can I presume that the following two images have the same pose, since they only differ in camera id?
Capture1/0287_pointingtowardsfeatures/cam410210/image67650.jpg and
Capture1/0287_pointingtowardsfeatures/cam410220/image67650.jpg
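
A small helper, added here only for illustration (not part of the repo), assuming image paths follow the Capture{N}/{sequence}/cam{camera_id}/image{frame_idx}.jpg pattern visible above:

def parse_image_path(path):
    # Split an InterHand2.6M-style relative image path into its components.
    capture, seq, cam, img = path.split('/')
    return {
        'capture_id': capture.replace('Capture', ''),
        'sequence': seq,
        'cam_id': cam.replace('cam', ''),
        'frame_idx': img.replace('image', '').replace('.jpg', ''),
    }

print(parse_image_path('Capture1/0287_pointingtowardsfeatures/cam410210/image67650.jpg'))
# {'capture_id': '1', 'sequence': '0287_pointingtowardsfeatures', 'cam_id': '410210', 'frame_idx': '67650'}

The two paths above share the capture, sequence, and frame index and differ only in cam_id.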

can not

Hi, thanks for such great work!

I failed to download the dataset, as an error like "Time limit of download is exceeded!" occurred.
Do you know what happened? Is there any solution?

All the best,
Hao Meng

rootnet_output

Hi:
I can't find these two files, rootnet_interhand2.6m_output_machine_annot_val.json and rootnet_interhand2.6m_output_all_test.json, in the dataset. Could you please tell me where I can find them?

train resnet18 with InterHand error

I ran test.py successfully as in the README, but I get errors when I try to train a ResNet-18 model.

I have already changed resnet_type in main/config.py. I think there must be some other configs that still need to be modified, but I couldn't find them. Can you help?

$ python train.py --gpu 0-3 --annot_subset human_annot                              
>>> Using GPU: 0,1,2,3
04-30 03:40:55 Creating train dataset...
Load annotation from  ../data/InterHand2.6M/annotations/human_annot
loading annotations into memory...
Done (t=10.21s)
creating index...
index created!
Get bbox and root depth from groundtruth annotation
Number of annotations in single hand sequences: 76445
Number of annotations in interacting hand sequences: 208271
04-30 03:42:11 Creating graph and optimizer...
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
100%|████████████████████| 44.7M/44.7M [00:00<00:00, 107MB/s]
Initialize resnet from model zoo
Traceback (most recent call last):
  File "train.py", line 90, in <module>
    main()
  File "train.py", line 60, in main
    loss = trainer.model(inputs, targets, meta_info, 'train')
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs2/peichao/code/InterHand2.6M/main/model.py", line 45, in forward
    joint_heatmap_out, rel_root_depth_out, hand_type = self.pose_net(img_feat)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs2/peichao/code/InterHand2.6M/main/../common/nets/module.py", line 48, in forward
    joint_img_feat_1 = self.joint_deconv_1(img_feat)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 929, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Given transposed=1, weight of size [2048, 256, 4, 4], expected input[16, 512, 8, 8] to have 2048 channels, but got 512 channels instead

ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

Hi,

I am trying to run demo.py on the sample image in the demo folder.
Could you please help me address this problem?

File "demo.py", line 25, in
from utils.vis import vis_keypoints, vis_3d_keypoints
File "/InterHand2.6M-master/main/../common/utils/vis.py", line 14, in
import matplotlib.pyplot as plt
File "/anaconda3/envs/interhand/lib/python3.6/site-packages/matplotlib/pyplot.py", line 2336, in
switch_backend(rcParams["backend"])
File "/anaconda3/envs/interhand/lib/python3.6/site-packages/matplotlib/pyplot.py", line 287, in switch_backend
newbackend, required_framework, current_framework))
ImportError: Cannot load backend 'TkAgg' which requires the 'tk' interactive framework, as 'headless' is currently running

I have already tried to change 'tkagg' to 'TKAgg' in vis.py as below:

import matplotlib
matplotlib.use('TKAgg')

Then:
import matplotlib.pyplot as plt

Thanks in advance!

Bad results in wild images.

I tested the network on the default input image and got good results, as described. But when I test hand images from the internet, I get bad results.

nearby perspective

Hi,
Thanks for your great work and for providing us this wonderful dataset!
How can I find the data from the nearby perspective?

How to transfer MANO parameters from world coordinate system to camera coordinate system

I need to use the MANO parameters in the camera coordinate system and tried to use the camera extrinsics to convert them. But it failed, and the fitting error reached tens of millimeters.
Part of the code is as follows:

        mano_pose = np.array(mano_param['pose']).reshape(-1,3)
        # mano_pose = np.dot(R, mano_pose.transpose(1,0)).transpose(1,0) + t.reshape(1,3)/1000  # (16,3)
        mano_pose = torch.FloatTensor(mano_pose)
        root_pose = mano_pose[0].view(1, 3)
        root_pose = np.dot(R, root_pose.transpose(1, 0)).transpose(1, 0) + t.reshape(1, 3) / 1000
        root_pose = torch.FloatTensor(root_pose)
        hand_pose = mano_pose[1:, :].contiguous().view(1, -1)
        shape = torch.FloatTensor(mano_param['shape']).view(1, -1)
        trans = np.array(mano_param['trans']).reshape(-1,3)
        trans = np.dot(R, trans.transpose(1,0)).transpose(1,0) + t.reshape(1,3)/1000
        trans = torch.FloatTensor(trans).view(1, -1)
        output = mano_layer[hand_type](global_orient=root_pose, hand_pose=hand_pose, betas=shape, transl=trans)
        mesh = output.vertices[0].numpy() * 1000
        fit_err = get_fitting_error(mesh, ih26m_joint_regressor, cam_params, joints, hand_type, capture_idx,frame_idx, cam_idx)
        print('Fitting error: ' + str(fit_err) + ' mm')

Missing Validation Human Annotation

I don't see the following:

human_annot/InterHand2.6M_val_camera.json  
human_annot/InterHand2.6M_val_data.json  
human_annot/InterHand2.6M_val_joint_3d.json  

Are they provided?

The current archive has:

Archive:  InterHand2.6M.annotations.5.fps.zip
   creating: all/
  inflating: all/InterHand2.6M_test_camera.json  
  inflating: all/InterHand2.6M_test_data.json  
  inflating: all/InterHand2.6M_test_joint_3d.json  
  inflating: all/InterHand2.6M_train_camera.json  
  inflating: all/InterHand2.6M_train_data.json  
  inflating: all/InterHand2.6M_train_joint_3d.json  
   creating: human_annot/
  inflating: human_annot/InterHand2.6M_test_camera.json  
  inflating: human_annot/InterHand2.6M_test_data.json  
  inflating: human_annot/InterHand2.6M_test_joint_3d.json  
  inflating: human_annot/InterHand2.6M_train_camera.json  
  inflating: human_annot/InterHand2.6M_train_data.json  
  inflating: human_annot/InterHand2.6M_train_joint_3d.json  
   creating: machine_annot/
  inflating: machine_annot/InterHand2.6M_test_camera.json  
  inflating: machine_annot/InterHand2.6M_test_data.json  
  inflating: machine_annot/InterHand2.6M_test_joint_3d.json  
  inflating: machine_annot/InterHand2.6M_train_camera.json  
  inflating: machine_annot/InterHand2.6M_train_data.json  
  inflating: machine_annot/InterHand2.6M_train_joint_3d.json  
  inflating: machine_annot/InterHand2.6M_val_camera.json  
  inflating: machine_annot/InterHand2.6M_val_data.json  
  inflating: machine_annot/InterHand2.6M_val_joint_3d.json  
  inflating: skeleton.txt            
  inflating: subject.txt 

Can you keep the results from v0.0?

Thanks for releasing the full dataset. I noticed that the results and the model from the v0.0 dataset are no longer there. Is it possible to include those in the current README again? Some of us might need to run experiments with them during the ICCV rebuttal.
