A Python package that provides evaluation and visualization tools for the DexYCB dataset

Home Page: https://dex-ycb.github.io

License: GNU General Public License v3.0

DexYCB Toolkit

DexYCB Toolkit is a Python package that provides evaluation and visualization tools for the DexYCB dataset. The dataset and results were initially described in a CVPR 2021 paper:

DexYCB: A Benchmark for Capturing Hand Grasping of Objects
Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
[ paper ] [ supplementary ] [ video ] [ arXiv ] [ studio CAD model ] [ studio hardware ] [ RealSense calibration & recording guide ] [ project site ]

Citing DexYCB Toolkit

Please cite DexYCB Toolkit if it helps your research:

@INPROCEEDINGS{chao:cvpr2021,
  author    = {Yu-Wei Chao and Wei Yang and Yu Xiang and Pavlo Molchanov and Ankur Handa and Jonathan Tremblay and Yashraj S. Narang and Karl {Van Wyk} and Umar Iqbal and Stan Birchfield and Jan Kautz and Dieter Fox},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  title     = {{DexYCB}: A Benchmark for Capturing Hand Grasping of Objects},
  year      = {2021},
}

License

DexYCB Toolkit is released under the GNU General Public License v3.0.

Contents

  1. Prerequisites
  2. Installation
  3. Loading Dataset and Visualizing Samples
  4. Evaluation
    1. COCO Evaluation
    2. BOP Evaluation
    3. HPE Evaluation
    4. Grasp Evaluation
  5. Reproducing CVPR 2021 Results
  6. Visualizing Sequences
    1. Interactive 3D Viewer
    2. Offline Renderer

Prerequisites

This code is tested with Python 3.7 on Linux.

Installation

We recommend installing into a virtual environment (e.g., virtualenv or conda) so that packages from different projects do not interfere with each other.

  1. Clone the repo with --recursive and cd into it:

    git clone --recursive git@github.com:NVlabs/dex-ycb-toolkit.git
    cd dex-ycb-toolkit
  2. Install the dex-ycb-toolkit package and dependencies:

    # Install dex-ycb-toolkit
    pip install -e .
    
    # Install bop_toolkit dependencies
    cd bop_toolkit
    pip install -r requirements.txt
    cd ..
    
    # Install manopth
    cd manopth
    pip install -e .
    cd ..
  3. Download the DexYCB dataset from the project site.

  4. Set the environment variable for dataset path:

    export DEX_YCB_DIR=/path/to/dex-ycb

    $DEX_YCB_DIR should be a folder with the following structure:

    ├── 20200709-subject-01/
    ├── 20200813-subject-02/
    ├── ...
    ├── calibration/
    └── models/
  5. Download MANO models and code (mano_v1_2.zip) from the MANO website and place the file under manopth. Unzip the file and create a symlink:

    cd manopth
    unzip mano_v1_2.zip
    cd mano
    ln -s ../mano_v1_2/models models
    cd ../..

Loading Dataset and Visualizing Samples

  1. The example below shows how to create a DexYCB dataset given a setup (e.g., s0) and a split name (e.g., train). Once created, you can use the dataset to fetch image samples.

    python examples/create_dataset.py
    You should see the following output:
    Dataset name: s0_train
    Dataset size: 465504
    1000th sample:
    {
        "color_file": "/datasets/dex-ycb-20201205/20200709-subject-01/20200709_141841/932122060861/color_000053.jpg",
        "depth_file": "/datasets/dex-ycb-20201205/20200709-subject-01/20200709_141841/932122060861/aligned_depth_to_color_000053.png",
        "label_file": "/datasets/dex-ycb-20201205/20200709-subject-01/20200709_141841/932122060861/labels_000053.npz",
        "intrinsics": {
            "fx": 613.0762329101562,
            "fy": 611.9989624023438,
            "ppx": 313.0279846191406,
            "ppy": 245.00865173339844
        },
        "ycb_ids": [
            1,
            11,
            12,
            20
        ],
        "ycb_grasp_ind": 0,
        "mano_side": "right",
        "mano_betas": [
            0.6993994116783142,
            -0.16909725964069366,
            -0.8955091834068298,
            -0.09764610230922699,
            0.07754238694906235,
            0.336286723613739,
            -0.05547792464494705,
            0.5248727798461914,
            -0.38668063282966614,
            -0.00133091164752841
        ]
    }
    .
    .
    .
    

    Each sample includes the paths to the color and depth images, the path to the label file, the camera intrinsics, the IDs of the YCB objects present in the scene, the index of the object being grasped, whether the hand is right or left, and the hand's MANO shape parameters.

    Each label file contains the following annotations packed in a dictionary:

    • seg: A uint8 numpy array of shape [H, W] containing the segmentation map. The label of each pixel can be 0 (background), 1-21 (YCB object), or 255 (hand).
    • pose_y: A float32 numpy array of shape [num_obj, 3, 4] holding the 6D pose of each object. Each 6D pose is represented by the 3x4 matrix [R | t], where R is the 3x3 rotation matrix and t is the 3x1 translation.
    • pose_m: A float32 numpy array of shape [1, 51] holding the pose of the hand. pose_m[:, 0:48] stores the MANO pose coefficients in PCA representation, and pose_m[0, 48:51] stores the translation. If the image does not have a visible hand or the annotation does not exist, pose_m will be all 0.
    • joint_3d: A float32 numpy array of shape [1, 21, 3] holding the 3D joint position of the hand in the camera coordinates. The joint order is specified here. If the image does not have a visible hand or the annotation does not exist, joint_3d will be all -1.
    • joint_2d: A float32 numpy array of shape [1, 21, 2] holding the 2D joint position of the hand in the image space. The joint order follows joint_3d. If the image does not have a visible hand or the annotation does not exist, joint_2d will be all -1.
  2. The example below shows how to visualize ground-truth object and hand pose of one image sample.

    python examples/visualize_pose.py
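
To make the annotation layout above more concrete, the following sketch shows how a [R | t] pose and the sample's intrinsics could be combined to map a 3D point into the image. It rests on assumptions that are not stated in the text: that pose_y maps object model coordinates into camera coordinates, and that the intrinsics describe an undistorted pinhole camera. The helper names and all numeric values are made up for illustration; the functions accept nested lists as well as numpy arrays.

```python
def transform_point(pose, point):
    """Apply a 3x4 pose [R | t] to a 3D point: R @ point + t."""
    return [
        sum(pose[r][c] * point[c] for c in range(3)) + pose[r][3]
        for r in range(3)
    ]


def project_point(point_cam, intrinsics):
    """Project a camera-frame 3D point to pixel coordinates (pinhole model)."""
    x, y, z = point_cam
    u = intrinsics["fx"] * x / z + intrinsics["ppx"]
    v = intrinsics["fy"] * y / z + intrinsics["ppy"]
    return [u, v]


def has_hand_annotation(joint_3d):
    """A missing hand is stored as all -1, so any other value means present."""
    return any(value != -1 for joint in joint_3d[0] for value in joint)


# Made-up intrinsics and pose, for illustration only.
intrinsics = {"fx": 600.0, "fy": 600.0, "ppx": 320.0, "ppy": 240.0}
pose = [[1.0, 0.0, 0.0, 0.25],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.5]]

origin_cam = transform_point(pose, [0.0, 0.0, 0.0])
print(project_point(origin_cam, intrinsics))
print(has_hand_annotation([[[-1.0, -1.0, -1.0]] * 21]))
```

With real data, pose would be one entry of pose_y from a label file and intrinsics would come from the sample dictionary. Comparing a projected joint_3d entry against joint_2d is a quick way to test whether the pinhole assumption holds for your data.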

Evaluation

DexYCB provides a benchmark to evaluate four tasks: (1) 2D object and keypoint detection (COCO), (2) 6D object pose estimation (BOP), (3) 3D hand pose estimation (HPE), and (4) safe human-to-robot object handover (Grasp).

Below we provide instructions and examples to run these evaluations. To run the examples, you need to first download the example results.

./results/fetch_example_results.sh

COCO Evaluation

  • The example below shows how to run COCO evaluation using an example result file. The output will be logged to results/coco_eval_s0_test_example_results_coco_s0_test.log.

    python examples/evaluate_coco.py
    You should see the following output:
    Evaluation results for *bbox*:
    |  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
    |:-----:|:------:|:------:|:-----:|:-----:|:-----:|
    | 2.970 | 2.970  | 2.970  | 3.065 | 3.017 | 2.723 |
    Per-category *bbox* AP:
    | category            | AP    | category              | AP    | category            | AP    |
    |:--------------------|:------|:----------------------|:------|:--------------------|:------|
    | 002_master_chef_can | 2.970 | 003_cracker_box       | 2.970 | 004_sugar_box       | 2.970 |
    | 005_tomato_soup_can | 2.970 | 006_mustard_bottle    | 2.970 | 007_tuna_fish_can   | 2.970 |
    | 008_pudding_box     | 2.970 | 009_gelatin_box       | 2.970 | 010_potted_meat_can | 2.970 |
    | 011_banana          | 2.970 | 019_pitcher_base      | 2.970 | 021_bleach_cleanser | 2.970 |
    | 024_bowl            | 2.970 | 025_mug               | 2.970 | 035_power_drill     | 2.970 |
    | 036_wood_block      | 2.970 | 037_scissors          | 2.970 | 040_large_marker    | 2.970 |
    | 051_large_clamp     | nan   | 052_extra_large_clamp | 2.970 | 061_foam_brick      | 2.970 |
    | hand                | 2.970 |                       |       |                     |       |
    Evaluation results for *segm*:
    |  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
    |:-----:|:------:|:------:|:-----:|:-----:|:-----:|
    | 2.970 | 2.970  | 2.970  | 3.065 | 3.017 | 2.723 |
    Per-category *segm* AP:
    | category            | AP    | category              | AP    | category            | AP    |
    |:--------------------|:------|:----------------------|:------|:--------------------|:------|
    | 002_master_chef_can | 2.970 | 003_cracker_box       | 2.970 | 004_sugar_box       | 2.970 |
    | 005_tomato_soup_can | 2.970 | 006_mustard_bottle    | 2.970 | 007_tuna_fish_can   | 2.970 |
    | 008_pudding_box     | 2.970 | 009_gelatin_box       | 2.970 | 010_potted_meat_can | 2.970 |
    | 011_banana          | 2.970 | 019_pitcher_base      | 2.970 | 021_bleach_cleanser | 2.970 |
    | 024_bowl            | 2.970 | 025_mug               | 2.970 | 035_power_drill     | 2.970 |
    | 036_wood_block      | 2.970 | 037_scissors          | 2.970 | 040_large_marker    | 2.970 |
    | 051_large_clamp     | nan   | 052_extra_large_clamp | 2.970 | 061_foam_brick      | 2.970 |
    | hand                | 2.970 |                       |       |                     |       |
    Evaluation results for *keypoints*:
    |  AP   |  AP50  |  AP75  |  APm  |  APl  |
    |:-----:|:------:|:------:|:-----:|:-----:|
    | 2.970 | 2.970  | 2.970  | 2.970 | 2.970 |
    Per-category *keypoints* AP:
    | category            | AP    | category              | AP   | category            | AP   |
    |:--------------------|:------|:----------------------|:-----|:--------------------|:-----|
    | 002_master_chef_can | nan   | 003_cracker_box       | nan  | 004_sugar_box       | nan  |
    | 005_tomato_soup_can | nan   | 006_mustard_bottle    | nan  | 007_tuna_fish_can   | nan  |
    | 008_pudding_box     | nan   | 009_gelatin_box       | nan  | 010_potted_meat_can | nan  |
    | 011_banana          | nan   | 019_pitcher_base      | nan  | 021_bleach_cleanser | nan  |
    | 024_bowl            | nan   | 025_mug               | nan  | 035_power_drill     | nan  |
    | 036_wood_block      | nan   | 037_scissors          | nan  | 040_large_marker    | nan  |
    | 051_large_clamp     | nan   | 052_extra_large_clamp | nan  | 061_foam_brick      | nan  |
    | hand                | 2.970 |                       |      |                     |      |
    Evaluation complete.
    
  • Results format: You should store the results in a .json file following the COCO results format. The results should be a list of dictionary items with the following key-value pairs:

    • image_id: Index of an image sample.
    • category_id: Object category ID. The value can be 0 (background), 1-21 (YCB object), or 22 (hand).
    • bbox: Bounding box in [x, y, width, height]. Required for the bbox task.
    • score: Detection score.
    • segmentation: Segmentation in RLE. Required for the segm task.
    • keypoints: Keypoints in [x1, y1, v1, ..., x21, y21, v21]. Required for the keypoints task.

    You can also look at the example result files in results/example_results_coco_*.json.

  • To evaluate on your own results, simply add --name to specify the setup and split and --res_file to specify the path to the result file. For example:

    python examples/evaluate_coco.py \
      --name s0_test \
      --res_file path/to/results.json

BOP Evaluation

  • The example below shows how to run BOP evaluation using an example result file. The output will be logged to results/bop_eval_s0_test_example_results_bop_s0_test.log.

    python examples/evaluate_bop.py
    You should see the following output:
    Deriving results for *all*
    Evaluation results for *all*:
    |  vsd  |  mssd  |  mspd  |  mean  |
    |:-----:|:------:|:------:|:------:|
    | 0.129 | 0.123  | 0.177  | 0.143  |
    Per-object scores for *all*:
    | object                |   vsd |   mssd |   mspd |   mean |
    |:----------------------|------:|-------:|-------:|-------:|
    | 002_master_chef_can   | 0.768 |  0.768 |  0.768 |  0.768 |
    | 003_cracker_box       | 0.818 |  0.812 |  0.812 |  0.814 |
    | 004_sugar_box         | 0.000 |  0.000 |  0.000 |  0.000 |
    | 005_tomato_soup_can   | 0.000 |  0.000 |  0.000 |  0.000 |
    | 006_mustard_bottle    | 0.000 |  0.000 |  0.000 |  0.000 |
    | 007_tuna_fish_can     | 0.000 |  0.000 |  0.098 |  0.033 |
    | 008_pudding_box       | 0.113 |  0.000 |  0.098 |  0.071 |
    | 009_gelatin_box       | 0.000 |  0.000 |  0.022 |  0.007 |
    | 010_potted_meat_can   | 0.000 |  0.000 |  0.017 |  0.006 |
    | 011_banana            | 0.000 |  0.000 |  0.000 |  0.000 |
    | 019_pitcher_base      | 0.001 |  0.000 |  0.000 |  0.000 |
    | 021_bleach_cleanser   | 0.029 |  0.000 |  0.000 |  0.010 |
    | 024_bowl              | 0.000 |  0.000 |  0.015 |  0.005 |
    | 025_mug               | 0.870 |  0.870 |  0.933 |  0.891 |
    | 035_power_drill       | 0.000 |  0.000 |  0.004 |  0.001 |
    | 036_wood_block        | 0.000 |  0.000 |  0.000 |  0.000 |
    | 037_scissors          | 0.000 |  0.011 |  0.021 |  0.011 |
    | 040_large_marker      | 0.000 |  0.000 |  0.105 |  0.035 |
    | 052_extra_large_clamp | 0.000 |  0.000 |  0.205 |  0.068 |
    | 061_foam_brick        | 0.000 |  0.000 |  0.532 |  0.177 |
    Deriving results for *grasp only*
    Evaluation results for *grasp only*:
    |  vsd  |  mssd  |  mspd  |  mean  |
    |:-----:|:------:|:------:|:------:|
    | 0.160 | 0.160  | 0.268  | 0.196  |
    Per-object scores for *grasp only*:
    | object                |   vsd |   mssd |   mspd |   mean |
    |:----------------------|------:|-------:|-------:|-------:|
    | 002_master_chef_can   | 3.115 |  3.114 |  3.114 |  3.114 |
    | 003_cracker_box       | 0.024 |  0.000 |  0.000 |  0.008 |
    | 004_sugar_box         | 0.000 |  0.000 |  0.000 |  0.000 |
    | 005_tomato_soup_can   | 0.000 |  0.000 |  0.000 |  0.000 |
    | 006_mustard_bottle    | 0.000 |  0.000 |  0.000 |  0.000 |
    | 007_tuna_fish_can     | 0.000 |  0.000 |  0.028 |  0.009 |
    | 008_pudding_box       | 0.000 |  0.000 |  0.000 |  0.000 |
    | 009_gelatin_box       | 0.000 |  0.000 |  0.100 |  0.033 |
    | 010_potted_meat_can   | 0.000 |  0.000 |  0.072 |  0.024 |
    | 011_banana            | 0.000 |  0.000 |  0.000 |  0.000 |
    | 019_pitcher_base      | 0.003 |  0.000 |  0.000 |  0.001 |
    | 021_bleach_cleanser   | 0.000 |  0.000 |  0.000 |  0.000 |
    | 024_bowl              | 0.000 |  0.000 |  0.061 |  0.020 |
    | 025_mug               | 0.000 |  0.000 |  0.232 |  0.077 |
    | 035_power_drill       | 0.000 |  0.000 |  0.017 |  0.006 |
    | 036_wood_block        | 0.000 |  0.000 |  0.000 |  0.000 |
    | 037_scissors          | 0.000 |  0.044 |  0.078 |  0.041 |
    | 040_large_marker      | 0.000 |  0.000 |  0.344 |  0.115 |
    | 052_extra_large_clamp | 0.000 |  0.000 |  0.385 |  0.128 |
    | 061_foam_brick        | 0.000 |  0.000 |  0.876 |  0.292 |
    Evaluation complete.
    
  • Warning: BOP evaluation is much more computationally expensive than COCO and HPE evaluation. A full run can take an hour or more, depending on the available compute resources.

  • Results format: You should store the results in a .csv file following the BOP results format. Each line should represent one pose estimate with the following variables separated by commas:

    • scene_id: Scene ID. You can get this with get_bop_id_from_idx().
    • im_id: Image ID. You can get this with get_bop_id_from_idx().
    • obj_id: YCB object ID.
    • score: Confidence score.
    • R: A 3x3 rotation matrix in r11 r12 r13 r21 r22 r23 r31 r32 r33.
    • t: A 3x1 translation vector (in mm) in t1 t2 t3.
    • time: Set to -1.

    As described in Sec. C.1 of the supplementary material, to speed up the BOP evaluation we only evaluate on a set of subsampled keyframes. As a result, you only need to generate pose estimates for these keyframes. Each image sample contains an is_bop_target key that indicates whether the image is a keyframe used for BOP evaluation. The example below shows how you may use is_bop_target:

    from dex_ycb_toolkit.factory import get_dataset
    dataset = get_dataset('s0_test')
    for sample in dataset:
      if not sample['is_bop_target']:
        continue
      # Generate object pose estimates below for this image sample.

    You can also look at the example result files in results/example_results_bop_*.csv.

  • To evaluate on your own results, simply add --name to specify the setup and split and --res_file to specify the path to the result file. For example:

    python examples/evaluate_bop.py \
      --name s0_test \
      --res_file path/to/results.csv
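
As a rough illustration of the results format above, the sketch below formats one pose estimate as a single line. format_bop_line and meters_to_mm are hypothetical helpers and the IDs and pose are made up. The sketch assumes that the nine values of R and the three values of t are separated by spaces within their comma-separated fields, as in the BOP results format; compare against the example result files to confirm the exact layout, including whether a header row is expected.

```python
def meters_to_mm(t_m):
    """Convert a translation from meters to millimeters."""
    return [1000.0 * v for v in t_m]


def format_bop_line(scene_id, im_id, obj_id, score, R, t_mm, time=-1):
    """Format one pose estimate as a single results line."""
    # R is flattened row by row: r11 r12 r13 r21 ... r33.
    r_str = " ".join(str(float(v)) for row in R for v in row)
    t_str = " ".join(str(float(v)) for v in t_mm)
    return f"{scene_id},{im_id},{obj_id},{score},{r_str},{t_str},{time}"


# Made-up IDs and pose, for illustration only.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
line = format_bop_line(1, 53, 2, 0.9, identity, meters_to_mm([0.25, 0.0, 0.5]))
print(line)
```

In practice scene_id and im_id would come from get_bop_id_from_idx() for each image sample where is_bop_target is set.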

HPE Evaluation

  • The example below shows how to run HPE evaluation using an example result file. The output will be logged to results/hpe_eval_s0_test_example_results_hpe_s0_test.log. This will also plot the PCK curves and save them to results/hpe_curve_s0_test_example_results_hpe_s0_test/.

    python examples/evaluate_hpe.py
    You should see the following output:
    Running evaluation
    Results:
    | alignment     |   MPJPE (mm) |    AUC |
    |:--------------|-------------:|-------:|
    | absolute      |     280.4081 | 0.0019 |
    | root-relative |     104.7705 | 0.0585 |
    | procrustes    |      49.1635 | 0.1545 |
    Evaluation complete.
    
  • Results format: You should store the results in a .txt file. Each line should represent the predicted 3D position of the 21 hand joints of one image sample. Particularly, each line should have 64 comma-separated numbers:

    • The first number is the index of the image sample.
    • The following 63 numbers are the [x, y, z] values (in mm) of the 21 joints in the camera coordinates of the image. The numbers should be ordered in x1, y1, z1, x2, y2, z2, ..., x21, y21, z21, where the joint order is specified here.

    You can also look at the example result files in results/example_results_hpe_*.txt.

  • To evaluate on your own results, simply add --name to specify the setup and split and --res_file to specify the path to the result file. For example:

    python examples/evaluate_hpe.py \
      --name s0_test \
      --res_file path/to/results.txt
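
The line layout described above (one sample index followed by 63 coordinates) can be sketched as below. format_hpe_line is a hypothetical helper and the joint values are made up; the sketch only covers the text formatting and assumes the joints are already in millimeters and in the required joint order.

```python
def format_hpe_line(sample_idx, joints_mm):
    """Format one line: the sample index followed by 63 joint coordinates."""
    if len(joints_mm) != 21 or any(len(joint) != 3 for joint in joints_mm):
        raise ValueError("expected 21 joints with [x, y, z] each")
    values = [str(sample_idx)]
    for x, y, z in joints_mm:
        values.extend([repr(float(x)), repr(float(y)), repr(float(z))])
    return ",".join(values)


# Made-up prediction: all 21 joints at the same position (in mm).
example_line = format_hpe_line(7, [[10.0, -20.0, 600.0]] * 21)
print(len(example_line.split(",")))
```

Writing one such line per image sample to a .txt file gives a file that can be passed to --res_file.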

Grasp Evaluation

  • The Grasp evaluation simply takes a result file on object pose (used for BOP evaluation) and a result file on hand segmentation (used for COCO evaluation). It will generate grasps for handover based on these results and evaluate these grasps accordingly.

  • The example below shows how to run Grasp evaluation using example result files. The output will be logged to results/grasp_eval_s0_test_example_results_bop_s0_test_example_results_coco_s0_test.log. This will also generate a result file results/grasp_res_s0_test_example_results_bop_s0_test_example_results_coco_s0_test.json, which can later be used for plotting precision-coverage curves.

    python examples/evaluate_grasp.py
    You should see the following output:
    Running evaluation
    0001/1152     648  003_cracker_box        # gt grasps:  62
    0002/1152     722  003_cracker_box        # gt grasps:  62
    0003/1152     796  003_cracker_box        # gt grasps:  62
    0004/1152     870  003_cracker_box        # gt grasps:  62
    0005/1152     944  003_cracker_box        # gt grasps:  62
    .
    .
    .
    1148/1152   92972  061_foam_brick         # gt grasps:  60
    1149/1152   93044  061_foam_brick         # gt grasps:  60
    1150/1152   93116  061_foam_brick         # gt grasps:  60
    1151/1152   93188  061_foam_brick         # gt grasps:  60
    1152/1152   93260  061_foam_brick         # gt grasps:  60
    Results:
    |   radius (m) |   angle (deg) |   dist th (m) |   coverage |   precision |
    |-------------:|--------------:|--------------:|-----------:|------------:|
    |       0.0500 |            15 |        0.0000 |     0.0000 |      0.0000 |
    |       0.0500 |            15 |        0.0100 |     0.0000 |      0.0000 |
    |       0.0500 |            15 |        0.0200 |     0.0000 |      0.0000 |
    |       0.0500 |            15 |        0.0300 |     0.0000 |      0.0000 |
    |       0.0500 |            15 |        0.0400 |     0.0000 |      0.0000 |
    |       0.0500 |            15 |        0.0500 |     0.0000 |      0.0000 |
    |       0.0500 |            15 |        0.0600 |     0.0000 |      0.0000 |
    |       0.0500 |            15 |        0.0700 |     0.0000 |      0.0000 |
    Evaluation complete.
    
  • You can also run with a --visualize flag, which will simultaneously save visualizations of predicted grasps. The visualizations will be saved to results/grasp_vis_s0_test_example_results_bop_s0_test_example_results_coco_s0_test/.

    python examples/evaluate_grasp.py --visualize
  • The above command requires an active display manager. You can also run an offscreen renderer on a headless server with EGL:

    PYOPENGL_PLATFORM=egl python examples/evaluate_grasp.py --visualize
  • To evaluate on your own results, simply add --name to specify the setup and split and --bop_res_file and --coco_res_file to specify the paths to the BOP and COCO result files. For example:

    python examples/evaluate_grasp.py \
      --name s0_test \
      --bop_res_file path/to/bop/results.csv \
      --coco_res_file path/to/coco/results.json
  • The Grasp evaluation makes use of a set of 100 pre-generated grasps for each object. You can visualize these pre-generated grasps with:

    python examples/visualize_grasps.py

Reproducing CVPR 2021 Results

We provide the result files of the benchmarks reported in the CVPR 2021 paper. Below we show how you can run evaluation on these files and reproduce the exact numbers in the paper.

To run the evaluation, you need to first download the CVPR 2021 results.

./results/fetch_cvpr2021_results.sh

The full set of evaluation scripts can be found in examples/all_cvpr2021_results_eval_scripts.sh. Below we show some examples.

  • For example, to evaluate the 2D object and keypoint detection (COCO) results of Mask R-CNN (Detectron2) on s0, you can run:

    python examples/evaluate_coco.py \
      --name s0_test \
      --res_file results/cvpr2021_results/coco_maskrcnn_s0_test.json
    You should see the following output:
    Evaluation results for *bbox*:
    |   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
    |:------:|:------:|:------:|:------:|:------:|:------:|
    | 75.762 | 96.082 | 87.723 | 31.271 | 77.584 | 71.907 |
    Per-category *bbox* AP:
    | category            | AP     | category              | AP     | category            | AP     |
    |:--------------------|:-------|:----------------------|:-------|:--------------------|:-------|
    | 002_master_chef_can | 83.872 | 003_cracker_box       | 85.846 | 004_sugar_box       | 81.297 |
    | 005_tomato_soup_can | 76.031 | 006_mustard_bottle    | 81.557 | 007_tuna_fish_can   | 68.076 |
    | 008_pudding_box     | 73.595 | 009_gelatin_box       | 69.509 | 010_potted_meat_can | 75.634 |
    | 011_banana          | 70.533 | 019_pitcher_base      | 87.170 | 021_bleach_cleanser | 80.978 |
    | 024_bowl            | 80.615 | 025_mug               | 76.013 | 035_power_drill     | 81.826 |
    | 036_wood_block      | 83.745 | 037_scissors          | 64.070 | 040_large_marker    | 52.693 |
    | 051_large_clamp     | nan    | 052_extra_large_clamp | 73.413 | 061_foam_brick      | 72.683 |
    | hand                | 71.847 |                       |        |                     |        |
    Evaluation results for *segm*:
    |   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
    |:------:|:------:|:------:|:------:|:------:|:------:|
    | 69.584 | 93.835 | 78.718 | 20.954 | 70.949 | 77.700 |
    Per-category *segm* AP:
    | category            | AP     | category              | AP     | category            | AP     |
    |:--------------------|:-------|:----------------------|:-------|:--------------------|:-------|
    | 002_master_chef_can | 82.683 | 003_cracker_box       | 83.721 | 004_sugar_box       | 77.713 |
    | 005_tomato_soup_can | 74.745 | 006_mustard_bottle    | 79.395 | 007_tuna_fish_can   | 67.144 |
    | 008_pudding_box     | 70.392 | 009_gelatin_box       | 68.429 | 010_potted_meat_can | 72.659 |
    | 011_banana          | 63.432 | 019_pitcher_base      | 84.669 | 021_bleach_cleanser | 77.515 |
    | 024_bowl            | 78.120 | 025_mug               | 71.945 | 035_power_drill     | 73.797 |
    | 036_wood_block      | 81.409 | 037_scissors          | 29.361 | 040_large_marker    | 42.423 |
    | 051_large_clamp     | nan    | 052_extra_large_clamp | 54.032 | 061_foam_brick      | 72.848 |
    | hand                | 54.834 |                       |        |                     |        |
    Evaluation results for *keypoints*:
    |   AP   |  AP50  |  AP75  |  APm   |  APl   |
    |:------:|:------:|:------:|:------:|:------:|
    | 36.418 | 71.681 | 32.762 | 38.779 | 35.363 |
    Per-category *keypoints* AP:
    | category            | AP     | category              | AP   | category            | AP   |
    |:--------------------|:-------|:----------------------|:-----|:--------------------|:-----|
    | 002_master_chef_can | nan    | 003_cracker_box       | nan  | 004_sugar_box       | nan  |
    | 005_tomato_soup_can | nan    | 006_mustard_bottle    | nan  | 007_tuna_fish_can   | nan  |
    | 008_pudding_box     | nan    | 009_gelatin_box       | nan  | 010_potted_meat_can | nan  |
    | 011_banana          | nan    | 019_pitcher_base      | nan  | 021_bleach_cleanser | nan  |
    | 024_bowl            | nan    | 025_mug               | nan  | 035_power_drill     | nan  |
    | 036_wood_block      | nan    | 037_scissors          | nan  | 040_large_marker    | nan  |
    | 051_large_clamp     | nan    | 052_extra_large_clamp | nan  | 061_foam_brick      | nan  |
    | hand                | 36.418 |                       |      |                     |      |
    Evaluation complete.
    

    The output will also be logged to results/coco_eval_s0_test_coco_maskrcnn_s0_test.log.

  • For example, to evaluate the 6D object pose estimation (BOP) results of CosyPose on s1, you can run:

    python examples/evaluate_bop.py \
      --name s1_test \
      --res_file results/cvpr2021_results/bop_cosypose_s1_test.csv
    You should see the following output:
    Deriving results for *all*
    Evaluation results for *all*:
    |  vsd   |  mssd  |  mspd  |  mean  |
    |:------:|:------:|:------:|:------:|
    | 50.384 | 71.791 | 74.431 | 65.535 |
    Per-object scores for *all*:
    | object                |    vsd |   mssd |   mspd |   mean |
    |:----------------------|-------:|-------:|-------:|-------:|
    | 002_master_chef_can   | 85.753 | 83.628 | 82.859 | 84.080 |
    | 003_cracker_box       | 84.040 | 92.605 | 88.905 | 88.516 |
    | 004_sugar_box         | 76.383 | 82.469 | 84.067 | 80.973 |
    | 005_tomato_soup_can   | 55.897 | 59.633 | 69.034 | 61.521 |
    | 006_mustard_bottle    | 62.274 | 77.397 | 77.298 | 72.323 |
    | 007_tuna_fish_can     | 56.964 | 61.002 | 75.030 | 64.332 |
    | 008_pudding_box       | 60.449 | 77.836 | 84.344 | 74.209 |
    | 009_gelatin_box       | 56.673 | 72.329 | 82.828 | 70.610 |
    | 010_potted_meat_can   | 55.996 | 76.488 | 83.338 | 71.941 |
    | 011_banana            |  6.616 | 63.311 | 64.266 | 44.731 |
    | 019_pitcher_base      | 43.229 | 73.489 | 59.781 | 58.833 |
    | 021_bleach_cleanser   | 61.659 | 75.832 | 69.177 | 68.889 |
    | 024_bowl              | 72.657 | 87.537 | 91.571 | 83.922 |
    | 025_mug               | 44.858 | 64.239 | 70.424 | 59.841 |
    | 035_power_drill       | 19.341 | 75.055 | 68.193 | 54.196 |
    | 036_wood_block        | 60.846 | 80.535 | 73.649 | 71.677 |
    | 037_scissors          |  1.481 | 47.346 | 44.762 | 31.196 |
    | 040_large_marker      | 42.481 | 70.218 | 86.952 | 66.550 |
    | 052_extra_large_clamp | 24.680 | 77.485 | 78.131 | 60.099 |
    | 061_foam_brick        | 32.808 | 30.573 | 47.635 | 37.006 |
    Deriving results for *grasp only*
    Evaluation results for *grasp only*:
    |  vsd   |  mssd  |  mspd  |  mean  |
    |:------:|:------:|:------:|:------:|
    | 43.372 | 63.000 | 65.931 | 57.434 |
    Per-object scores for *grasp only*:
    | object                |    vsd |   mssd |   mspd |   mean |
    |:----------------------|-------:|-------:|-------:|-------:|
    | 002_master_chef_can   | 79.613 | 76.590 | 75.433 | 77.212 |
    | 003_cracker_box       | 82.660 | 94.330 | 88.185 | 88.392 |
    | 004_sugar_box         | 69.308 | 69.196 | 70.325 | 69.609 |
    | 005_tomato_soup_can   | 44.397 | 50.901 | 62.995 | 52.764 |
    | 006_mustard_bottle    | 59.477 | 71.649 | 70.210 | 67.112 |
    | 007_tuna_fish_can     | 41.092 | 44.802 | 61.113 | 49.002 |
    | 008_pudding_box       | 55.603 | 73.812 | 81.245 | 70.220 |
    | 009_gelatin_box       | 44.160 | 58.278 | 70.443 | 57.627 |
    | 010_potted_meat_can   | 49.464 | 70.085 | 76.572 | 65.374 |
    | 011_banana            |  5.492 | 47.786 | 53.166 | 35.481 |
    | 019_pitcher_base      | 40.258 | 66.738 | 51.030 | 52.675 |
    | 021_bleach_cleanser   | 55.303 | 71.964 | 63.579 | 63.615 |
    | 024_bowl              | 61.346 | 79.147 | 83.728 | 74.740 |
    | 025_mug               | 35.344 | 51.434 | 58.652 | 48.476 |
    | 035_power_drill       | 13.889 | 70.908 | 62.873 | 49.223 |
    | 036_wood_block        | 56.199 | 77.507 | 67.724 | 67.143 |
    | 037_scissors          |  0.644 | 37.083 | 35.339 | 24.355 |
    | 040_large_marker      | 24.572 | 51.052 | 76.110 | 50.578 |
    | 052_extra_large_clamp | 18.313 | 65.989 | 68.030 | 50.777 |
    | 061_foam_brick        | 25.615 | 25.186 | 39.571 | 30.124 |
    Evaluation complete.
    

    The output will also be logged to results/bop_eval_s1_test_bop_cosypose_s1_test.log.

  • For example, to evaluate the 3D hand pose estimation (HPE) results of Spurr et al. + HRNet32 on s2, you can run:

    python examples/evaluate_hpe.py \
      --name s2_test \
      --res_file results/cvpr2021_results/hpe_spurr_hrnet_s2_test.txt
    You should see the following output:
    Running evaluation
    Results:
    | alignment     |   MPJPE (mm) |    AUC |
    |:--------------|-------------:|-------:|
    | absolute      |      80.6272 | 0.2173 |
    | root-relative |      25.4875 | 0.5299 |
    | procrustes    |       8.2075 | 0.8359 |
    Evaluation complete.
    

    The output will also be logged to results/hpe_eval_s2_test_hpe_spurr_hrnet_s2_test.log.

  • For example, to evaluate the object handover (Grasp) results of CosyPose (for 6D object pose) and Mask R-CNN (Detectron2) (for hand segmentation) on s1, you can run:

    python examples/evaluate_grasp.py \
      --name s1_test \
      --bop_res_file results/cvpr2021_results/bop_cosypose_s1_test.csv \
      --coco_res_file results/cvpr2021_results/coco_maskrcnn_s1_test.json
    You should see the following output:
    Running evaluation
    0001/1440    2980  003_cracker_box        # gt grasps:  73
    0002/1440    3050  003_cracker_box        # gt grasps:  73
    0003/1440    3120  003_cracker_box        # gt grasps:  73
    0004/1440    3190  003_cracker_box        # gt grasps:  73
    0005/1440    3260  003_cracker_box        # gt grasps:  73
    .
    .
    .
    1436/1440  115990  061_foam_brick         # gt grasps:  45
    1437/1440  116064  061_foam_brick         # gt grasps:  45
    1438/1440  116138  061_foam_brick         # gt grasps:  45
    1439/1440  116212  061_foam_brick         # gt grasps:  45
    1440/1440  116286  061_foam_brick         # gt grasps:  45
    Results:
    |   radius (m) |   angle (deg) |   dist th (m) |   coverage |   precision |
    |-------------:|--------------:|--------------:|-----------:|------------:|
    |       0.0500 |            15 |        0.0000 |     0.4110 |      0.2507 |
    |       0.0500 |            15 |        0.0100 |     0.3838 |      0.3556 |
    |       0.0500 |            15 |        0.0200 |     0.3408 |      0.3826 |
    |       0.0500 |            15 |        0.0300 |     0.2906 |      0.4027 |
    |       0.0500 |            15 |        0.0400 |     0.2412 |      0.4149 |
    |       0.0500 |            15 |        0.0500 |     0.1920 |      0.4125 |
    |       0.0500 |            15 |        0.0600 |     0.1476 |      0.3695 |
    |       0.0500 |            15 |        0.0700 |     0.1133 |      0.3117 |
    Evaluation complete.
    

    The output will also be logged to results/grasp_eval_s1_test_bop_cosypose_s1_test_coco_maskrcnn_s1_test.log.
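The three alignment modes reported in the HPE table above (absolute, root-relative, and procrustes) differ only in how the predicted joints are aligned to the ground truth before the mean per-joint position error (MPJPE) is measured. The following is a minimal NumPy sketch of that idea, not the toolkit's own implementation: the function names are hypothetical, the root joint is assumed to be at index 0, and the Procrustes step is assumed to solve for rotation, uniform scale, and translation. The AUC column is not covered here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding joints ([J, 3] arrays)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def root_relative(joints, root=0):
    """Express joints relative to the root joint (index 0 is an assumption)."""
    return joints - joints[root]


def procrustes_align(pred, gt):
    """Align pred to gt with the best rotation, uniform scale, and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the cross-covariance gives the optimal rotation (Kabsch).
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    fix = np.diag([1.0, 1.0, d])
    rot = u @ fix @ vt  # chosen so that p @ rot approximates g
    scale = (s * np.diag(fix)).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def hpe_errors(pred, gt):
    """Return MPJPE under the three alignments, in the units of the inputs."""
    return {
        "absolute": mpjpe(pred, gt),
        "root-relative": mpjpe(root_relative(pred), root_relative(gt)),
        "procrustes": mpjpe(procrustes_align(pred, gt), gt),
    }
```

Because each successive alignment removes more of the global error (first translation, then rotation and scale), the errors typically decrease from absolute to root-relative to procrustes, as they do in the table above.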

Finally, you can reproduce the grasp precision-coverage curves for object handover on s1 with:

python examples/plot_grasp_curve.py

This will save the precision-coverage curves on s1 to results/grasp_precision_coverage_s1_test.pdf.

The precision-coverage curves on setups s0, s2, and s3 can be generated with:

python examples/plot_grasp_curve.py --name s0_test
python examples/plot_grasp_curve.py --name s2_test
python examples/plot_grasp_curve.py --name s3_test
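
The grasp evaluation table shown earlier lists one coverage/precision pair per distance threshold. If you want to inspect those numbers yourself, a pipe-delimited table like that can be read back into Python with a few lines. This is only a sketch: `parse_results_table` is a hypothetical helper, not part of the toolkit, and it assumes the text has the same layout as the printed table. The sample rows are copied from the output above.

```python
SAMPLE = """
|   radius (m) |   angle (deg) |   dist th (m) |   coverage |   precision |
|-------------:|--------------:|--------------:|-----------:|------------:|
|       0.0500 |            15 |        0.0000 |     0.4110 |      0.2507 |
|       0.0500 |            15 |        0.0100 |     0.3838 |      0.3556 |
|       0.0500 |            15 |        0.0200 |     0.3408 |      0.3826 |
"""


def parse_results_table(text):
    """Parse a pipe-delimited table of numbers into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip progress lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        if all(set(c) <= set("-:") for c in cells):
            continue  # skip the alignment row under the header
        rows.append(dict(zip(header, (float(c) for c in cells))))
    return rows


rows = parse_results_table(SAMPLE)
# Pick the row with the highest precision among the parsed rows.
best = max(rows, key=lambda r: r["precision"])
```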

Visualizing Sequences

Besides visualizing the ground truths of one image sample, we also provide tools to visualize the captured hand and object motion of a full sequence. The tools include (1) an interactive 3D viewer and (2) an offline renderer.

Interactive 3D Viewer

  • The example below shows how to run the interactive 3D viewer given a sequence name. By default, it first preloads and preprocesses the data of the entire sequence so that the subsequent rendering runs at a high frame rate. The preprocessing uses the GPU by default. If you do not have a GPU, add the --device cpu flag to run the preprocessing on the CPU.

    # Run on GPU
    python examples/view_sequence.py --name 20200709-subject-01/20200709_141754
    # Run on CPU
    python examples/view_sequence.py --name 20200709-subject-01/20200709_141754 --device cpu

    The 3D viewer provides some basic controls using mouse and keyboard. You can find the control instructions here.

  • Warning: The above command may consume significant CPU memory (e.g., more than 8 GB) due to the preload. You can instead load frames online without preloading by adding the --no-preload flag, at the cost of a lower rendering frame rate:

    python examples/view_sequence.py --name 20200709-subject-01/20200709_141754 --no-preload
  • You can list the names of all the provided sequences (1,000 in total) with:

    for x in $DEX_YCB_DIR/2020*-*/; do for y in ${x}2020*_*/; do echo $(basename $x)/$(basename $y); done; done
    You should see the following output:
    20200709-subject-01/20200709_141754
    20200709-subject-01/20200709_141841
    20200709-subject-01/20200709_141931
    20200709-subject-01/20200709_142022
    20200709-subject-01/20200709_142123
    .
    .
    .
    20201022-subject-10/20201022_114741
    20201022-subject-10/20201022_114802
    20201022-subject-10/20201022_114824
    20201022-subject-10/20201022_114847
    20201022-subject-10/20201022_114909
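
If you prefer to collect the sequence names inside a Python script, the shell loop above can be mirrored with the standard library. This is a sketch, not a toolkit function: `list_sequences` is a hypothetical name, and it assumes the same directory layout that the shell loop relies on (subject folders matching 2020*-* that contain sequence folders matching 2020*_*).

```python
import glob
import os


def list_sequences(dex_ycb_dir):
    """Return sorted 'subject/sequence' names found under dex_ycb_dir."""
    names = []
    for subject_dir in sorted(glob.glob(os.path.join(dex_ycb_dir, "2020*-*"))):
        if not os.path.isdir(subject_dir):
            continue  # ignore files such as downloaded archives
        for seq_dir in sorted(glob.glob(os.path.join(subject_dir, "2020*_*"))):
            if os.path.isdir(seq_dir):
                subject = os.path.basename(subject_dir)
                sequence = os.path.basename(seq_dir)
                names.append(f"{subject}/{sequence}")
    return names


# Example usage (assumes the DEX_YCB_DIR environment variable is set):
#   for name in list_sequences(os.environ["DEX_YCB_DIR"]):
#       print(name)
```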
    

Offline Renderer

  • The example below shows how to run the offline renderer given a sequence name. Similar to the 3D viewer above, you need a --device cpu flag to run the preprocessing on CPU.

    # Run on GPU
    python examples/render_sequence.py --name 20200709-subject-01/20200709_141754
    # Run on CPU
    python examples/render_sequence.py --name 20200709-subject-01/20200709_141754 --device cpu

    This will render the color image, segmentation map, and a visualization of the hand joint position for all the frames in the sequence. The rendered images will be saved to data/render/.



  • Similar to the Grasp evaluation, if you do not have an active display manager, you can run an offscreen renderer on a headless server with EGL:

    PYOPENGL_PLATFORM=egl python examples/render_sequence.py --name 20200709-subject-01/20200709_141754

dex-ycb-toolkit's People

Contributors

neilsong, noirmist, ychao-nvidia


dex-ycb-toolkit's Issues

Question about dataset S2 (#seq)

if self._setup == 's2':

In Tab. 9 (main paper), shouldn't the number of sequences (#seq) be 1,000 for all of train, val, and test, instead of 750/125/125? As I understand it, the number of sequences (#seq) is independent of the number of camera viewpoints (#view). Have I missed something?
Simple equation: #seq (1,000) = #subject_ind (10) * #sequence_ind (100)

hand pose, translation and rotation parameters

Thanks for your great work! I have some problems when I try to use your dataset, and hope to get your help!

  1. I find that in /annotations/DEX_YCB_s0_train_data.json, there is also a mano_param for every image. I wonder whether these are in the camera frame or the world frame. If I want to use the hand parameters in this json file, how can I get the hand translation (1x3) and the object rotation and translation (4x4)?
  2. I tried to visualize the hand pose using the parameters in both the label file and the json file, but I found that neither of them was right and that they differed from each other. I couldn't figure out the reason. In my opinion, even if the object translation parameters I used were wrong, the hand pose should at least be right. So the result really confused me.

Thanks again and looking forward to your reply!

ModuleNotFoundError: No module named 'utils.eval_util' while testing examples/evaluate_hpe.py

I was trying to test HPE (Hand Pose Estimation) as described in the README and got the error below:
python -m examples.evaluate_hpe.py
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "../codes/dex-ycb-toolkit/examples/evaluate_hpe.py", line 10, in <module>
    from dex_ycb_toolkit.hpe_eval import HPEEvaluator
  File "../codes/dex-ycb-toolkit/dex_ycb_toolkit/hpe_eval.py", line 21, in <module>
    from utils.eval_util import EvalUtil
ModuleNotFoundError: No module named 'utils.eval_util'

I also installed the utils module using pip install utils. Could you please help me out here?

mano shape consistency for right and left hand.

Thank you for your wonderful work !
Many other works that use MANO hands have considered the shapedirs problem raised in "Bug on shapedirs of MANO". I want to know whether the MANO annotations of DexYCB take this problem into account, as I found that the right and left hands may have different shapes even though they belong to the same person.

Objects in the DEXYCB dataset

Thanks for your great work!
Would you tell me where I can buy the objects in the DEXYCB dataset?
Thanks for your reply!

subject-specific hand shapes (β)

Hello @ychao-nvidia ,

In the paper it says "We pre-calibrate the hand shape β for each subject and fix it throughout each subject’s sequences."

Can you please shed some light on how this was done exactly?

Thanks.

Camera parameters

Thanks for the really cool work and the dataset.
I am just wondering, does the released dataset contain camera parameters (extrinsic and intrinsic)?

Thank you!

error downloading the repository

git clone --recursive git@github.com:NVlabs/dex-ycb-toolkit.git

Cloning into 'dex-ycb-toolkit'...
Warning: Permanently added the RSA host key for IP address '140.82.121.4' to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Could you please help resolve this?

How to get absolute depth values for each pixel?

Hello,

The provided depth images "aligned_depth_to_color_xxxxxx.png" are of float32 type. Is it possible to obtain absolute depth values in the camera coordinate frame?

The values in the depth image range from 0.0 to 1.0. Are they normalized? How can I unnormalize them to obtain the absolute depth value for each pixel?

Thanks in advance!

Inconsistency in pose_y for per_image and per_sequence

Hi, when I compare the pose_y of 'label*.npz' in folder 840412060917 and 'pose.npz', I find the pose_y of num_obj=0 is different. As you said in #12 (comment), object 6D pose should be the same in two formats.

Is there something that I made a mistake with?

I really appreciate any help you can provide.

How to use this dataset

If I want to use the dataset for hand and object pose estimation, what should I do to get the mask of the object?

3D projection problem

Hi,

Thank you so much for this work! I am currently having issues projecting object 3D points (from the mesh) back to the image plane using both camera intrinsics and extrinsics. Would you mind having a quick look at the following code, please?

    label = np.load(sample['label_file'])
    trans = label['pose_y'][sample['ycb_grasp_ind'],:,:].astype(np.float32)
    trans_hom = np.concatenate((trans,np.array([[0,0,0,1]],dtype=np.float32)),axis=0).transpose()

    cam_extr = np.array(sample['cam_extr']).astype(np.float32).reshape(3,4) #I have added this to original code and double-checked
    cam_extr_hom = np.concatenate((cam_extr,np.array([[0,0,0,1]],dtype=np.float32)),axis=0)

    verts = obj["verts"]
    hom_verts = np.concatenate([verts, np.ones([verts.shape[0], 1])], axis=1)
    verts_w = trans.dot(hom_verts.T).T
    vert_cam = cam_extr.dot(trans_verts.transpose()).transpose()[:, :3]

    #Here I have skipped how I obtained intr for clarity
    cam_intr = np.array([[intr['fx'], 0.0, intr['ppx']],
                                     [0.0, intr['fy'],intr['ppy']],
                                     [0.0, 0.0, 1.0]])
    hom_2d = (np.array(cam_intr).dot(objpoints3d.transpose()).transpose())
    objpoints2d = (hom_2d / hom_2d[:, 2:])[:, :2]
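
For reference, the projection chain in question can be sketched in isolation as a plain pinhole projection. This is only a sketch under stated assumptions: `project_points` is a hypothetical helper, the pose is assumed to be a 3x4 [R; t] matrix that maps model points directly into the frame of the camera that took the image, and the intrinsics are assumed to be the fx, fy, ppx, ppy values used in the snippet above. Whether an additional extrinsic transform is needed depends on the frame in which the pose is expressed, which this sketch does not settle.

```python
import numpy as np


def project_points(verts, pose, fx, fy, ppx, ppy):
    """Project [N, 3] model points to [N, 2] pixels with a pinhole camera.

    pose is a 3x4 [R; t] matrix, assumed to map model coordinates into the
    camera frame of the target image.
    """
    verts = np.asarray(verts, dtype=np.float64)
    pose = np.asarray(pose, dtype=np.float64)
    rot, trans = pose[:, :3], pose[:, 3]
    # Rigid transform into the camera frame: X_cam = R @ X_model + t.
    pts_cam = verts @ rot.T + trans
    # Perspective division followed by the intrinsic scaling and offset.
    u = fx * pts_cam[:, 0] / pts_cam[:, 2] + ppx
    v = fy * pts_cam[:, 1] / pts_cam[:, 2] + ppy
    return np.stack([u, v], axis=1)
```

Keeping each intermediate result in its own clearly named variable makes it easier to check which frame the points are in at every step.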

Why y-axis and z-axis multiply by -1?

Hi thank you for the work, just wondering why in file like examples/visualize_pose.py:
Why do we need to multiply those axes by -1? Is it alright if we don't?

In line 58-59:
pose[1] *= -1
pose[2] *= -1

In line 78-79:
vert[:, 1] *= -1
vert[:, 2] *= -1

World Frame Alignment

Hi I'm using the npz file provided in every sequence folder to do some test.
Is everything in the npz file aligned with world frame? (i.e. if I extract the hand keypoints from mano parameters, is it in world frame?)
If True, then what is the definition of the world frame(i.e. z-up right-handed or z-down right-handed)?
If False, then where can I find the transformation matrix?

Trained weights of Mask R-CNN and SOLOv2

Hello,

Thank you for creating this amazing dataset. In paper, you mentioned you finetuned ResNet50-FPN on DexYCB for Mask R-CNN and SOLOv2. Could you provide your trained weights after finetuning? Thank you!

MANO pose coefficients (theta)

Hello @ychao-nvidia,

From the README:

pose_m: A float32 numpy array of shape [1, 51] holding the pose of the hand. pose_m[:, 0:48] 
stores the MANO pose coefficients in PCA representation, and pose_m[0, 48:51] stores the 
translation. If the image does not have a visible hand or the annotation does not exist, 
pose_m will be all 0.
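
The layout described in this README excerpt can be unpacked as follows. This is a minimal sketch: `split_pose_m` is a hypothetical helper, and it relies only on what the excerpt states (shape [1, 51], pose coefficients in columns 0:48, translation in columns 48:51, all zeros when no annotation exists).

```python
import numpy as np


def split_pose_m(pose_m):
    """Split a [1, 51] pose_m array into (pose_coeffs, translation).

    Returns None when pose_m is all zeros, which the README uses to mark
    a missing hand annotation.
    """
    pose_m = np.asarray(pose_m, dtype=np.float32)
    if pose_m.shape != (1, 51):
        raise ValueError(f"expected shape (1, 51), got {pose_m.shape}")
    if not pose_m.any():
        return None
    pose_coeffs = pose_m[0, 0:48]   # MANO pose coefficients (PCA representation)
    translation = pose_m[0, 48:51]  # hand translation
    return pose_coeffs, translation
```

With a label file loaded through `np.load`, the call would look like `split_pose_m(label['pose_m'])`.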

Where can I find the code for the above pose estimation?

Thanks!

Banana point cloud looks flattened

Hi, thank you for your work. I extracted the 3D poses for the banana, loaded the model into trimesh, converted it into a point cloud, and then visualized it using matplotlib. However, I noticed that the banana point cloud looks "flattened". I tried the same approach with other objects, such as the power drill, master chef can, and boxes, and they all look correct. Is there a problem with the 3D poses for the banana object? I loaded it specifically from /20200709-subject-01/20200709_145401/836212060125/labels_000070.npz
I noticed that /20200709-subject-01/20200709_145401/839512060362/labels_000067.npz has the same flattening problem as well.
However, /20200709-subject-01/20200709_145401/840412060917/labels_000064.npz looks correct (not flattened).

Inconsistency in pose_y shape

Hi, I was trying to retrieve the rotation matrices for the objects in a sequence, and I noticed that the representation is a bit different from what is stated in the README.

In the README it's mentioned:

pose_y: A float32 numpy array of shape [num_obj, 3, 4] holding the 6D pose of each object. Each 6D pose is represented by [R; t], where R is the 3x3 rotation matrix and t is the 3x1 translation.

But when I actually read pose_y, I get a different shape. In particular, I get [num_frames, num_obj, 7].
What is this shape representing? Is this using the quaternion representation?
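
If the seven values do turn out to be a quaternion followed by a translation, converting one entry to the [R; t] form quoted above is straightforward. The sketch below rests on assumptions this thread does not confirm: the ordering is taken to be (qx, qy, qz, qw, tx, ty, tz), i.e. a scalar-last quaternion, and `quat_trans_to_rt` is a hypothetical helper rather than a toolkit function.

```python
import numpy as np


def quat_trans_to_rt(pose7):
    """Convert (qx, qy, qz, qw, tx, ty, tz) to a 3x4 [R; t] matrix.

    The scalar-last quaternion ordering is an assumption.
    """
    pose7 = np.asarray(pose7, dtype=np.float64)
    q, t = pose7[:4], pose7[4:7]
    # Normalize so that small numerical drift does not distort the rotation.
    x, y, z, w = q / np.linalg.norm(q)
    rot = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])
    return np.hstack([rot, t[:, None]])
```

An all-zero entry has no valid rotation (its norm is zero) and should be filtered out before calling this function.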

Many thanks in advance!

Table 2: Cross dataset evaluation

I noticed that DexYCB has the left or the right hand. How did you do the cross dataset evaluation experiment compared to HO3D, which only has the right hand?

There are two ways that I can think of:

  1. Mirror all left hand images to the right and only train a single right hand network.
  2. Train a network that predicts both left and right hand on DexYCB, and compare with the right-handed network in HO3D.

Could you help clarify? Thanks!

Question on teaser

Dear Authors,

How did you create the cool animation in the dataset teaser video? More specifically, how did you create the zoom-in effects on the mosaic of videos?

I would like to create a similar animation.

Could you please provide any hints? I have learned that it's possible with some video editors, but then I would need to buy a license for those tools.

the form of hand pose annotations

Thanks for your great work!

I noticed that in the label files, pose_m[:, 0:48] stores the MANO pose coefficients in PCA representation, and pose_m[0, 48:51] stores the translation. How can I transform them into an axis-angle or quaternion representation? Are the poses in the label files represented in a root-relative system? And is it right that I can transform the pose to the camera system by using the hand translation?

Thanks for your help!

Camera extrinsics, intrinsics and depth

The intrinsics/xxxxxxxxx_640x480.yml file contains separate intrinsics for 'color' and 'depth'. However, to convert the depth map to a point cloud in SequenceLoader(), the 'color' intrinsics are used here.

Shouldn't intr['depth'] be used for converting the depth map into point clouds? What is the purpose of the 'depth' intrinsics?

Also, are the point clouds returned here in the world coordinate system?

Reproducing results of Table 7 in paper (3D hand pose estimation)!

Thanks for your great work. @noirmist @ychao-nvidia
While using your dataset, I could not reproduce the results of your baseline (a supervised version of Spurr et al. [31]) on DexYCB.
I would be thankful for your clarification.

  1. The paper of [31] mentions that the input to the baseline [31] is a 128 × 128 RGB image cropped around the bounding box. Have you done it the same way? (i.e., for Table 7, are cropped images the input to the baseline?)

  2. If the answer to Q1 is "yes", how did you report the "absolute error"? The reported absolute error is meaningful only when the input image is not cropped!

  3. My absolute error is around 100 mm (input images are not cropped around the bounding box). I have no idea how you reached "50 mm".

One more video for class 2.

I think 20201015-subject-09\20201015_143145 should be class 3, but the video and label are class 2. So the 9th person grasps class 2 six times and class 3 four times. Is that right?

Orientation of world coordinate system

Hi

I have a question about the orientation of the world coordinate system. When running the visualizer via the view_sequence.py file, the entire scene seems to be tilted with regards to the xyz world-coordinates displayed (see attached image). Is there any way to retrieve the orientation of the "scene", e.g., the table orientation with regards to world coordinates?

Thanks for your reply!


Image with hand, but no pose_m

Thank you for your work!
I have a question about the dataset.
For some images, the hand is visible but pose_m is all zeros. Did I miss something?

Invalid tar magic when unpacking data of subjects 3-5

I've downloaded the individual tar files from the website. I can't untar the files from subjects 3 to 5 completely, but get the following error:

tar -xzvf 20200820-subject-03.tar.gz
...
20200820-subject-03/20200820_144338/839512060362/aligned_depth_to_color_000022.png
tar: invalid tar magic
tar -xzvf 20200903-subject-04.tar.gz
...
20200903-subject-04/20200903_113012/840412060917/aligned_depth_to_color_000047.png
tar: invalid tar magic
tar -xzvf 20200908-subject-05.tar.gz
...
20200908-subject-05/20200908_151833/839512060362/aligned_depth_to_color_000044.png
tar: invalid tar magic

I re-downloaded them several times, but with the same result. Is anyone else experiencing this issue?

No hand in the image, but the labels show a hand pose

Thank you for your work!
I have a question about the dataset.
When I look at the image (location: data/20200709-subject-01/20200709_141841/840412060917/color_000013.jpg), I cannot see any hand. But when I load the label with label = numpy.load('data/20200709-subject-01/20200709_141841/840412060917/labels_000013.npz') and then print pose_m, I find that the hand label is not all 0. Did I miss something, or is there another explanation for that?

Depth ambiguity for RGB only methods

I'm looking at the 3 error metrics for the BOP challenge, and it appears that 2 of them rely on absolute position to produce valid scores. I see that there are RGB only methods for obj pose evaluation in the paper. How is the depth ambiguity associated with RGB only methods handled with respect to the evaluations? Thanks!

How did you collect data?

Hello,

Thank you for making such a great dataset! I wonder how you collected the data and produced the BOP dataset. Do you have any scripts for recording from the RealSense D435, calibrating camera extrinsics, or marking ground truth on the dataset? Thank you!

pyglet 2.0.9 requires Python 3.8 or newer.

Hi, thank you for the work.
I am facing a problem while running python examples/visualize_pose.py. I built my virtual environment with Python 3.7, as the code was tested with Python 3.7, and everything I cloned from GitHub is installed in this same virtual environment. However, I am getting the error "pyglet 2.0.9 requires 3.8 or newer.". What should I do?

I can't align vertices with other views by using the extrinsic parameters.

Hello, I have been using your dataset. Thank you for your great work.

I'm having a hard time aligning vertices with other views.

I rotated and translated the hand vertices (from the MANO output, in camera coordinates) using the inverse of the extrinsic matrix.
Then, I multiplied the vertices (now in world coordinates) by the other view's extrinsic and intrinsic parameters.
But the result is not aligned with the image.
I think the extrinsic parameters are wrong, because when I align the vertices on the image using only the intrinsics, they are aligned well.

Could you give me any advice on this problem?
Thank you.
