

nitba commented on August 15, 2024

Thanks for your comments, @ychao-nvidia.
The paper does not mention that, for each test sample, you assume the ground-truth (GT) bounding box coordinates and camera intrinsics are available when reporting the absolute errors.
Regarding the 100 mm absolute error: I ran my experiments assuming that no bounding boxes are available for the test samples.

from dex-ycb-toolkit.

nitba commented on August 15, 2024

Hi @zc-alexfan

Do you have any idea about my question?

#3 (comment)


zc-alexfan commented on August 15, 2024

Sorry, I did not use it.


nitba commented on August 15, 2024

Hi @umariqb,

I would appreciate your comments on my issue.


ychao-nvidia commented on August 15, 2024

Answers to your questions:

  1. Yes, we followed [31] for the baselines reported in Tab. 7. Therefore, yes, the input to the network is a 128 x 128 RGB image cropped around the bounding box.

  2. We report absolute error by computing the 3D distance between the ground-truth and predicted joint positions.
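To make the difference between absolute and root-relative error concrete, here is a minimal NumPy sketch of both metrics. It assumes the absolute error is the per-joint Euclidean distance in camera coordinates, averaged over joints and samples, and that joint 0 is the root; the thread only says "3D distance", so the averaging and the root index are assumptions, and the function names are hypothetical.

```python
import numpy as np

def absolute_error(pred_xyz, gt_xyz):
    """Mean per-joint 3D distance between predicted and GT joints.

    pred_xyz, gt_xyz: (..., 21, 3) absolute joint positions in camera
    coordinates (e.g. mm). No alignment is applied, so a global offset
    in the prediction counts fully toward the error.
    """
    return float(np.linalg.norm(pred_xyz - gt_xyz, axis=-1).mean())

def root_relative_error(pred_xyz, gt_xyz, root=0):
    """Same distance after subtracting each pose's root joint (index assumed)."""
    pred_rel = pred_xyz - pred_xyz[..., root:root + 1, :]
    gt_rel = gt_xyz - gt_xyz[..., root:root + 1, :]
    return absolute_error(pred_rel, gt_rel)
```

For example, shifting every joint of a prediction by (3, 4, 0) mm gives an absolute error of about 5 mm but a root-relative error of about 0, which is why the absolute metric depends on how well the global hand position is recovered.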

Two additional comments regarding [31] and absolute error:

  • [31] predicts a special "2.5D representation" (see [18]) for the hand pose, and then uses the bounding box coordinates and the camera intrinsics to convert this "2.5D representation" into the 3D pose. That is, the network only has to predict (1) the 2D locations of the keypoints within the input image and (2) the root-relative depths of the keypoints (see [18]), which is reasonable for a cropped image input. Given this "2.5D representation", converting to the 3D pose is well-posed (see [18]).

  • With that said, the input for this benchmark (RGB-only) should really be (1) the full RGB image, (2) the bounding box coordinates, and (3) the camera intrinsics, not just the cropped image itself. [31] can be viewed as using (1) and (2) to produce its network input (i.e., the cropped image).
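The 2.5D-to-3D conversion described in the first bullet can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of [31] or [18]: the network output is assumed to be 2D keypoints in crop pixels plus root-relative depths, the crop is assumed to be a square box (x0, y0, side) resized to 128 x 128, and the unknown root depth is resolved by assuming the length of one reference bone (joints n and m) is known, e.g. from a scale normalization in the spirit of [18]. Taking the larger root of the resulting quadratic is also an assumption. All function names are hypothetical.

```python
import numpy as np

def crop_to_image(uv_crop, box, crop_size=128):
    """Map keypoints from crop pixels to full-image pixels.

    box = (x0, y0, side): top-left corner and side length (full-image pixels)
    of the square crop that was resized to crop_size x crop_size.
    """
    x0, y0, side = box
    return uv_crop * (side / crop_size) + np.array([x0, y0])

def back_project_rays(uv, K):
    """Return K^-1 [u, v, 1]^T for each keypoint, i.e. rays with z == 1."""
    uv1 = np.hstack([uv, np.ones((len(uv), 1))])
    return np.linalg.solve(K, uv1.T).T

def recover_root_depth(rays, z_rel, bone_len, n=0, m=1):
    """Solve ||P_n - P_m|| == bone_len for the root depth Z_root.

    With P_k = rays[k] * (Z_root + z_rel[k]), the constraint is the quadratic
    a*Z^2 + b*Z + c = 0; the larger root is returned (assumption).
    """
    a_vec = rays[n] - rays[m]
    b_vec = rays[n] * z_rel[n] - rays[m] * z_rel[m]
    a = a_vec @ a_vec
    b = 2.0 * (a_vec @ b_vec)
    c = b_vec @ b_vec - bone_len ** 2
    disc = np.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return (-b + disc) / (2.0 * a)

def lift_to_3d(uv_crop, z_rel, box, K, bone_len, crop_size=128):
    """2.5D (crop-space 2D + root-relative depth) -> absolute 3D joints."""
    uv = crop_to_image(uv_crop, box, crop_size)   # uses the bounding box
    rays = back_project_rays(uv, K)               # uses the intrinsics
    z_root = recover_root_depth(rays, z_rel, bone_len)
    return rays * (z_root + z_rel)[:, None]
```

The network never outputs absolute depth here; the bounding box and the intrinsics K carry the information needed to place the hand in camera space, which is why they belong to the benchmark input alongside the image.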


namepllet commented on August 15, 2024

Hi @ychao-nvidia, I'd like to compare my results on the DexYCB dataset with yours (Table 7, 3D hand pose estimation).

It seems that a ground-truth hand bounding box is not available at test time, so which bounding box did you use to crop the hand image at test time (e.g., one computed from the 2D joint coordinates, or a detected bounding box)?


ychao-nvidia commented on August 15, 2024

As mentioned above, for hand pose estimation we assume the bounding box is given at test time.

We calculated a tight bounding box as [min(X), min(Y), max(X), max(Y)], where X (Y) denotes the 2D x (y) coordinates of the 21 hand joints provided in the ground truth.

We then cropped a square image region (1) sharing its center with this tight bounding box and (2) with side length l, where 0.7*l = max(w, h), i.e., l = max(w, h)/0.7, and w and h are the width and height of the tight bounding box. We used this cropped image region as the input to our network.
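The cropping rule above can be written as a short NumPy sketch. It assumes joints_2d is a (21, 2) array of ground-truth pixel coordinates; the thread does not say how the square region is resampled to the network input size or how regions extending past the image border are handled, so the nearest-neighbour sampling and zero padding below are assumptions, as are the function names.

```python
import numpy as np

def square_crop_box(joints_2d, ratio=0.7):
    """Square crop (x0, y0, side) around 2D hand joints.

    joints_2d: (21, 2) array of (x, y) pixel coordinates.
    The square shares its center with the tight box
    [min(X), min(Y), max(X), max(Y)] and satisfies ratio * side == max(w, h).
    """
    x_min, y_min = joints_2d.min(axis=0)
    x_max, y_max = joints_2d.max(axis=0)
    w, h = x_max - x_min, y_max - y_min
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    side = max(w, h) / ratio
    return cx - side / 2.0, cy - side / 2.0, side

def crop_square(image, box, out_size=128):
    """Sample the square box into an out_size x out_size image.

    Assumptions: nearest-neighbour sampling; pixels outside the image are zero.
    """
    x0, y0, side = box
    offsets = (np.arange(out_size) + 0.5) * side / out_size  # sample-cell centers
    xs = np.floor(x0 + offsets).astype(int)
    ys = np.floor(y0 + offsets).astype(int)
    h, w = image.shape[:2]
    valid_x = (xs >= 0) & (xs < w)
    valid_y = (ys >= 0) & (ys < h)
    out = np.zeros((out_size, out_size) + image.shape[2:], dtype=image.dtype)
    out[np.ix_(valid_y, valid_x)] = image[np.ix_(ys[valid_y], xs[valid_x])]
    return out
```

With ratio 0.7, the longer side of the tight box covers 70% of the crop, leaving 15% of the crop width as margin on each side of it; near the image border the square can therefore extend outside the image, which the zero padding above absorbs.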


namepllet commented on August 15, 2024

Thanks for the clear comments!
