
Comments (5)

penincillin commented on September 27, 2024

Hi LvZic,

For learning the camera parameters, I have two suggestions.

  1. Do not add any direct supervision on the camera parameters (scale, tx, and ty). They should only be supervised via the 2D keypoint loss.
  2. To mitigate the focal length / scale issue you mentioned, I suggest cropping the input images so that the hand size is roughly the same.
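The cropping suggestion can be sketched roughly as below. This is a hypothetical helper, not frankmocap's actual preprocessing; it assumes 2D hand keypoints in pixel coordinates and crops a square box around them so the hand occupies a similar fraction of every training image:

```python
import numpy as np

def crop_around_hand(image, keypoints_2d, margin=0.3):
    """Crop a square region around the hand keypoints so the hand
    size is roughly the same across training images (sketch)."""
    x_min, y_min = keypoints_2d.min(axis=0)
    x_max, y_max = keypoints_2d.max(axis=0)
    size = max(x_max - x_min, y_max - y_min) * (1.0 + margin)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half = size / 2.0
    x0, y0 = int(round(cx - half)), int(round(cy - half))
    x1, y1 = int(round(cx + half)), int(round(cy + half))
    # Zero-pad if the square box leaves the image bounds.
    h, w = image.shape[:2]
    pad = max(0, -x0, -y0, x1 - w, y1 - h)
    if pad > 0:
        image = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
        x0, y0, x1, y1 = x0 + pad, y0 + pad, x1 + pad, y1 + pad
    # Resize the crop to the network input size with your image library.
    return image[y0:y1, x0:x1]
```

After this, a fixed network input resolution sees hands at roughly constant scale, which reduces the scale ambiguity the comment above describes.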

from frankmocap.

lvZic commented on September 27, 2024

Thanks for your suggestions.

Now the crop strategy has been added to the data augmentation, the camera-parameter loss weight is set to 0, and scale_gt now ranges over [0.7, 2.0]. Without direct supervision on the camera parameters, training converges better.

  1. However, all parameters, including the camera parameters, seem to converge slowly after 30 epochs. Can the camera loss be added back in at this point? I also wonder why direct supervision on the camera parameters hurts convergence, since a direct camera-parameter loss does appear in some published work.
  2. The 3D-to-2D projection is 2d = s(3d + t_xy) in your project, while in the HMR paper it is s * 3d + t_xy. The following discussion covers this: akanazawa/hmr#60. In my tests, your projection formula converges better than the original HMR formulation. Can you explain this in more detail? Thanks.
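For what it's worth, the two projection forms can be checked numerically: s(3d + t_xy) equals s * 3d + t'_xy with t'_xy = s * t_xy, so they differ only in how the translation is parameterized. A minimal sketch (not frankmocap's actual code):

```python
import numpy as np

# Weak-perspective projection written two ways.
def project_a(xyz, s, t_xy):
    # frankmocap-style: 2d = s * (3d + t_xy)
    return s * (xyz[:, :2] + t_xy)

def project_b(xyz, s, t_xy):
    # HMR-style: 2d = s * 3d + t_xy
    return s * xyz[:, :2] + t_xy

xyz = np.random.randn(21, 3)            # e.g. 21 hand joints
s, t_xy = 2.0, np.array([0.1, -0.2])

# Identical outputs once the translation is rescaled: t' = s * t_xy.
assert np.allclose(project_a(xyz, s, t_xy), project_b(xyz, s, s * t_xy))
```

One plausible reason form (a) trains better: the network predicts the translation in model space, so its magnitude is decoupled from the predicted scale, which may condition the optimization better. That is a conjecture, not something the source confirms.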

penincillin commented on September 27, 2024

@lvZic

  1. How do you get scale_gt? To my knowledge, none of the papers in this area apply such a loss.
  2. I suggest not adding a direct loss on any of the camera parameters. The model uses a weak-perspective camera model, which only considers scale and translation, while the ground-truth camera model is perspective. The two camera models have different definitions and are not interchangeable, so a direct loss on the camera parameters makes no sense.
  3. I believe s(3d + t_xy) and s * 3d + t_xy are conceptually the same. Feel free to use whichever form works better on your side.

lvZic commented on September 27, 2024

@penincillin

  1. Regarding camera losses in the literature: End-to-end Hand Mesh Recovery from a Monocular RGB Image (https://arxiv.org/abs/1902.09305) does use a camera loss.

  2. In my test, scale_gt is obtained as focal_length / global_trans[2], where global_trans is the last 3 MANO parameters. I checked uv = scale_gt * (XY + t_xy), and the visualized 2D result looks normal with little misalignment.
  3. However, in my latest test without the camera loss, all the regressed parameters converge better except the camera parameters:
    total loss: 453.9190 | 2d loss: 11.2612 | 3d loss: 0.037922 | mask loss: 0.2397 | reg loss: 0.0018 | scale loss: 0.1078 | trans loss: 4372.8198 | rvec loss: 0.0184 | pose loss: 0.0418 | shape loss: 0.2389
    The scale loss and trans loss are too large, although scale_gt is not the true value of a weak-perspective camera model.
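The scale_gt definition above can be sanity-checked numerically. Assuming the usual perspective model x = f * X / Z, weak perspective with s = f / Z is only a first-order approximation that ignores per-joint depth variation, which is one reason a "ground-truth" weak-perspective scale is inherently approximate. All numbers below are made up for illustration:

```python
import numpy as np

focal_length = 1000.0      # assumed focal length in pixels
Z = 0.6                    # assumed hand depth in meters (global_trans[2])
s = focal_length / Z       # the scale_gt construction described above

# Two joints at slightly different depths around Z.
joints = np.array([[ 0.02, 0.01, 0.58],
                   [-0.03, 0.04, 0.63]])

persp = focal_length * joints[:, :2] / joints[:, 2:3]  # x = f * X / Z_joint
weak = s * joints[:, :2]                               # x ~= (f / Z) * X

# The two projections agree only up to the depth spread across joints.
err = np.abs(persp - weak).max()
print(err)
```

The residual `err` grows with the depth spread of the joints relative to Z, which matches the earlier point that the perspective and weak-perspective models are not the same.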

penincillin commented on September 27, 2024

@lvZic
I suggest not paying too much attention to the "ground-truth" scale/translation, as they don't really exist. Just learn scale/translation through the 2D keypoint loss.
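Supervising the camera only through reprojection looks roughly like this. A minimal sketch with hypothetical names, not frankmocap's implementation; in practice the loss would be differentiated by an autodiff framework so gradients reach s and t_xy only through the projection:

```python
import numpy as np

def keypoint_loss(pred_xyz, s, t_xy, gt_2d):
    """2D reprojection loss: the only supervision the weak-perspective
    camera (s, t_xy) receives. There are no direct scale/translation
    targets; the camera is correct whenever the reprojection matches."""
    proj_2d = s * (pred_xyz[:, :2] + t_xy)   # 2d = s * (3d + t_xy)
    return np.mean((proj_2d - gt_2d) ** 2)
```

Any (s, t_xy) that reprojects the predicted joints onto the annotated 2D keypoints drives this loss to zero, which is exactly why separate "ground-truth" camera values are unnecessary.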
