
Comments (5)

penincillin commented on September 27, 2024

Hi LvZic,

For learning the camera parameters, I have two suggestions.

  1. Do not add any direct supervision on the camera parameters (scale, tx, and ty). They should only be supervised via the 2D keypoint loss.
  2. To mitigate the focal length / scale issue you mentioned, I suggest cropping the input images so that the hand size is roughly the same.
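The cropping suggestion can be sketched roughly as below. This is a hypothetical helper, not frankmocap's actual preprocessing; it assumes 2D hand keypoints in pixel coordinates and crops a square box around them so the hand occupies a similar fraction of every training image:

```python
import numpy as np

def crop_around_hand(image, keypoints_2d, margin=0.3):
    """Crop a square region around the hand keypoints so the hand
    size is roughly the same across training images (sketch)."""
    x_min, y_min = keypoints_2d.min(axis=0)
    x_max, y_max = keypoints_2d.max(axis=0)
    size = max(x_max - x_min, y_max - y_min) * (1.0 + margin)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half = size / 2.0
    x0, y0 = int(round(cx - half)), int(round(cy - half))
    x1, y1 = int(round(cx + half)), int(round(cy + half))
    # Zero-pad if the square box leaves the image bounds.
    h, w = image.shape[:2]
    pad = max(0, -x0, -y0, x1 - w, y1 - h)
    if pad > 0:
        image = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
        x0, y0, x1, y1 = x0 + pad, y0 + pad, x1 + pad, y1 + pad
    # Resize the crop to the network input size with your image library.
    return image[y0:y1, x0:x1]
```

After this, a fixed network input resolution sees hands at roughly constant scale, which reduces the scale ambiguity the comment above describes.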

from frankmocap.

lvZic commented on September 27, 2024

Thanks for your suggestions.

Now the crop strategy has been added to the data augmentation, the camera-parameter loss weight is set to 0, and scale_gt now ranges over [0.7, 2.0]. Without direct supervision on the camera parameters, training converges better.

  1. However, all parameters, including the camera parameters, seem to converge slowly after 30 epochs. Can the camera loss be added back in at this point? I also wonder why direct supervision on the camera parameters hurts convergence, since a direct camera-parameter loss does appear in some published work.
  2. The 3D-to-2D projection is 2d = s(3d + t_xy) in your project, while in the HMR paper it is s * 3d + t_xy. The following discussion covers this: akanazawa/hmr#60. In my tests, your projection formula converges better than the original HMR formulation. Can you explain this in more detail? Thanks.
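For what it's worth, the two projection forms can be checked numerically: s(3d + t_xy) equals s * 3d + t'_xy with t'_xy = s * t_xy, so they differ only in how the translation is parameterized. A minimal sketch (not frankmocap's actual code):

```python
import numpy as np

# Weak-perspective projection written two ways.
def project_a(xyz, s, t_xy):
    # frankmocap-style: 2d = s * (3d + t_xy)
    return s * (xyz[:, :2] + t_xy)

def project_b(xyz, s, t_xy):
    # HMR-style: 2d = s * 3d + t_xy
    return s * xyz[:, :2] + t_xy

xyz = np.random.randn(21, 3)            # e.g. 21 hand joints
s, t_xy = 2.0, np.array([0.1, -0.2])

# Identical outputs once the translation is rescaled: t' = s * t_xy.
assert np.allclose(project_a(xyz, s, t_xy), project_b(xyz, s, s * t_xy))
```

One plausible reason form (a) trains better: the network predicts the translation in model space, so its magnitude is decoupled from the predicted scale, which may condition the optimization better. That is a conjecture, not something the source confirms.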

penincillin commented on September 27, 2024

@lvZic

  1. How do you get scale_gt? To my knowledge, none of the papers in this area apply such a loss.
  2. I suggest not adding a direct loss on any of the camera parameters. The model uses a weak-perspective camera model, which only considers scale and translation, while the ground-truth camera model is perspective. The two camera models have different definitions and are not interchangeable, so a direct loss on the camera parameters makes no sense.
  3. I believe s(3d + t_xy) and s * 3d + t_xy are conceptually the same. Feel free to use whichever form works better on your side.

lvZic commented on September 27, 2024

@penincillin

  1. Regarding camera losses in the literature: End-to-end Hand Mesh Recovery from a Monocular RGB Image (https://arxiv.org/abs/1902.09305) does use a camera loss.

  2. In my test, scale_gt is obtained as focal_length / global_trans[2], where global_trans is the last 3 MANO parameters. I checked uv = scale_gt * (XY + t_xy), and the visualized 2D result looks normal with little misalignment.
  3. However, in my latest test without the camera loss, all the regressed parameters converge better except the camera parameters:
    total loss: 453.9190 | 2d loss: 11.2612 | 3d loss: 0.037922 | mask loss: 0.2397 | reg loss: 0.0018 | scale loss: 0.1078 | trans loss: 4372.8198 | rvec loss: 0.0184 | pose loss: 0.0418 | shape loss: 0.2389
    The scale loss and trans loss are too large, although scale_gt is not the true value of a weak-perspective camera model.
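The scale_gt definition above can be sanity-checked numerically. Assuming the usual perspective model x = f * X / Z, weak perspective with s = f / Z is only a first-order approximation that ignores per-joint depth variation, which is one reason a "ground-truth" weak-perspective scale is inherently approximate. All numbers below are made up for illustration:

```python
import numpy as np

focal_length = 1000.0      # assumed focal length in pixels
Z = 0.6                    # assumed hand depth in meters (global_trans[2])
s = focal_length / Z       # the scale_gt construction described above

# Two joints at slightly different depths around Z.
joints = np.array([[ 0.02, 0.01, 0.58],
                   [-0.03, 0.04, 0.63]])

persp = focal_length * joints[:, :2] / joints[:, 2:3]  # x = f * X / Z_joint
weak = s * joints[:, :2]                               # x ~= (f / Z) * X

# The two projections agree only up to the depth spread across joints.
err = np.abs(persp - weak).max()
print(err)
```

The residual `err` grows with the depth spread of the joints relative to Z, which matches the earlier point that the perspective and weak-perspective models are not the same.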

penincillin commented on September 27, 2024

@lvZic
I suggest not paying too much attention to the "ground-truth" scale/translation, as they don't really exist. Just learn scale/translation through the 2D keypoint loss.
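Supervising the camera only through reprojection looks roughly like this. A minimal sketch with hypothetical names, not frankmocap's implementation; in practice the loss would be differentiated by an autodiff framework so gradients reach s and t_xy only through the projection:

```python
import numpy as np

def keypoint_loss(pred_xyz, s, t_xy, gt_2d):
    """2D reprojection loss: the only supervision the weak-perspective
    camera (s, t_xy) receives. There are no direct scale/translation
    targets; the camera is correct whenever the reprojection matches."""
    proj_2d = s * (pred_xyz[:, :2] + t_xy)   # 2d = s * (3d + t_xy)
    return np.mean((proj_2d - gt_2d) ** 2)
```

Any (s, t_xy) that reprojects the predicted joints onto the annotated 2D keypoints drives this loss to zero, which is exactly why separate "ground-truth" camera values are unnecessary.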
