Thank you very much for making your training code public. I used your default conf

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-ho

Training results are very poor. about dir HOT 7 OPEN

pengfeiren96 commented on June 11, 2024 1

Training results are very poor.

from dir.

Comments (7)

walsvid commented on June 11, 2024

Hi @xungeer29. Could you reproduce the metrics mentioned in the readme using the released checkpoint? My suggestion is to first ensure that the results of the official model inference can be reproduced before making any modifications. Additionally, according to the linear scaling rule, when the batch size changes, the learning rate also needs to be adjusted accordingly to achieve similar performance.

from dir.

xungeer29 commented on June 11, 2024

Hi @xungeer29. Could you reproduce the metrics mentioned in the readme using the released checkpoint? My suggestion is to first ensure that the results of the official model inference can be reproduced before making any modifications. Additionally, according to the linear scaling rule, when the batch size changes, the learning rate also needs to be adjusted accordingly to achieve similar performance.

I can reproduce the metrics mentioned in the readme using the released checkpoint, as shown below

joint mean error:
    left: 10.745436884462833 mm, right: 9.604906663298607 mm
    all: 10.17517177388072 mm
vert mean error:
    left: 10.490822605788708 mm, right: 9.404349140822887 mm
    all: 9.947585873305798 mm
pixel joint mean error:
    left: 6.331959247589111 mm, right: 5.808093070983887 mm
    all: 6.070026397705078 mm
pixel vert mean error:
    left: 6.235781669616699 mm, right: 5.725203037261963 mm
    all: 5.98049259185791 mm
root error: 28.982944786548615 mm

LR only have a small impact on the results and cannot make the network completely non convergent.
And I tried to linearly increase lr based on the batch size, but it also had no effect.

from dir.

luckyday2022 commented on June 11, 2024

非常感谢您公开您的训练代码。我使用您的默认配置文件来训练模型，只需将batch_size修改为 256。MPJPE的最终结果为81+，MPVPE的最终结果为88+。我不知道为什么结果这么糟糕。

[11/04 02:22:49] Training INFO: [Epoch 49/50][Batch 300/357][lr 0.000000][loss_seg: 0.0541][loss_dense: 0.0002][loss_lovasz: 0.0125][loss_joint_left_uv_0: 0.0055][loss_joint_right_uv_0: 0.0054][loss_mesh_left_uv_0: 0.0071][loss_mesh_right_uv_0: 0.0075][loss_joint_left_xyz_0: 0.0066][loss_joint_right_xyz_0: 0.0064][loss_mesh_left_xyz_0: 0.0080][loss_mesh_right_xyz_0: 0.0082][loss_edge_left_0: 0.0135][loss_edge_right_0: 0.0138][loss_normal_left_0: 0.0294][loss_normal_right_0: 0.0300][loss_offset_0: 0.0033][loss_joint_left_uv_1: 0.0052][loss_joint_right_uv_1: 0.0052][loss_mesh_left_uv_1: 0.0069][loss_mesh_right_uv_1: 0.0068][loss_joint_left_xyz_1: 0.0065][loss_joint_right_xyz_1: 0.0063][loss_mesh_left_xyz_1: 0.0079][loss_mesh_right_xyz_1: 0.0078][loss_edge_left_1: 0.0135][loss_edge_right_1: 0.0137][loss_normal_left_1: 0.0292][loss_normal_right_1: 0.0291][loss_offset_1: 0.0031][loss_joint_left_uv_2: 0.0051][loss_joint_right_uv_2: 0.0050][loss_mesh_left_uv_2: 0.0068][loss_mesh_right_uv_2: 0.0067][loss_joint_left_xyz_2: 0.0065][loss_joint_right_xyz_2: 0.0063][loss_mesh_left_xyz_2: 0.0079][loss_mesh_right_xyz_2: 0.0078][loss_edge_left_2: 0.0135][loss_edge_right_2: 0.0136][loss_normal_left_2: 0.0291][loss_normal_right_2: 0.0291][loss_offset_2: 0.0031]
[11/04 02:24:36] Training INFO: Save checkpoint to ./checkpoints/DIR/checkpoint/latest.pth
[11/04 02:30:25] Training INFO: MPJPE_0: left 80.77075399604499 mm, right 81.86647319326214 mm, AVG 81.31861359465356 mm
[11/04 02:30:25] Training INFO: MPVPE_0: left 87.17533539907605 mm, right 89.05994439241933 mm, AVG 88.11763989574769 mm
[11/04 02:30:25] Training INFO: MPJPE_1: left 80.8181789283659 mm, right 82.71075034258412 mm, AVG 81.76446463547501 mm
[11/04 02:30:25] Training INFO: MPVPE_1: left 87.29970373359382 mm, right 89.37457580776776 mm, AVG 88.33713977068078 mm
[11/04 02:30:25] Training INFO: MPJPE_2: left 81.22921638629016 mm, right 83.13988862084408 mm, AVG 82.18455250356712 mm
[11/04 02:30:25] Training INFO: MPVPE_2: left 87.71276979469785 mm, right 89.84211043399922 mm, AVG 88.77744011434854 mm

Have you solved this problem?

from dir.

xungeer29 commented on June 11, 2024

[11/04 02:22:49] Training INFO: [Epoch 49/50][Batch 300/357][lr 0.000000][loss_seg: 0.0541][loss_dense: 0.0002][loss_lovasz: 0.0125][loss_joint_left_uv_0: 0.0055][loss_joint_right_uv_0: 0.0054][loss_mesh_left_uv_0: 0.0071][loss_mesh_right_uv_0: 0.0075][loss_joint_left_xyz_0: 0.0066][loss_joint_right_xyz_0: 0.0064][loss_mesh_left_xyz_0: 0.0080][loss_mesh_right_xyz_0: 0.0082][loss_edge_left_0: 0.0135][loss_edge_right_0: 0.0138][loss_normal_left_0: 0.0294][loss_normal_right_0: 0.0300][loss_offset_0: 0.0033][loss_joint_left_uv_1: 0.0052][loss_joint_right_uv_1: 0.0052][loss_mesh_left_uv_1: 0.0069][loss_mesh_right_uv_1: 0.0068][loss_joint_left_xyz_1: 0.0065][loss_joint_right_xyz_1: 0.0063][loss_mesh_left_xyz_1: 0.0079][loss_mesh_right_xyz_1: 0.0078][loss_edge_left_1: 0.0135][loss_edge_right_1: 0.0137][loss_normal_left_1: 0.0292][loss_normal_right_1: 0.0291][loss_offset_1: 0.0031][loss_joint_left_uv_2: 0.0051][loss_joint_right_uv_2: 0.0050][loss_mesh_left_uv_2: 0.0068][loss_mesh_right_uv_2: 0.0067][loss_joint_left_xyz_2: 0.0065][loss_joint_right_xyz_2: 0.0063][loss_mesh_left_xyz_2: 0.0079][loss_mesh_right_xyz_2: 0.0078][loss_edge_left_2: 0.0135][loss_edge_right_2: 0.0136][loss_normal_left_2: 0.0291][loss_normal_right_2: 0.0291][loss_offset_2: 0.0031]
[11/04 02:24:36] Training INFO: Save checkpoint to ./checkpoints/DIR/checkpoint/latest.pth
[11/04 02:30:25] Training INFO: MPJPE_0: left 80.77075399604499 mm, right 81.86647319326214 mm, AVG 81.31861359465356 mm
[11/04 02:30:25] Training INFO: MPVPE_0: left 87.17533539907605 mm, right 89.05994439241933 mm, AVG 88.11763989574769 mm
[11/04 02:30:25] Training INFO: MPJPE_1: left 80.8181789283659 mm, right 82.71075034258412 mm, AVG 81.76446463547501 mm
[11/04 02:30:25] Training INFO: MPVPE_1: left 87.29970373359382 mm, right 89.37457580776776 mm, AVG 88.33713977068078 mm
[11/04 02:30:25] Training INFO: MPJPE_2: left 81.22921638629016 mm, right 83.13988862084408 mm, AVG 82.18455250356712 mm
[11/04 02:30:25] Training INFO: MPVPE_2: left 87.71276979469785 mm, right 89.84211043399922 mm, AVG 88.77744011434854 mm

Have you solved this problem?

No.

from dir.

luckyday2022 commented on June 11, 2024

[11/04 02:22:49] Training INFO: [Epoch 49/50][Batch 300/357][lr 0.000000][loss_seg: 0.0541][loss_dense: 0.0002][loss_lovasz: 0.0125][loss_joint_left_uv_0: 0.0055][loss_joint_right_uv_0: 0.0054][loss_mesh_left_uv_0: 0.0071][loss_mesh_right_uv_0: 0.0075][loss_joint_left_xyz_0: 0.0066][loss_joint_right_xyz_0: 0.0064][loss_mesh_left_xyz_0: 0.0080][loss_mesh_right_xyz_0: 0.0082][loss_edge_left_0: 0.0135][loss_edge_right_0: 0.0138][loss_normal_left_0: 0.0294][loss_normal_right_0: 0.0300][loss_offset_0: 0.0033][loss_joint_left_uv_1: 0.0052][loss_joint_right_uv_1: 0.0052][loss_mesh_left_uv_1: 0.0069][loss_mesh_right_uv_1: 0.0068][loss_joint_left_xyz_1: 0.0065][loss_joint_right_xyz_1: 0.0063][loss_mesh_left_xyz_1: 0.0079][loss_mesh_right_xyz_1: 0.0078][loss_edge_left_1: 0.0135][loss_edge_right_1: 0.0137][loss_normal_left_1: 0.0292][loss_normal_right_1: 0.0291][loss_offset_1: 0.0031][loss_joint_left_uv_2: 0.0051][loss_joint_right_uv_2: 0.0050][loss_mesh_left_uv_2: 0.0068][loss_mesh_right_uv_2: 0.0067][loss_joint_left_xyz_2: 0.0065][loss_joint_right_xyz_2: 0.0063][loss_mesh_left_xyz_2: 0.0079][loss_mesh_right_xyz_2: 0.0078][loss_edge_left_2: 0.0135][loss_edge_right_2: 0.0136][loss_normal_left_2: 0.0291][loss_normal_right_2: 0.0291][loss_offset_2: 0.0031]
[11/04 02:24:36] Training INFO: Save checkpoint to ./checkpoints/DIR/checkpoint/latest.pth
[11/04 02:30:25] Training INFO: MPJPE_0: left 80.77075399604499 mm, right 81.86647319326214 mm, AVG 81.31861359465356 mm
[11/04 02:30:25] Training INFO: MPVPE_0: left 87.17533539907605 mm, right 89.05994439241933 mm, AVG 88.11763989574769 mm
[11/04 02:30:25] Training INFO: MPJPE_1: left 80.8181789283659 mm, right 82.71075034258412 mm, AVG 81.76446463547501 mm
[11/04 02:30:25] Training INFO: MPVPE_1: left 87.29970373359382 mm, right 89.37457580776776 mm, AVG 88.33713977068078 mm
[11/04 02:30:25] Training INFO: MPJPE_2: left 81.22921638629016 mm, right 83.13988862084408 mm, AVG 82.18455250356712 mm
[11/04 02:30:25] Training INFO: MPVPE_2: left 87.71276979469785 mm, right 89.84211043399922 mm, AVG 88.77744011434854 mm

Have you solved this problem?

No.

Is there something wrong with the test code?

from dir.

xungeer29 commented on June 11, 2024

[11/04 02:22:49] Training INFO: [Epoch 49/50][Batch 300/357][lr 0.000000][loss_seg: 0.0541][loss_dense: 0.0002][loss_lovasz: 0.0125][loss_joint_left_uv_0: 0.0055][loss_joint_right_uv_0: 0.0054][loss_mesh_left_uv_0: 0.0071][loss_mesh_right_uv_0: 0.0075][loss_joint_left_xyz_0: 0.0066][loss_joint_right_xyz_0: 0.0064][loss_mesh_left_xyz_0: 0.0080][loss_mesh_right_xyz_0: 0.0082][loss_edge_left_0: 0.0135][loss_edge_right_0: 0.0138][loss_normal_left_0: 0.0294][loss_normal_right_0: 0.0300][loss_offset_0: 0.0033][loss_joint_left_uv_1: 0.0052][loss_joint_right_uv_1: 0.0052][loss_mesh_left_uv_1: 0.0069][loss_mesh_right_uv_1: 0.0068][loss_joint_left_xyz_1: 0.0065][loss_joint_right_xyz_1: 0.0063][loss_mesh_left_xyz_1: 0.0079][loss_mesh_right_xyz_1: 0.0078][loss_edge_left_1: 0.0135][loss_edge_right_1: 0.0137][loss_normal_left_1: 0.0292][loss_normal_right_1: 0.0291][loss_offset_1: 0.0031][loss_joint_left_uv_2: 0.0051][loss_joint_right_uv_2: 0.0050][loss_mesh_left_uv_2: 0.0068][loss_mesh_right_uv_2: 0.0067][loss_joint_left_xyz_2: 0.0065][loss_joint_right_xyz_2: 0.0063][loss_mesh_left_xyz_2: 0.0079][loss_mesh_right_xyz_2: 0.0078][loss_edge_left_2: 0.0135][loss_edge_right_2: 0.0136][loss_normal_left_2: 0.0291][loss_normal_right_2: 0.0291][loss_offset_2: 0.0031]
[11/04 02:24:36] Training INFO: Save checkpoint to ./checkpoints/DIR/checkpoint/latest.pth
[11/04 02:30:25] Training INFO: MPJPE_0: left 80.77075399604499 mm, right 81.86647319326214 mm, AVG 81.31861359465356 mm
[11/04 02:30:25] Training INFO: MPVPE_0: left 87.17533539907605 mm, right 89.05994439241933 mm, AVG 88.11763989574769 mm
[11/04 02:30:25] Training INFO: MPJPE_1: left 80.8181789283659 mm, right 82.71075034258412 mm, AVG 81.76446463547501 mm
[11/04 02:30:25] Training INFO: MPVPE_1: left 87.29970373359382 mm, right 89.37457580776776 mm, AVG 88.33713977068078 mm
[11/04 02:30:25] Training INFO: MPJPE_2: left 81.22921638629016 mm, right 83.13988862084408 mm, AVG 82.18455250356712 mm
[11/04 02:30:25] Training INFO: MPVPE_2: left 87.71276979469785 mm, right 89.84211043399922 mm, AVG 88.77744011434854 mm

Have you solved this problem?

No.

Is there something wrong with the test code?

The testing results using released checkpoints are correct. Have you also encountered the same problem?

from dir.

luckyday2022 commented on June 11, 2024

只需将batch_size修改为 256。MPJPE的最终结果为81+，MPVPE的最终结果为88+。
我不知道为什么结果这么糟糕。

Have you trained the original model? What's the result?

from dir.

Training results are very poor. about dir HOT 7 OPEN

Comments (7)

Related Issues (8)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent