
Comments (6)

zhangchbin commented on May 23, 2024

Hi, @Kurumi233
Our paper is under review, so the code will be made public later. By the way, we do not use synchronized batch normalization.
Our code borrows heavily from CutMix-PyTorch, so you can refer to that repo. A few points are worth knowing. First, we accumulate only the correctly predicted samples, and we use only the soft labels accumulated during the previous epoch as the ground-truth soft labels. All our experiments run on 4 RTX 2080Ti GPUs with a batch size of 64x4, weight decay of 1e-4, and SGD with momentum. The initial learning rate is 0.1, and it decays at epochs 75, 150, and 225. There may be some differences between your re-implementation and ours. Please feel free to contact me by e-mail: zhangchbin Dot gmail.com (we can also chat on WeChat).
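The accumulation scheme described here (add only the softmax outputs of correctly predicted samples, then use the per-class averages from the previous epoch as soft targets) could be sketched roughly as below. This is a minimal NumPy illustration of the idea only, not the authors' code; the class and method names are made up for this example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class OnlineLabelSmoother:
    """Accumulates softmax outputs of correctly predicted samples during
    one epoch; the per-class averages become the soft targets for the
    next epoch. (Illustrative sketch, not the authors' implementation.)"""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        # Targets for the first epoch start as hard one-hot labels.
        self.targets = np.eye(num_classes)
        # Running sums and counts for the epoch in progress.
        self._sums = np.zeros((num_classes, num_classes))
        self._counts = np.zeros(num_classes)

    def accumulate(self, logits, labels):
        """Add softmax outputs of correctly predicted samples only."""
        probs = softmax(logits)
        correct = probs.argmax(axis=-1) == labels
        for p, y in zip(probs[correct], labels[correct]):
            self._sums[y] += p
            self._counts[y] += 1

    def end_epoch(self):
        """Average the accumulated distributions per class; a class with
        no correct predictions keeps its previous soft label."""
        for c in range(self.num_classes):
            if self._counts[c] > 0:
                self.targets[c] = self._sums[c] / self._counts[c]
        self._sums[:] = 0.0
        self._counts[:] = 0.0
```

In training, `accumulate` would be called on each batch and `end_epoch` once per epoch, with `targets[label]` used as the soft target for the cross-entropy in the following epoch.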

from onlinelabelsmoothing.

Kurumi233 commented on May 23, 2024


OK, thanks. I will check it.


zhangchbin commented on May 23, 2024


I am glad to help you re-implement it.


Kurumi233 commented on May 23, 2024

Hello, I now get 77.67% top-1 accuracy on ImageNet with ResNet-50. I changed the weight decay from 5e-4 to 1e-4 and the batch size to 256. I had assumed the batch size did not need to be multiplied by the number of GPUs when training without sync-BN. The result seems to be within the margin of error. The code has been released in my repository.

But I still have one concern. When I first said "synchronization", I did not mean sync-BN; I meant that the soft labels also need to be synchronized when training on multiple GPUs. I use a single 32GB GPU, which avoids this problem, so I hope you will handle it in your code. Thank you.
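For illustration, the synchronization being requested could amount to summing each process's per-class probability sums and correct-prediction counts (e.g. via an all-reduce in a DDP setup) before normalizing. The sketch below simulates that merge on a single process with NumPy; the function name and array shapes are assumptions for this example, not the repository's API.

```python
import numpy as np

def merge_soft_label_stats(per_rank_sums, per_rank_counts):
    """Simulate an all-reduce across ranks: sum each rank's per-class
    probability sums (shape (R, C, C)) and correct-prediction counts
    (shape (R, C)), then normalize once globally. Classes with no
    correct predictions anywhere fall back to a one-hot label."""
    total_sums = np.sum(per_rank_sums, axis=0)      # (C, C)
    total_counts = np.sum(per_rank_counts, axis=0)  # (C,)
    targets = np.eye(total_sums.shape[0])           # fallback: one-hot
    seen = total_counts > 0
    targets[seen] = total_sums[seen] / total_counts[seen][:, None]
    return targets
```

In an actual multi-GPU run, the two `np.sum` calls would be replaced by a collective operation such as `torch.distributed.all_reduce` on the sum and count tensors, so every rank normalizes identical statistics.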


zhangchbin commented on May 23, 2024

Hi, @Kurumi233
Thanks for your effort and for the reminder. We first built the code with DataParallel in PyTorch and obtained the initial ResNet-50 performance reported in our paper; we then rebuilt it with DDP in NVIDIA Apex. Thanks again for the kind reminder: we do not synchronize the soft labels in our DDP version, and we will fix this bug. Note that we have not tried other schedules, such as 100 epochs with decays at 30, 60, and 90, because we followed CutMix. By the way, our method also improves performance when combined with data augmentation methods such as Cutout, Mixup, and CutMix; in particular, "Cutout + OLS" reaches the same accuracy as "CutMix" alone.
And we will link your repo in our README.


Kurumi233 commented on May 23, 2024

@zhangchbin
I am glad you accepted my suggestion.
I believe simple is best, so I find your work very interesting and am happy to help.

