
Comments (9)

YimianDai avatar YimianDai commented on August 24, 2024

Hi, I have uploaded the training logs from scratch with the prefix "From_Scratch". You can find them in the corresponding folders now.

The ResNet result is almost the same, but the ResNeXt result is around 0.6%~0.8% lower than training from pretrained models. I suspect this is because ResNeXt is harder to train than ResNet, so a pretrained model helps it reach a higher accuracy.

By the way, the accuracy in the paper is the accuracy from training from scratch. I updated the accuracy on the paperswithcode website with the highest accuracy.

from open-aff.

moabarar avatar moabarar commented on August 24, 2024

Thanks a lot for the quick reply!
Again - well done!


sbl1996 avatar sbl1996 commented on August 24, 2024

Can you provide the training script used for training from scratch? The currently provided from-scratch training logs do not seem to have been produced by 'train_cifar.py' in this repo.


YimianDai avatar YimianDai commented on August 24, 2024

Sorry, can you explain a bit more? I do not understand what "the current provided training log from scratch does not use 'train_cifar.py' in this repo" means.

As far as I remember, I just renamed train_cifar_mixup.py to train_cifar.py when I created this repo; everything else is the same. My private repo is very redundant and contains a lot of code unrelated to this paper, so I created this new public repo when I released the code.


sbl1996 avatar sbl1996 commented on August 24, 2024

As shown in open-aff/params/cifar100/AFF-ResNet-32/From_Scratch_Log_train_cifar100_cifar100-ASKCFuse-resnet-32-c-4-s-1.log, the log lines look like "INFO:root:[Epoch 0] train=0.079624 val=0.107400 loss=3.099419 time: 20.976282". However, in train_cifar.py the log format is "[Epoch %d] train=%f val=%f loss=%f lr: %f time: %f". They are different.
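For anyone comparing the two log styles, the difference boils down to the optional "lr:" field. A small hypothetical parser (not part of the repo) can read both formats and report which one a line uses, assuming the exact strings quoted above:

```python
import re

# Matches both log formats discussed above: the old from-scratch logs
# have no "lr:" field, while the current train_cifar.py prints one.
PATTERN = re.compile(
    r"\[Epoch (?P<epoch>\d+)\] train=(?P<train>[\d.]+) "
    r"val=(?P<val>[\d.]+) loss=(?P<loss>[\d.]+)"
    r"(?: lr: (?P<lr>[\d.]+))? time: (?P<time>[\d.]+)"
)

def parse_log_line(line):
    """Return a dict of metrics; 'lr' is None for old-format lines."""
    m = PATTERN.search(line)
    if m is None:
        return None
    out = {k: (float(v) if v is not None else None)
           for k, v in m.groupdict().items()}
    out["epoch"] = int(m.group("epoch"))
    return out
```

A line with `out["lr"] is None` then comes from the older from-scratch script, while a line with a numeric `lr` comes from the current train_cifar.py.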


YimianDai avatar YimianDai commented on August 24, 2024

In train_cifar.py, you can choose either the cosine or the step schedule, not cosine only. It depends on whether you pass --cosine in the training script.

I started with step; that is the version used for the from-scratch training logs. When I later tried pretrained models to get a higher accuracy, I added the cosine code, because people say a cosine lr schedule can improve performance, so I gave it a try. I guess that is why the log formats look mismatched: I changed the format when I added the cosine code, and the current format is the post-cosine version.

Unfortunately, cosine did not perform better than step in my CIFAR experiments. However, as long as the network itself is unchanged, it can load the stored params produced with either step or cosine.
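The two schedules being compared can be sketched in a few lines. This is a minimal illustration, not the repo's code; the base lr of 0.1, the decay milestones at epochs 100 and 150, and the 200-epoch budget are assumed values that may differ from what train_cifar.py actually uses:

```python
import math

def step_lr(epoch, base_lr=0.1, milestones=(100, 150), factor=0.1):
    """Step schedule: multiply the lr by `factor` at each milestone epoch."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** passed

def cosine_lr(epoch, base_lr=0.1, total_epochs=200):
    """Cosine schedule: anneal the lr from base_lr to 0 along a half cosine."""
    return base_lr * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
```

Both schedules only change how the optimizer's lr evolves during training; the saved network parameters have the same format either way, which is why a checkpoint trained with one schedule loads fine regardless.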


sbl1996 avatar sbl1996 commented on August 24, 2024

Thank you, I have noticed the --cosine option. Apart from printing the lr, are there any other differences between the training script you used for training from scratch and train_cifar.py?


YimianDai avatar YimianDai commented on August 24, 2024

I think I have made only two changes to train_cifar.py since I ran the very first experiment: the first was adding the label smoothing option, and the second was adding the cosine option.

As far as I remember, AFF-ResNeXt-38-32x4d was trained without label smoothing, because it was trained before I added that option. All the other models were trained with label smoothing.
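For readers unfamiliar with label smoothing, one common formulation replaces the one-hot target with a softened distribution. This is a generic sketch, not the repo's implementation; the smoothing strength eps=0.1 and the 100-class (CIFAR-100) setup are assumptions:

```python
import numpy as np

def smooth_labels(labels, num_classes=100, eps=0.1):
    """Convert integer class labels to smoothed one-hot targets:
    the true class gets 1 - eps, the rest share eps uniformly."""
    labels = np.asarray(labels)
    targets = np.full((labels.size, num_classes), eps / (num_classes - 1))
    targets[np.arange(labels.size), labels] = 1.0 - eps
    return targets
```

The cross-entropy loss is then computed against these soft targets instead of hard one-hot vectors, which tends to regularize the model's confidence.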

Sorry for the inconvenience caused by the updates to my code. Ideally I would have trained all these models after the code was finalized, but I have very limited access to GPUs, so I basically posted the training logs and params on the fly.


sbl1996 avatar sbl1996 commented on August 24, 2024

Thank you very much for your detailed response.
