
Comments (9)

YimianDai avatar YimianDai commented on August 24, 2024

Hi, I have uploaded the training logs from scratch with the prefix "From_Scratch". You can find them in the corresponding folders now.

The ResNet result is almost the same, but the ResNeXt result is around 0.6%~0.8% lower than training from pretrained models. I suspect this is because ResNeXt is harder to train than ResNet, so a pretrained model helps it reach a higher accuracy.

By the way, the accuracy in the paper is the accuracy from training from scratch. I updated the accuracy on the paperswithcode website with the highest accuracy.

from open-aff.

moabarar avatar moabarar commented on August 24, 2024

Thanks a lot for the quick reply!
Again - well done!


sbl1996 avatar sbl1996 commented on August 24, 2024

Can you provide the training script used for training from scratch? The currently provided from-scratch training logs do not seem to have been produced by 'train_cifar.py' in this repo.


YimianDai avatar YimianDai commented on August 24, 2024

Sorry, can you explain a bit more? I do not understand what "the current provided training log from scratch does not use 'train_cifar.py' in this repo" means.

As far as I remember, I just renamed train_cifar_mixup.py to train_cifar.py when I created this repo; everything else is the same. My private repo is very redundant and contains a lot of code unrelated to this paper, so I created this new public repo when I released the code.


sbl1996 avatar sbl1996 commented on August 24, 2024

As shown in open-aff/params/cifar100/AFF-ResNet-32/From_Scratch_Log_train_cifar100_cifar100-ASKCFuse-resnet-32-c-4-s-1.log, the log lines look like "INFO:root:[Epoch 0] train=0.079624 val=0.107400 loss=3.099419 time: 20.976282". However, in train_cifar.py the log format is "[Epoch %d] train=%f val=%f loss=%f lr: %f time: %f". They are different.
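For anyone comparing the two log styles, the difference boils down to the optional "lr:" field. A small hypothetical parser (not part of the repo) can read both formats and report which one a line uses, assuming the exact strings quoted above:

```python
import re

# Matches both log formats discussed above: the old from-scratch logs
# have no "lr:" field, while the current train_cifar.py prints one.
PATTERN = re.compile(
    r"\[Epoch (?P<epoch>\d+)\] train=(?P<train>[\d.]+) "
    r"val=(?P<val>[\d.]+) loss=(?P<loss>[\d.]+)"
    r"(?: lr: (?P<lr>[\d.]+))? time: (?P<time>[\d.]+)"
)

def parse_log_line(line):
    """Return a dict of metrics; 'lr' is None for old-format lines."""
    m = PATTERN.search(line)
    if m is None:
        return None
    out = {k: (float(v) if v is not None else None)
           for k, v in m.groupdict().items()}
    out["epoch"] = int(m.group("epoch"))
    return out
```

A line with `out["lr"] is None` then comes from the older from-scratch script, while a line with a numeric `lr` comes from the current train_cifar.py.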


YimianDai avatar YimianDai commented on August 24, 2024

In train_cifar.py, you can choose either the cosine or the step schedule, not cosine only. It depends on whether you pass --cosine in the training script.

I started with step; that is the version used for the from-scratch training logs. When I later tried pretrained models to get a higher accuracy, I added the cosine code, because people say a cosine lr schedule can improve performance, so I gave it a try. I guess that is why the log formats look mismatched: I changed the format when I added the cosine code, and the current format is the post-cosine version.

Unfortunately, cosine did not perform better than step in my CIFAR experiments. However, as long as the network itself is unchanged, it can load the stored params produced with either step or cosine.
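The two schedules being compared can be sketched in a few lines. This is a minimal illustration, not the repo's code; the base lr of 0.1, the decay milestones at epochs 100 and 150, and the 200-epoch budget are assumed values that may differ from what train_cifar.py actually uses:

```python
import math

def step_lr(epoch, base_lr=0.1, milestones=(100, 150), factor=0.1):
    """Step schedule: multiply the lr by `factor` at each milestone epoch."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** passed

def cosine_lr(epoch, base_lr=0.1, total_epochs=200):
    """Cosine schedule: anneal the lr from base_lr to 0 along a half cosine."""
    return base_lr * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
```

Both schedules only change how the optimizer's lr evolves during training; the saved network parameters have the same format either way, which is why a checkpoint trained with one schedule loads fine regardless.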


sbl1996 avatar sbl1996 commented on August 24, 2024

Thank you, I have noticed the --cosine option. Apart from printing the lr, are there any other differences between the training script you used for training from scratch and train_cifar.py?


YimianDai avatar YimianDai commented on August 24, 2024

I think I have made only two changes to train_cifar.py since I ran the very first experiment: the first was adding the label smoothing option, and the second was adding the cosine option.

As far as I remember, AFF-ResNeXt-38-32x4d was trained without label smoothing, because it was trained before I added that option. All the other models were trained with label smoothing.
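For readers unfamiliar with label smoothing, one common formulation replaces the one-hot target with a softened distribution. This is a generic sketch, not the repo's implementation; the smoothing strength eps=0.1 and the 100-class (CIFAR-100) setup are assumptions:

```python
import numpy as np

def smooth_labels(labels, num_classes=100, eps=0.1):
    """Convert integer class labels to smoothed one-hot targets:
    the true class gets 1 - eps, the rest share eps uniformly."""
    labels = np.asarray(labels)
    targets = np.full((labels.size, num_classes), eps / (num_classes - 1))
    targets[np.arange(labels.size), labels] = 1.0 - eps
    return targets
```

The cross-entropy loss is then computed against these soft targets instead of hard one-hot vectors, which tends to regularize the model's confidence.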

Sorry for the inconvenience caused by the updates to my code. Ideally I would have trained all these models after the code was finalized, but I have very limited access to GPUs, so I basically posted the training logs and params on the fly.


sbl1996 avatar sbl1996 commented on August 24, 2024

Thank you very much for your detailed response.
