
Comments (8)

TDeVries commented on July 20, 2024

Hi, I went through my code and logs for the STL-10 experiments and found two things:

  1. In the paper I stated that the cutout patch size used for STL-10 was 24 for the no-data-augmentation case and 32 with data augmentation. Looking at my logs, it appears that the values actually used were 48 and 60, respectively (see the patch-mask sketch just after this list).

  2. It appears that I accidentally used the CIFAR-10 normalization parameters for STL-10 instead of calculating a new mean and std. While the CIFAR-10 values are fairly close to what you used, the difference could still have caused a non-negligible change in model performance, especially given STL-10's small training set. That being said, these test results should not be compared to other STL-10 results that normalize the dataset properly. You could try substituting the CIFAR-10 normalization values into your pipeline to see whether it increases the score at all, since that may be what is causing the difference.
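For reference, here is roughly how that patch length enters a cutout-style transform. This is a generic sketch written for illustration (the length default below is the no-augmentation value; use 60 for the augmented case), not the exact code from the repo.

import numpy as np
import torch

class Cutout(object):
    # Zero out one square patch of side `length` at a random location in the image.
    def __init__(self, n_holes=1, length=48):
        self.n_holes = n_holes
        self.length = length

    def __call__(self, img):  # img: (C, H, W) tensor, i.e. after ToTensor()
        h, w = img.size(1), img.size(2)
        mask = np.ones((h, w), np.float32)
        for _ in range(self.n_holes):
            y, x = np.random.randint(h), np.random.randint(w)
            y1, y2 = np.clip(y - self.length // 2, 0, h), np.clip(y + self.length // 2, 0, h)
            x1, x2 = np.clip(x - self.length // 2, 0, w), np.clip(x + self.length // 2, 0, w)
            mask[y1:y2, x1:x2] = 0.0
        return img * torch.from_numpy(mask).expand_as(img)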

Let me know if those changes allow you to reproduce the results, otherwise we can look into it further.
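If it helps, computing proper STL-10 statistics is quick with torchvision. The sketch below is one way to do it (the data root and batch size are arbitrary); to test the comparison above, you can instead pass whatever CIFAR-10 constants your pipeline already uses to transforms.Normalize.

import torch
from torchvision import datasets, transforms

# Per-channel mean/std of the STL-10 train split.
train_set = datasets.STL10('data', split='train', download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=500, num_workers=2)
count, mean, sq_mean = 0, torch.zeros(3), torch.zeros(3)
for images, _ in loader:
    b = images.size(0)
    flat = images.view(b, 3, -1)
    mean += flat.mean(dim=2).sum(dim=0)
    sq_mean += flat.pow(2).mean(dim=2).sum(dim=0)
    count += b
mean /= count
std = (sq_mean / count - mean.pow(2)).sqrt()
normalize = transforms.Normalize(mean.tolist(), std.tolist())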


xyzacademic commented on July 20, 2024

I found the reason: I was using FP16 (even though batch norm is in float32) rather than FP32. When I use float32, the final error rate is about 12%+. But I have no idea why the gap is so big, since I also used FP16 for the CIFAR-10 experiment and the result matched what you posted.
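For reference, the pattern I mean is casting the model to FP16 while keeping the batch-norm layers in float32. A generic helper along these lines does it (a sketch, not necessarily my exact code; the half/float mix relies on the cuDNN batch-norm path, so it is a GPU-only recipe):

import torch

def half_with_fp32_batchnorm(model):
    # Cast parameters and buffers to FP16, then restore batch-norm modules to FP32
    # so their running statistics and affine parameters stay in full precision.
    model.half()
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            module.float()
    return model

# Inputs then need to be cast to match, e.g. images = images.half()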


TDeVries commented on July 20, 2024

Okay, cool.

I've tried messing around with FP16 in PyTorch before, but it seems very finicky when used with batch norm. Strange that it works for CIFAR-10 but not STL-10.


commented on July 20, 2024

Hi @TDeVries and @xyzacademic, I'm trying to reproduce the STL-10 result with no data augmentation and no cutout. I adopted the settings described in the paper, with the changes mentioned above. However, I cannot reach the 23.48% ± 0.68% error presented in the paper; instead I get errors of about 30%. The test error gets stuck around 30% after epoch 400, fluctuating within 1%, while the training error stays below 0.1% and the cross-entropy below 0.01. Could you help me reproduce this, or possibly upload your code? More specifically, I set the parameters as follows.

image size = 48, normalization mean = [0.44671097, 0.4398105, 0.4066468], std = [0.2603405, 0.25657743, 0.27126738], wide resnet depth = 16, widen factor = 8, dropRate = 0.3, initial learning rate = 0.1, momentum = 0.9, weight_decay = 5e-4, number of epochs = 1000, data type = FP32, learning rate scheduler = MultiStepLR(cnn_optimizer, milestones=[300, 400, 600, 800], gamma=0.2)
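In code, that setup corresponds to roughly the following sketch (the model line is a placeholder for the WRN-16-8; whether Nesterov momentum is used is not specified above):

import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # placeholder; substitute the WideResNet-16-8 here
cnn_optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(cnn_optimizer, milestones=[300, 400, 600, 800], gamma=0.2)

for epoch in range(1000):
    # ... one FP32 training pass over the STL-10 train split ...
    scheduler.step()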


TDeVries commented on July 20, 2024

Could it be image size? You said you are using 48x48 resolution. For the results in the paper I used the original image size of 96x96.


commented on July 20, 2024

@TDeVries Thanks! I'll give it a try. Just to confirm: with the image size changed (from 32 to 96 compared to the published code), do you keep nChannels and the avg_pool kernel size unchanged, but increase the input dimension of the fully connected layer by a factor of 3*3 = 9? Or do you increase the avg_pool kernel size from 8 to 24?


TDeVries commented on July 20, 2024

nChannels is unchanged. I think the main differences are that I changed the stride in block1 from 1 to 2, and the avg_pool kernel size from 8 to 12.

self.block1 = NetworkBlock(n, nChannels[0], nChannels[1], block, 2, dropRate)  # stride changed from 1 to 2
...
out = F.avg_pool2d(out, 12)  # pooling kernel changed from 8 to 12
Hopefully that should give the correct output size.
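A quick stand-alone shape check (assuming the WRN-16-8 channel widths):

import torch
import torch.nn.functional as F

# With stride 2 in block1, a 96x96 input shrinks 96 -> 48 -> 24 -> 12 across the three
# blocks, so block3 emits 12x12 maps with nChannels[3] = 512 channels for widen factor 8.
out = torch.randn(2, 512, 12, 12)     # stand-in for the block3 output
out = F.avg_pool2d(out, 12)           # -> (2, 512, 1, 1)
out = out.view(out.size(0), -1)       # -> (2, 512): the fc input size is unchanged
print(out.shape)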

Another thing you could try to improve results is to increase the dropout probability from 0.3 to 0.5. I'm not sure how much of an effect it has though.


commented on July 20, 2024

Thanks for your advice. FYI: I am actually trying to test my hyperparameter optimization algorithm on this problem :)

