dvlab-research / mislas
Improving Calibration for Long-Tailed Recognition (CVPR2021)
License: MIT License
Hi @zs-zhong , I was wondering whether you used the class-balanced sampler in Stage-2?
Hi @zs-zhong ,
Have you tried 90 epochs training with mixup on ImageNet or iNaturalist ?
I have made some improvements based on your work, but due to limited computing resources, training a model for 180/200 epochs is too time-consuming for me, especially on iNaturalist.
In my reproduction, training Stage-1 for 90 epochs with mixup (alpha 0.2) on ImageNet-LT and Stage-2 for 10 epochs, the accuracies with ResNet-50 are as follows:
| Method | Stage-1 | mixup | Stage-2 | cRT | LWS |
|---|---|---|---|---|---|
| Reported in Decouple | 90 epochs | | 10 epochs | 47.3 | 47.7 |
| My reproduction | 90 epochs | | 10 epochs | 48.7 | 49.3 |
| My reproduction | 90 epochs | ✅ | 10 epochs | 47.6 | 47.4 |
| My reproduction | 180 epochs | | 10 epochs | 51.0 | 51.8 |
| Reported in MiSLAS | 180 epochs | | 10 epochs | 50.3 | 51.2 |
| Reported in MiSLAS | 180 epochs | ✅ | 10 epochs | 51.7 | 52.0 |
The 90-epoch mixup results look much worse than the model trained for 180 epochs with mixup, and do not even improve over training without mixup.
I guess this is because mixup can be regarded as a regularization method that requires longer training; 90 epochs are not enough for the network to converge.
However, I cannot obtain the result of training 90 epochs with mixup on iNaturalist, because the dataset is too large to fit in memory, so training ResNet-50 once takes me about a week.
If possible, could you please provide the pre-trained ResNet-50 model for training 90 epochs with mixup on iNaturalist? I believe this will also be beneficial for fair comparison of future work.
Thank you again for your contribution and look forward to your reply.
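For context, the mixup augmentation discussed in this thread (with alpha = 0.2, sampling λ ~ Beta(α, α)) can be sketched in a few lines of numpy. This is a generic illustration of the technique, not the repository's implementation:

```python
import numpy as np

def mixup(x, y_onehot, alpha=0.2, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch."""
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    idx = rng.permutation(len(x))     # random partner for each sample
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[idx]
    return x_mix, y_mix, lam

# Toy batch: 4 samples, 3 features, 2 classes
x = np.arange(12, dtype=float).reshape(4, 3)
y = np.eye(2)[[0, 1, 0, 1]]
x_mix, y_mix, lam = mixup(x, y, alpha=0.2)
```

With a small alpha such as 0.2, the Beta distribution concentrates λ near 0 or 1, so most mixed samples stay close to one of the two originals; this mild regularization is consistent with the observation above that it needs longer schedules to pay off.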
Hi Zhisheng,
Thanks for your great work! Figure 1 in your paper is impressive, can you please provide the code for drawing this figure?
Hi! Thanks for the great work. In issue #2, you mentioned that LWS fixes the affine part (α and β in the paper, as far as I understand) and updates the running means and variances in Stage-2. By that reading, LWS already uses shifted BN; however, in Figure 4 there are differences in ACC and ECE between mixup+LWS and mixup+LWS+shifted BN.
What causes the improvement in that experiment? Is there anything wrong with my understanding?
Hi @zs-zhong ,
Thanks for your great work! Figure 2 in your paper is also impressive, could you please provide the code for drawing this figure?
Hello Zhisheng,
The access to the models is restricted by Google Drive (picture below, in French, translated below the picture). Could you make the models accessible to everyone?
PS: I may have sent you access requests, sorry about that.
Robin
Authorization is required
You need to request owner access or sign in with an account that has the necessary permissions. Find out more
Hello:
In the paper, I think you mean that nll_loss is only for the ground-truth label and smooth_loss is for the remaining K-1 labels.
But in the code
https://github.com/Jia-Research-Lab/MiSLAS/blob/e8f91e59a910c5543ea1bcabb955ba368c606a00/methods.py#L62
I think you still include the gt label in the smooth_loss.
I am confused about this.
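If it helps, here is a minimal numpy sketch of this loss formulation (an illustration, not the repository's code). Including the ground-truth label in smooth_loss is in fact the standard label-smoothing formulation: the gt class ends up with total weight (1 − ε) + ε/K and every other class with ε/K, which is exactly cross-entropy against targets smoothed toward the uniform distribution.

```python
import numpy as np

def label_smoothing_loss(logits, target, eps=0.1):
    # Numerically stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    logprob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    n = len(target)
    nll_loss = -logprob[np.arange(n), target]   # ground-truth term only
    smooth_loss = -logprob.mean(axis=1)         # mean over ALL K classes, gt included
    return ((1 - eps) * nll_loss + eps * smooth_loss).mean()
```

Expanding the smooth term: (1 − ε)·nll + ε·(1/K)·Σ_k(−log p_k) gives the gt class coefficient (1 − ε) + ε/K, matching the usual smoothed-target definition.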
Hi, thanks for your work. However, the paper does not describe the implementation of the shifted learning in detail.
I guess that the BN statistics are re-estimated in Stage-2, since the means and variances differ between the two stages. Is that right?
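A minimal numpy sketch of what that guess would look like (my reading of "update the running mean/variance, fix α and β", not the authors' actual code): run BN in training mode so the running statistics get re-estimated, while treating the affine parameters α (scale) and β (shift) as frozen constants.

```python
import numpy as np

def bn_stage2_step(x, running_mean, running_var, alpha, beta,
                   momentum=0.1, eps=1e-5):
    """One training-mode BN step where only the running statistics move."""
    mu, var = x.mean(axis=0), x.var(axis=0)            # batch statistics
    # Re-estimate running statistics (the Stage-2 "shift")
    running_mean = (1 - momentum) * running_mean + momentum * mu
    running_var = (1 - momentum) * running_var + momentum * var
    # alpha and beta are frozen: used but never updated
    y = alpha * (x - mu) / np.sqrt(var + eps) + beta
    return y, running_mean, running_var

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=(64, 8))      # features shifted away from 0
rm, rv = np.zeros(8), np.ones(8)            # stale Stage-1 running stats
alpha, beta = np.full(8, 2.0), np.full(8, 0.5)
y, rm2, rv2 = bn_stage2_step(x, rm, rv, alpha, beta)
```

At inference you would then normalize with the updated `rm2`/`rv2` instead of the Stage-1 statistics; the affine parameters are unchanged throughout.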
Hi! Thank you for such an inspiring work! Do you have any plan of releasing your code? I'm looking forward to that.
Plus, I have a small question regarding the method. In the paper you mentioned that when applying mixup in stage 2 yields no obvious improvement, but I cannot find a description of your overall method and I'd like to know in your final framework whether you use mixup in stage 1 only or in both stage 1&2. Thanks again!
I hope I am not wrong, but in the code I see that you compute the test accuracy every few training iterations and report the maximum over them.
My question was
Hello Dr. Zhong, thank you for your excellent work. I'm very interested in what you mentioned in Section 3.3:
> we update the running mean μ and variance σ and yet fix the learnable linear transformation parameters α and β for better normalization in Stage-2.
But, I cannot find the implementation in your code. If you are available, can you tell me the exact location?
Wish you good health and success in your studies!
Hi, thanks for your great work. I am wondering about the BN part: it seems that methods like "cRT" and "DRW" do update the running means and variances, right? I cannot find the code segment that freezes this part.