
asappresearch / revisit-bert-finetuning

184 stars, 14 forks, 367 KB

Code release for our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987).

License: Other

Python 93.43% Shell 6.57%

revisit-bert-finetuning's People

Contributors

fwu-asapp, tianyi-asapp


revisit-bert-finetuning's Issues

BiasCorrection in AdamW

Great paper, thanks.

Quick question: am I correct to assume that bias correction has since been addressed in the transformers library? Their AdamW now takes a correct_bias argument, which defaults to True, so bias correction is applied.
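
A minimal sketch of what I mean (using the transformers AdamW; the placeholder parameter tensor is just for illustration):

    import torch
    from transformers import AdamW

    # correct_bias defaults to True, unlike the original BERTAdam,
    # which omitted the Adam bias-correction terms entirely.
    params = [torch.nn.Parameter(torch.zeros(2))]
    optimizer = AdamW(params, lr=2e-5, correct_bias=True)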

Bug with exclude_last_group

The condition that triggers exclude_last_group is never True, because the loop index "i" from enumerate is always strictly less than len(self.param_groups). See #6.
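
A standalone repro of the off-by-one (a sketch, not the repo's exact code):

    param_groups = [{"params": []}, {"params": []}]
    for i, group in enumerate(param_groups):
        if i == len(param_groups):  # never True: enumerate yields 0 .. len - 1
            print("exclude_last_group branch reached")  # unreachable
    # A working check for the last group would be:
    #     if i == len(param_groups) - 1: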

Pre-trained Weight Decay

Hi, thanks for the great paper and implementation. I have a question regarding pre-trained weight decay.

Assume I don't want to use layer-wise learning rate decay (args.layerwise_learning_rate_decay == 1.0). In that case, get_optimizer_grouped_parameters returns two parameter groups, decay and no-decay, so PriorWD records the initial values of all parameters:

    # In PriorWD: snapshot every parameter at construction time so that
    # weight decay can pull updates back toward the pre-trained values.
    self.prior_params = {}
    for i, group in enumerate(self.param_groups):
        for p in group["params"]:
            self.prior_params[id(p)] = p.detach().clone()

In this case, every parameter is decayed toward its initial value during updates.
However, I suppose we don't want the classification layer to decay toward its initial values, since the classification head is randomly initialized rather than pre-trained; it seems we should really create a separate parameter group for the classification layer.

I was wondering whether my reasoning makes sense to you, or where I went wrong.
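
For concreteness, a hedged sketch of the separate-group idea, assuming a transformers BertForSequenceClassification (whether PriorWD can actually skip a group this way is exactly what I'm asking, so the final comment is an assumption):

    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    no_decay = ["bias", "LayerNorm.weight"]
    encoder_params = list(model.bert.named_parameters())
    grouped_parameters = [
        {"params": [p for n, p in encoder_params
                    if not any(nd in n for nd in no_decay)],
         "weight_decay": 0.01},
        {"params": [p for n, p in encoder_params
                    if any(nd in n for nd in no_decay)],
         "weight_decay": 0.0},
        # Separate group for the randomly initialized head: PriorWD could
        # skip recording prior_params for this group so its weights decay
        # toward zero instead of toward their random initialization.
        {"params": list(model.classifier.parameters()), "weight_decay": 0.01},
    ]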
