Comments (2)
Hi,
Thanks for the feedback, it's always interesting to compare the various possible ways to train the model.
The most likely cause for (2) is that MRPC is a small dataset and the model shows high variance in the results depending on, for example, the initialization of the weights (see the original BERT repo on that as well). The distributed and multi-gpu setups probably do not use the random generators in the exact same order, which leads to different initializations.
You can get an intuition for that by training with different seeds: you will easily see a 10% variation in the final accuracy.
If you can, a better way to compare the results would thus be to take something like 10 different seeds for each training condition and compare the mean and standard deviation of the results.
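If it helps, here is a minimal sketch of that loop, assuming the run_classifier.py example script (the flags shown are not the full set it requires, and the name of the evaluation output file is an assumption, so adapt both to your local script):

import json
import statistics
import subprocess

# Fine-tune on MRPC with several seeds and collect the final accuracy of each run.
seeds = [1, 7, 13, 42, 123, 256, 512, 1024, 2048, 4096]
accuracies = []
for seed in seeds:
    subprocess.run(
        ["python", "run_classifier.py",
         "--task_name", "MRPC",
         "--do_train", "--do_eval",
         "--seed", str(seed),
         "--output_dir", f"out_mrpc_seed{seed}"],
        check=True,
    )
    # Assumed result file name and key; read whatever your script actually writes.
    with open(f"out_mrpc_seed{seed}/eval_results.json") as f:
        accuracies.append(json.load(f)["acc"])

print(f"mean={statistics.mean(accuracies):.4f} stdev={statistics.stdev(accuracies):.4f}")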
Thanks for your feedback!
After some investigation, it looks like t_total is not set properly for distributed training in BertAdam: the t_total used by each distributed worker should be divided by the worker count. I have included the following fix in my PR #58:
t_total = num_train_steps
if args.local_rank != -1:
    # Each distributed worker only performs num_train_steps / world_size
    # optimizer steps, so scale t_total accordingly.
    t_total = t_total // torch.distributed.get_world_size()
optimizer = BertAdam(optimizer_grouped_parameters,
                     lr=args.learning_rate,
                     warmup=args.warmup_proportion,
                     t_total=t_total)
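For context on why the division matters: BertAdam scales the learning rate by how far training has progressed relative to t_total (warmup then linear decay), and with a DistributedSampler each worker only performs num_train_steps / world_size optimizer steps. A rough illustration with a simplified stand-in for that schedule (not the library's exact code):

# Simplified stand-in for a warmup-then-linear-decay schedule, for illustration only.
def warmup_linear(progress, warmup=0.1):
    if progress < warmup:
        return progress / warmup
    return max(0.0, 1.0 - progress)

num_train_steps = 10000
world_size = 4
steps_per_worker = num_train_steps // world_size  # each worker takes 2500 steps

# Without the fix, progress only reaches 0.25 at the end, so the lr never decays:
print(warmup_linear(steps_per_worker / num_train_steps))    # 0.75 of the peak lr at the last step
# With t_total divided by the world size, the schedule completes as intended:
print(warmup_linear(steps_per_worker / steps_per_worker))   # 0.0 at the last step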