Comments (9)
Apologies if I haven't totally understood your question.
From the blogpost:
"For all BERT-Base models, we chose the training duration to be 286,720,000 samples of sequence length 128; this covers 78.6% of C4."
To fully pretrain a model with 512 sequence length, you'll just need to follow our guide, but change the max_seq_len param to 512.
Because of ALiBi, you can also start with a model trained with sequence length 128 and change max_seq_len to 512 to adapt it.
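For reference, a minimal sketch of the relevant yaml overrides (assuming the layout of yamls/main/hf-bert-base-uncased.yaml; the checkpoint path below is a hypothetical placeholder, not a real file in the repo):

```yaml
# Pretraining at sequence length 512: change only max_seq_len.
max_seq_len: 512

# Or, to adapt an existing 128-length ALiBi checkpoint instead of
# pretraining from scratch, point load_path at the earlier run's
# checkpoint (placeholder path):
load_path: /path/to/seq128-run/checkpoints/latest-rank0.pt
```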
Thank you!
Hi,
I have one follow-up question:
What do I have to consider regarding "global_train_batch_size" and "device_train_microbatch_size" if I want to train with a sequence length of 512 instead of 128 tokens? If I leave everything as in the yamls/main/hf-bert-base-uncased.yaml file, I will probably run into memory problems. Do you have any tips in this regard? Or even better: do you have a yaml for this case? I train on 8x Nvidia A100 80 GB.
Trial and error unfortunately doesn't work well for me, because I always have to wait quite a long time to get onto the GPU. Hence the question. Thanks a lot!
global_train_batch_size is an optimization-related setting and you may or may not want to change it. If you increase the sequence length, you see more tokens per batch. device_train_microbatch_size does not affect the math and is only related to memory. I'm not sure what setting will work on the exact setup you describe, but you can try device_train_microbatch_size=auto, which will determine it for you.
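For concreteness, a hedged sketch of those two settings in the yaml (the batch size value is illustrative, not something tested on 8x A100 80 GB):

```yaml
global_train_batch_size: 4096       # optimization setting: samples per optimizer
                                    # step across all GPUs; change only if you
                                    # want a different effective batch size
device_train_microbatch_size: auto  # memory-only: Composer probes for the
                                    # largest per-GPU microbatch that fits
```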
Perfect, thank you for your quick response!
I ran into another issue, sorry...
As mosaic-bert is not finetunable, I use the hf-bert. I follow the approach of the original BERT paper: train 90% of the steps with a sequence length of 128 and 10% of the steps with a sequence length of 512.
To accomplish this with your code, I run the "main" script for pretraining twice. The first run completes without any issue. However, in the second run, when I load the previous checkpoint with "load_path" and change the sequence length to 512, I get the following error:
ValueError: Reused local directory: ['/mnt/data/train'] vs ['/mnt/data/train']. Provide a different one.
The data is stored locally. Do you have any idea why this error occurs?
Thank you very much!
Hi @FinTexIFB, what do your remote and local parameters look like, the ones you are passing to StreamingDataset? Since your dataset resides locally, you can actually provide your local directory to the local parameter and set remote=None. For example, local='/mnt/data/train' and remote=None.
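In the yaml form used by the examples repo, that would look roughly like this (a sketch only; the exact nesting and remaining keys should follow your original config):

```yaml
train_loader:
  name: text
  dataset:
    local: /mnt/data/train  # data already on local disk
    remote: null            # no remote copy to stream from
    # ... remaining keys (split, shuffle, max_seq_len, etc.) unchanged
```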
Hi @karan6181,
thank you for your response. Yes, setting local='/mnt/data/train' and remote=None is exactly what I've done.
However, I found a workaround by simply creating a new container from the same Mosaic Docker image and installing all dependencies. Now it works, but only once: when I try to continue pre-training from an existing checkpoint afterwards, I get the error again. Maybe that is a bug.
@FinTexIFB, mosaic-bert is finetunable, as can be seen in this yaml. Does this work for your use case?