
Comments (9)

dakinggg commented on September 15, 2024

Apologies if I haven't totally understood your question.

From the blogpost:
"For all BERT-Base models, we chose the training duration to be 286,720,000 samples of sequence length 128; this covers 78.6% of C4."

To fully pretrain a model with a sequence length of 512, you just need to follow our guide but change the max_seq_len param to 512.

Because of ALiBi, you can also start with a model trained with sequence length 128 and change max_seq_len to 512 to adapt it.
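
As a yaml sketch (the checkpoint path is a placeholder, and load_path is only needed for the adaptation case):

```yaml
# Pretraining at 512 from scratch: raise the sequence length in
# yamls/main/hf-bert-base-uncased.yaml (or the mosaic-bert equivalent).
max_seq_len: 512

# Adapting an existing 128-length model instead: also warm-start from its
# checkpoint (placeholder path).
load_path: /path/to/seq128-checkpoint.pt
```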


mscherrmann commented on September 15, 2024

Thank you!


mscherrmann commented on September 15, 2024

Hi,

I have one follow-up question:

What do I have to consider regarding "global_train_batch_size" and "device_train_microbatch_size" if I want to train with a sequence length of 512 instead of 128 tokens? If I leave everything as in the yamls/main/hf-bert-base-uncased.yaml file, I will probably run into memory problems. Do you have any tips in this regard? Or even better: do you have a yaml for this case? I train on 8x Nvidia A100 80 GB.

Unfortunately, trial and error works badly for me, because I always have to wait quite a long time until I get onto the GPUs. Hence the question. Thanks a lot!


dakinggg commented on September 15, 2024

global_train_batch_size is an optimization-related setting, and you may or may not want to change it: if you increase the sequence length at the same batch size, you see more tokens per batch (going from 128 to 512 means 4x as many tokens per batch). device_train_microbatch_size does not affect the math and is only related to memory. I'm not sure what setting will work on the exact setup you describe, but you can try device_train_microbatch_size=auto, which will determine it for you.
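
As a sketch in yaml terms (the global batch size below is illustrative, not a recommendation):

```yaml
# Optimization-related: samples per optimizer step. With max_seq_len: 512
# each sample is 4x longer than at 128, so you may want to revisit this.
# 4096 is purely illustrative.
global_train_batch_size: 4096

# Memory-related only: 'auto' lets the trainer find the largest
# per-device microbatch that fits.
device_train_microbatch_size: auto
```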


mscherrmann commented on September 15, 2024

Perfect, thank you for your quick response!


mscherrmann commented on September 15, 2024

I ran into another issue, sorry...

As mosaic-bert is not finetunable, I use hf-bert. I follow the approach of the original BERT paper: train 90% of the steps with a sequence length of 128 and 10% of the steps with a sequence length of 512.
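
In yaml terms, the two phases differ roughly like this (sketch; the durations and checkpoint path are placeholders):

```yaml
# Phase 1: ~90% of the training budget at sequence length 128.
max_seq_len: 128
max_duration: 900ba            # placeholder duration (batches)
---
# Phase 2, a separate run: the remaining ~10% at sequence length 512,
# warm-started from phase 1's final checkpoint (placeholder path).
max_seq_len: 512
max_duration: 100ba            # placeholder
load_path: /path/to/phase1-checkpoint.pt
```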

To accomplish this with your code, I run the "main" script for pretraining twice. The first run completes without any issue. However, in the second run, when I load the previous checkpoint with "load_path" and change the sequence length to 512, I get the following error:

ValueError: Reused local directory: ['/mnt/data/train'] vs ['/mnt/data/train']. Provide a different one.

The data is stored locally. Do you have any idea why this error occurs?

Thank you very much!


karan6181 commented on September 15, 2024

Hi @FinTexIFB, what do the remote and local parameters that you are passing to StreamingDataset look like? Since your dataset resides locally, you can provide your local directory to the local parameter and set remote=None. For example, local='/mnt/data/train' and remote=None.
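
A minimal sketch of that configuration with the streaming package's StreamingDataset:

```python
from streaming import StreamingDataset

# The dataset already lives on local disk, so there is nothing to download.
dataset = StreamingDataset(
    local='/mnt/data/train',  # directory holding the prepared shards
    remote=None,              # no remote copy to stream from
)
```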


mscherrmann commented on September 15, 2024

Hi @karan6181,

thank you for your response. Yes, setting local='/mnt/data/train' and remote=None is exactly what I've done.

However, I found a workaround by simply creating a new container from the same Mosaic Docker image and installing all dependencies. Now it works, but only once: when I try to continue pretraining from an existing checkpoint afterwards, I get the error again. Maybe that is a bug.
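
If the cause is stale shared-memory state left over from the previous run (an assumption, but it would explain why a fresh container helps), recent versions of mosaicml-streaming ship a cleanup helper that may avoid the container rebuild:

```python
# Assumption: the reused-local-directory error comes from stale shared-memory
# state of an earlier run; clearing it before starting the new run may help.
from streaming.base.util import clean_stale_shared_memory

clean_stale_shared_memory()
```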


jacobfulano commented on September 15, 2024

@FinTexIFB, mosaic-bert is finetunable, as can be seen in this yaml. Does this work for your use case?

