
Comments (9)

dakinggg commented on September 15, 2024

Apologies if I haven't totally understood your question.

From the blogpost:
"For all BERT-Base models, we chose the training duration to be 286,720,000 samples of sequence length 128; this covers 78.6% of C4."

To fully pretrain a model with a sequence length of 512, you just need to follow our guide but change the max_seq_len param to 512.

Because of ALiBi, you can also start with a model trained with sequence length 128 and change max_seq_len to 512 to adapt it.
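
As a yaml sketch (the checkpoint path is a placeholder, and load_path is only needed for the adaptation case):

```yaml
# Pretraining at 512 from scratch: raise the sequence length in
# yamls/main/hf-bert-base-uncased.yaml (or the mosaic-bert equivalent).
max_seq_len: 512

# Adapting an existing 128-length model instead: also warm-start from its
# checkpoint (placeholder path).
load_path: /path/to/seq128-checkpoint.pt
```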


mscherrmann commented on September 15, 2024

Thank you!


mscherrmann commented on September 15, 2024

Hi,

I have one follow-up question:

What do I have to consider regarding "global_train_batch_size" and "device_train_microbatch_size" if I want to train with a sequence length of 512 instead of 128 tokens? If I leave everything as in the yamls/main/hf-bert-base-uncased.yaml file, I will probably run into memory problems. Do you have any tips in this regard? Or even better: do you have a yaml for this case? I train on 8x Nvidia A100 80 GB.

Unfortunately, trial and error works badly for me, because I always have to wait quite a long time until I get onto the GPUs. Hence the question. Thanks a lot!


dakinggg commented on September 15, 2024

global_train_batch_size is an optimization-related setting, and you may or may not want to change it: if you increase the sequence length at the same batch size, you see more tokens per batch (going from 128 to 512 means 4x as many tokens per batch). device_train_microbatch_size does not affect the math and is only related to memory. I'm not sure what setting will work on the exact setup you describe, but you can try device_train_microbatch_size=auto, which will determine it for you.
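
As a sketch in yaml terms (the global batch size below is illustrative, not a recommendation):

```yaml
# Optimization-related: samples per optimizer step. With max_seq_len: 512
# each sample is 4x longer than at 128, so you may want to revisit this.
# 4096 is purely illustrative.
global_train_batch_size: 4096

# Memory-related only: 'auto' lets the trainer find the largest
# per-device microbatch that fits.
device_train_microbatch_size: auto
```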


mscherrmann commented on September 15, 2024

Perfect, thank you for your quick response!


mscherrmann commented on September 15, 2024

I ran into another issue, sorry...

As mosaic-bert is not finetunable, I use hf-bert. I follow the approach of the original BERT paper: train 90% of the steps with a sequence length of 128 and 10% of the steps with a sequence length of 512.
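
In yaml terms, the two phases differ roughly like this (sketch; the durations and checkpoint path are placeholders):

```yaml
# Phase 1: ~90% of the training budget at sequence length 128.
max_seq_len: 128
max_duration: 900ba            # placeholder duration (batches)
---
# Phase 2, a separate run: the remaining ~10% at sequence length 512,
# warm-started from phase 1's final checkpoint (placeholder path).
max_seq_len: 512
max_duration: 100ba            # placeholder
load_path: /path/to/phase1-checkpoint.pt
```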

To accomplish this with your code, I run the "main" script for pretraining twice. The first run completes without any issue. However, in the second run, when I load the previous checkpoint with "load_path" and change the sequence length to 512, I get the following error:

ValueError: Reused local directory: ['/mnt/data/train'] vs ['/mnt/data/train']. Provide a different one.

The data is stored locally. Do you have any idea why this error occurs?

Thank you very much!


karan6181 commented on September 15, 2024

Hi @FinTexIFB, what do the remote and local parameters that you are passing to StreamingDataset look like? Since your dataset resides locally, you can provide your local directory to the local parameter and set remote=None. For example, local='/mnt/data/train' and remote=None.
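
A minimal sketch of that configuration with the streaming package's StreamingDataset:

```python
from streaming import StreamingDataset

# The dataset already lives on local disk, so there is nothing to download.
dataset = StreamingDataset(
    local='/mnt/data/train',  # directory holding the prepared shards
    remote=None,              # no remote copy to stream from
)
```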


mscherrmann commented on September 15, 2024

Hi @karan6181,

thank you for your response. Yes, setting local='/mnt/data/train' and remote=None is exactly what I've done.

However, I found a workaround by simply creating a new container from the same Mosaic Docker image and installing all dependencies. Now it works, but only once: when I try to continue pretraining from an existing checkpoint afterwards, I get the error again. Maybe that is a bug.
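
If the cause is stale shared-memory state left over from the previous run (an assumption, but it would explain why a fresh container helps), recent versions of mosaicml-streaming ship a cleanup helper that may avoid the container rebuild:

```python
# Assumption: the reused-local-directory error comes from stale shared-memory
# state of an earlier run; clearing it before starting the new run may help.
from streaming.base.util import clean_stale_shared_memory

clean_stale_shared_memory()
```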


jacobfulano commented on September 15, 2024

@FinTexIFB, mosaic-bert is finetunable, as can be seen in this yaml. Does this work for your use case?

