
Comments (4)

OzoneReloaded commented on May 19, 2024

Hello! I've managed to run finetuning on an 11 GB GPU with:

gpt_options="
--hidden-size 1024
--seq-length 1024
--cpu-optimizer
--cpu_torch_adam
"

Hope it helps. @Rai220
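
A rough sketch of why the CPU optimizer flags can make the difference, assuming mixed-precision training with Adam and the commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for the fp32 master weights plus the two Adam moments. The parameter count below is an assumption chosen for illustration, not a figure from this thread, and activations and framework overhead are ignored:

```python
GIB = 1024 ** 3

# Assumption for illustration only: a model with roughly 760M parameters.
ASSUMED_PARAMS = 760_000_000


def training_memory_gib(num_params, optimizer_on_gpu=True):
    """Estimate GPU memory for weights, gradients and Adam state (no activations)."""
    fp16_weights = 2 * num_params
    fp16_grads = 2 * num_params
    # fp32 master weights + Adam momentum + Adam variance = 3 * 4 bytes per parameter.
    optimizer_state = 12 * num_params if optimizer_on_gpu else 0
    return (fp16_weights + fp16_grads + optimizer_state) / GIB


if __name__ == "__main__":
    on_gpu = training_memory_gib(ASSUMED_PARAMS, optimizer_on_gpu=True)
    offloaded = training_memory_gib(ASSUMED_PARAMS, optimizer_on_gpu=False)
    print(f"optimizer state on GPU: {on_gpu:.2f} GiB")
    print(f"optimizer state on CPU: {offloaded:.2f} GiB")
```

Under these assumptions the estimate drops from about 11.3 GiB to about 2.8 GiB once the optimizer state lives in host memory, which leaves the rest of the card for activations.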

from ru-gpts.

fen0s commented on May 19, 2024

I have a 16 GB GPU and get a CUDA out-of-memory error (even with batch size = 1!):

RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 14.76 GiB total capacity; 13.25 GiB already allocated; 21.44 MiB free; 13.84 GiB reserved in total by PyTorch)

Is this memory really not enough to train the large version? Are there any tips for reducing memory usage during pretraining? I'm using the following parameters:

    --per_gpu_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --overwrite_cache \
    --num_train_epochs 2 \
    --save_steps 1000 \
    --block_size 256 \
    --fp16

Apparently, an optimization level of O3 helps, but I haven't quite figured out how to make it generate samples; it just outputs a negative probability for some reason. The above answer is for GPT-3 large, not GPT-2 large, so...

from ru-gpts.

fen0s commented on May 19, 2024

Basically, what's needed is gradient checkpointing, which was added in one of the transformers library versions. I'm not sure I can implement it, especially considering that an old version of the transformers library is used here...
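
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of gradient checkpointing on a toy chain of scalar layers. It is not the transformers implementation, and every name in it is hypothetical. The forward pass keeps only the input of each segment, and the backward pass recomputes the activations inside a segment just before they are needed. In PyTorch the same trade of compute for memory is exposed through `torch.utils.checkpoint.checkpoint`.

```python
import math


def layer_forward(w, x):
    """Toy layer: y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_backward(w, x, grad_out):
    """Return (grad wrt x, grad wrt w) for y = tanh(w * x)."""
    y = math.tanh(w * x)
    dz = grad_out * (1.0 - y * y)
    return dz * w, dz * x


def backprop_full(weights, x0):
    """Ordinary backprop: stores the input of every layer."""
    inputs, x = [], x0
    for w in weights:
        inputs.append(x)
        x = layer_forward(w, x)
    grad, grads_w = 1.0, [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad, grads_w[i] = layer_backward(weights[i], inputs[i], grad)
    return x, grads_w, len(inputs)


def backprop_checkpointed(weights, x0, segment):
    """Checkpointed backprop: stores one input per segment, recomputes the rest."""
    checkpoints, x = {}, x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # keep only the segment input
        x = layer_forward(w, x)
    out, grad, grads_w = x, 1.0, [0.0] * len(weights)
    peak_stored = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        # Recompute the activations of this segment from its checkpoint.
        inputs, x = [], checkpoints[start]
        for i in range(start, end):
            inputs.append(x)
            x = layer_forward(weights[i], x)
        peak_stored = max(peak_stored, len(checkpoints) + len(inputs))
        for i in reversed(range(start, end)):
            grad, grads_w[i] = layer_backward(weights[i], inputs[i - start], grad)
    return out, grads_w, peak_stored


if __name__ == "__main__":
    ws = [0.5 + 0.1 * i for i in range(12)]
    _, g_full, n_full = backprop_full(ws, 0.3)
    _, g_ckpt, n_ckpt = backprop_checkpointed(ws, 0.3, segment=4)
    print("stored values, full vs checkpointed:", n_full, n_ckpt)
    print("max gradient difference:", max(abs(a - b) for a, b in zip(g_full, g_ckpt)))
```

The gradients are the same in both versions; the checkpointed one simply pays for an extra forward pass per segment in exchange for holding fewer activations at once.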

from ru-gpts.

TatianaShavrina commented on May 19, 2024

Hey @Rai220 @fen0s, the organizers are giving participants the opportunity to get access to Christofari. To request access, please send a brief description of your project to [email protected]. We will review your request and get back to you. Please note that the number of access slots is limited, so if you need access, please submit your request as early as possible.

from ru-gpts.
