Comments (3)
Thanks! I pushed a fix for that; you can try it again. You should be able to increase the batch size a bit. By the way, the real batch size used on the GPU is train_batch_size / gradient_accumulation_steps, so 2 in your case. I think you should be able to go to 3 with `--optimize_on_cpu`.
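To make the arithmetic concrete, here is a tiny illustration with hypothetical flag values (the variable names just mirror the CLI flags discussed above):

```python
# Hypothetical flag values: the optimizer-level batch is train_batch_size,
# but each forward/backward pass on the GPU only sees
# train_batch_size / gradient_accumulation_steps examples.
train_batch_size = 24
gradient_accumulation_steps = 12
real_gpu_batch_size = train_batch_size // gradient_accumulation_steps
print(real_gpu_batch_size)  # -> 2; with 8 accumulation steps it would be 3
```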
The recommended batch_size to get good results (EM, F1) with BERT large on SQuAD is 24. You can try the following possibilities to reach this batch_size:

- keeping the same 'real batch size' you currently have, just with a bigger batch_size: `--train_batch_size 24 --gradient_accumulation_steps 12`
- trying a 'real batch size' of 3 with optimization on CPU: `--train_batch_size 24 --gradient_accumulation_steps 8 --optimize_on_cpu`
- switching to fp16 (implies optimization on CPU): `--train_batch_size 24 --gradient_accumulation_steps 6` (or 4) `--fp16`

If your GPU supports fp16, the last solution should be the fastest; otherwise the second should be the fastest. The first solution should work out of the box and give better results (EM, F1), but you won't get any speed-up. (A sketch of the accumulation loop these flags drive follows below.)
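For readers unfamiliar with the pattern, here is a minimal sketch of a gradient-accumulation training loop. It illustrates the technique only and is not the actual run_squad.py code; model, dataloader, and optimizer are assumed placeholders.

```python
def train_one_epoch(model, dataloader, optimizer, gradient_accumulation_steps=8):
    # The dataloader yields micro-batches of
    # train_batch_size / gradient_accumulation_steps examples; gradients are
    # accumulated over the window so each optimizer step effectively sees
    # the full batch of 24.
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        loss = model(**batch)
        # Scale the loss so the summed gradients match a single
        # large-batch update.
        (loss / gradient_accumulation_steps).backward()
        if (step + 1) % gradient_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```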
Should be fixed now. Don't hesitate to re-open an issue if needed. Thanks for the feedback!
Yes, it works now! With `--train_batch_size 24 --gradient_accumulation_steps 8 --optimize_on_cpu` I get {"exact_match": 83.78429517502366, "f1": 90.75733469379139}, which is pretty close. Thanks for this amazing work!
Related Issues (20)
- Token merging for LLM Inference HOT 1
- llava-next demo / tutorial code does not work HOT 3
- Constrained beam search in chat
- GPT2 model getting NaN logits with MPS device HOT 3
- Attention implementation cannot work together with config in AutoModel HOT 2
- Idefics2 raises shape error in the backward pass with gradient_checkpointing HOT 3
- AttributeError: 'torch.dtype' object has no attribute 'element_size' HOT 6
- load and train with bf16,saved torch_dtype is float32 HOT 2
- Support multiple `timm` as backbones with `forward_intermediates` method HOT 4
- Badam support HOT 1
- Can't load tokenizer HOT 1
- assisted_decoding does not correctly update static KV cache HOT 3
- Mamba + LoRA: after merge_and_unload() does not work well HOT 2
- [Bug] Error when trying to run two models in a machine with created ZeRO config variables. HOT 1
- Cannot load codeqwen Tokenizer HOT 4
- Unable to use set_weights() and reload model later
- Error on TPU: Invalid --2a886c8_slice_builder_worker_addresses specified. Expected 4 worker addresses, got 1. HOT 1
- idefics2-8b-AWQ failed when doing multiple calls HOT 2
- Cannot load AWQ model on GPU with deepspeed zero3 HOT 2
- If a training job job failed MLFlow will not be reported and MLFlow shows job still running HOT 1