Comments (5)

ptamas88 commented on July 19, 2024

I successfully ran the script for 27 epochs with the booleans @glample mentions set to True, with --n_enc_layers 3 --n_dec_layers 3 --share_enc 2 --share_dec 2, and also --max_len 100.
Thanks for the help.
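
For anyone hitting the same memory limit, the changes relative to the full command in my comment below were roughly these flags (a sketch assembled from the flags discussed in this thread, not the verbatim command):

--n_enc_layers 3 --n_dec_layers 3 --share_enc 2 --share_dec 2 --max_len 100 --share_encdec_emb True --share_decpro_emb True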

ptamas88 commented on July 19, 2024

Hello @glample ,

I have a 6-core (12-thread) Intel Xeon E5-1650 CPU with 64 GB of DDR4 RAM, and two GTX 1080 Ti GPUs. As I read in another issue, the code only uses a single GPU, so that shouldn't matter.
Software side: Ubuntu 16.04, CUDA 9, PyTorch 0.4.1, Python 3.5.2.
The training command is almost the same as in the README; the only thing I changed is --otf_num_processes 12 because of the CPU.

python3 main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 12 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

Since I asked the question, I tried another run, reducing the encoder, decoder, and shared layer counts by 1, to this:
--n_enc_layers 3 --n_dec_layers 3 --share_enc 2 --share_dec 2
It's running, currently at 3150 steps, and GPU RAM usage has grown from 8000 MB at the start to 11063 MB.
If it fails, I will try sharing all the layers.
Thanks for the tips, I will update when I have more info.
PS: Should I expect much worse results with the reduced number of layers?

glample commented on July 19, 2024

Hi,

What configuration are you using? What is your training command?
11GB should be fine, but I have never tried on a 1080 Ti so I'm not sure. To reduce the memory usage you can:

  • share more layers (sharing everything is usually fine)
  • use fewer layers
  • set a smaller maximum sentence length during training, e.g. --max_len 100 (this is probably the best solution, along with sharing more layers); see the flag sketch after this list
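
To make that concrete, these are roughly the flag changes each option corresponds to, relative to the 4-layer command above (just a sketch, exact values depend on your setup):

--share_enc 4 --share_dec 4          # share more layers (all 4, assuming sharing everything is allowed)
--n_enc_layers 3 --n_dec_layers 3    # use fewer layers (reduce --share_enc / --share_dec accordingly)
--max_len 100                        # cap the sentence length during training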

glample commented on July 19, 2024

I'm not sure; I never tried with 3 layers, but 4 and 5 gave roughly the same results. Another solution is to use 40k BPE codes instead of 60k, though 60k usually gave slightly better results.
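
If you try that, the BPE size is set in the data preprocessing script (something like CODES=60000 in get_data_enfr.sh, if I remember the variable correctly); set it to 40000, and the dataset/embedding paths in the training command then end in 40000 instead of 60000 (e.g. all.en.tok.40000.pth, all.en-fr.40000.vec).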

glample commented on July 19, 2024

Actually there are other things you can share that will significantly reduce the memory usage:

  • share_encdec_emb to share the encoder and decoder input embeddings
  • share_decpro_emb to share the decoder input and output embeddings
  • share_output_emb to share the source and target output embeddings in the decoder

All these parameters are set to False by default, and on the language pairs I tried, setting any of them to True didn't give very different results.
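
Applied to the training command above, that means adding something like this (note --share_output_emb is already set to True there):

--share_encdec_emb True --share_decpro_emb True --share_output_emb True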
