Comments (5)

ptamas88 commented on July 19, 2024

I successfully ran the script for 27 epochs with the booleans @glample mentions set to True, with --n_enc_layers 3 --n_dec_layers 3 --share_enc 2 --share_dec 2, and also --max_len 100.
Thanks for the help.
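
For anyone hitting the same memory limit, the changes relative to the full command in my comment below were roughly these flags (a sketch assembled from the flags discussed in this thread, not the verbatim command):

--n_enc_layers 3 --n_dec_layers 3 --share_enc 2 --share_dec 2 --max_len 100 --share_encdec_emb True --share_decpro_emb True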

ptamas88 commented on July 19, 2024

Hello @glample ,

I have a 6-core (12-thread) Intel Xeon E5-1650 CPU with 64 GB of DDR4 RAM, and two GTX 1080 Ti GPUs. As I read in another issue, the code only uses a single GPU, so that shouldn't matter.
Software side: Ubuntu 16.04, CUDA 9, PyTorch 0.4.1, Python 3.5.2.
The training command is almost the same as in the README; the only thing I changed is --otf_num_processes 12 because of the CPU.

python3 main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 12 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

Since I asked the question, I tried another run, reducing the encoder, decoder, and shared layer counts by 1, to this:
--n_enc_layers 3 --n_dec_layers 3 --share_enc 2 --share_dec 2
It's running, currently at 3150 steps, and GPU RAM usage has grown from 8000 MB at the start to 11063 MB.
If it fails, I will try sharing all the layers.
Thanks for the tips, I will update when I have more info.
PS: Should I expect much worse results with the reduced number of layers?

glample commented on July 19, 2024

Hi,

What configuration are you using? What is your training command?
11GB should be fine, but I have never tried on a 1080 Ti so I'm not sure. To reduce the memory usage you can:

  • share more layers (sharing everything is usually fine)
  • use fewer layers
  • set a smaller maximum sentence length during training, e.g. --max_len 100 (this is probably the best solution, along with sharing more layers); see the flag sketch after this list
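
To make that concrete, these are roughly the flag changes each option corresponds to, relative to the 4-layer command above (just a sketch, exact values depend on your setup):

--share_enc 4 --share_dec 4          # share more layers (all 4, assuming sharing everything is allowed)
--n_enc_layers 3 --n_dec_layers 3    # use fewer layers (reduce --share_enc / --share_dec accordingly)
--max_len 100                        # cap the sentence length during training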

glample commented on July 19, 2024

I'm not sure; I never tried with 3 layers, but 4 and 5 gave roughly the same results. Another solution is to use 40k BPE codes instead of 60k, though 60k usually gave slightly better results.
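
If you try that, the BPE size is set in the data preprocessing script (something like CODES=60000 in get_data_enfr.sh, if I remember the variable correctly); set it to 40000, and the dataset/embedding paths in the training command then end in 40000 instead of 60000 (e.g. all.en.tok.40000.pth, all.en-fr.40000.vec).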

glample commented on July 19, 2024

Actually there are other things you can share that will significantly reduce the memory usage:

  • share_encdec_emb to share the encoder and decoder input embeddings
  • share_decpro_emb to share the decoder input and output embeddings
  • share_output_emb to share the source and target output embeddings in the decoder

All these parameters are set to False by default, and on the language pairs I tried, setting any of them to True didn't give very different results.
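
Applied to the training command above, that means adding something like this (note --share_output_emb is already set to True there):

--share_encdec_emb True --share_decpro_emb True --share_output_emb True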
