
Comments (7)

glample commented on July 19, 2024

Hi,

What datasets did you use for English-German? How many sentences do you have?
Can you provide your full training log?

from unsupervisedmt.

tinyka commented on July 19, 2024

@glample
Here is the script I used to generate the dataset.
I used the wmt14, wmt15, wmt16 and wmt17 monolingual datasets, and randomly sampled 10M sentences:

wget -c http://www.statmt.org/wmt15/training-monolingual-news-crawl-v2/news.2014.en.shuffled.v2.gz
wget -c http://data.statmt.org/wmt16/translation-task/news.2015.en.shuffled.gz
wget -c http://data.statmt.org/wmt17/translation-task/news.2016.en.shuffled.gz
wget -c http://data.statmt.org/wmt18/translation-task/news.2017.en.shuffled.deduped.gz

wget -c http://www.statmt.org/wmt15/training-monolingual-news-crawl-v2/news.2014.de.shuffled.v2.gz
wget -c http://data.statmt.org/wmt16/translation-task/training-monolingual-news-crawl.tgz
wget -c http://data.statmt.org/wmt17/translation-task/news.2016.de.shuffled.gz
wget -c http://data.statmt.org/wmt17/translation-task/training-monolingual-news-crawl.tgz

and here is my full training log:
https://gist.github.com/tinyka/4388e8859a7f613be4b72db00e248bf2
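
For readers who want to repeat the "randomly sampled 10M sentences" step, here is a minimal sketch, not the script actually used in this thread. It assumes the downloaded files are gzip-compressed UTF-8 text with one sentence per line and that a uniform sample over all of them is wanted. The helper names `iter_lines` and `reservoir_sample` are hypothetical. Reservoir sampling (Algorithm R) is used so the corpus never has to fit in memory.

```python
import gzip
import random
from typing import Iterable, Iterator, List


def iter_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines (without the trailing newline) from several gzip text files in turn."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")


def reservoir_sample(lines: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Return k lines chosen uniformly at random, or every line if there are fewer than k."""
    rng = random.Random(seed)
    sample: List[str] = []
    for index, line in enumerate(lines):
        if index < k:
            # Fill the reservoir with the first k lines.
            sample.append(line)
        else:
            # Replace a reservoir slot with probability k / (index + 1).
            j = rng.randint(0, index)  # randint is inclusive on both ends
            if j < k:
                sample[j] = line
    return sample
```

With the English files above, a call could look like `reservoir_sample(iter_lines(["news.2014.en.shuffled.v2.gz", "news.2015.en.shuffled.gz"]), 10_000_000)`. The sampled lines come back in reservoir order, not in corpus order.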


glample commented on July 19, 2024

Hi,

Sorry for the delay on this. I'm not sure what is going on here. I was just able to reproduce the En-De results with my previous code, but I haven't tried the open-source version on the German-English data (I only checked that En-Fr was working properly). I'll give it a try now and get back to you.


glample commented on July 19, 2024

For now, the results seem consistent with what I had before. After epoch 43, on the test set I have: 19.99 BLEU on de->en and 15.64 on en->de.

Regarding your hyper-parameters, I would suggest using 5 layers, but especially sharing all of them:
--n_enc_layers 5 --n_dec_layers 5 --share_enc 5 --share_dec 5. The number of layers doesn't seem to matter much, but sharing all of them here seems to make a non-negligible difference.

Freezing the decoder embeddings (freeze_dec_emb=True), and not the encoder ones (freeze_enc_emb=False), seems to be a bit better than not freezing anything (but overall it doesn't make a big difference).

Maybe also use --word_blank 0.1 instead of 0.2. That also makes a negligible difference, but apart from that I basically have the same hyper-parameters.

Also, it takes a few days for the model to get above 21 BLEU, even though the curve is pretty flat around 19-20. In your case, though, the model clearly seems to have converged. I would suggest trying the hyper-parameter changes above. Also, I took the first 10M lines of:

wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2007.de.shuffled.gz
wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2008.de.shuffled.gz
wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2009.de.shuffled.gz
wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2010.de.shuffled.gz

but the domain should be almost identical (actually your corpora should even have a domain closer to newstest2016) so I don't expect this to make a difference.
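
For anyone rebuilding this subset, a minimal sketch of the "first 10M lines" step follows. It assumes the four files are read in the order listed and treated as one concatenated stream. The comment does not say whether the limit applies per file or to the concatenation, so that reading is an assumption. The function names and the output path are hypothetical.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator


def iter_gzip_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines (without the trailing newline) from gzip text files, in the order given."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")


def write_first_lines(paths: Iterable[str], out_path: str, limit: int) -> int:
    """Write at most `limit` lines of the concatenated files to out_path; return the count."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # islice stops reading as soon as `limit` lines have been consumed.
        for line in islice(iter_gzip_lines(paths), limit):
            out.write(line + "\n")
            written += 1
    return written
```

For example, `write_first_lines(["news.2007.de.shuffled.gz", "news.2008.de.shuffled.gz"], "all.de", 10_000_000)` stops as soon as the limit is reached, so later files are only opened if the earlier ones are too short.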


tinyka commented on July 19, 2024

@glample Thank you so much for your reply.
I will rerun the code with your recommended settings.


WinnieHAN commented on July 19, 2024

Hi, I also ran into the same reproduction problem for de-en/en-de. I followed the suggested settings and got 12.21 for en-de and 8.03 for de-en.

My command is:
python main.py --exp_name 'de-en-55' --transformer 'True' --n_enc_layers '5' --n_dec_layers '5' --share_enc '5' --share_dec '5' --share_lang_emb 'True' --share_output_emb 'True' --langs 'de,en' --n_mono '-1' --mono_dataset 'en:./data/mono-de-en/all.en.tok.60000.pth,,;de:./data/mono-de-en/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data/para-de-en/dev/newstest2016-ende-src.XX.60000.pth,./data/para-de-en/dev/newstest2015-ende-ref.XX.60000.pth' --mono_directions 'de,en' --word_shuffle '3' --word_dropout '0.1' --word_blank '0.1' --pivo_directions 'de-en-de,en-de-en' --pretrained_emb './data/mono-de-en/all.en-de.60000.vec' --pretrained_out 'True' --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd '1' --otf_num_processes '30' --otf_sync_params_every '1000' --enc_optimizer 'adam,lr=0.0001' --epoch_size '500000' --stopping_criterion 'bleu_de_en_valid,10' --freeze_dec_emb 'True' --freeze_enc_emb 'False' --exp_id "7fvh95gae7"

The detailed results after 40 epochs are as follows:

40 bleu_en_de_test 11.49 bleu_de_en_test 7.49
41 bleu_en_de_test 12.09 bleu_de_en_test 8.03
42 bleu_en_de_test 12.26 bleu_de_en_test 7.25
43 bleu_en_de_test 11.78 bleu_de_en_test 7.64
44 bleu_en_de_test 11.76 bleu_de_en_test 7.57
45 bleu_en_de_test 12.19 bleu_de_en_test 7.76
46 bleu_en_de_test 12.19 bleu_de_en_test 7.69
47 bleu_en_de_test 12.09 bleu_de_en_test 7.74
48 bleu_en_de_test 11.72 bleu_de_en_test 7.81
49 bleu_en_de_test 12.23 bleu_de_en_test 7.47
50 bleu_en_de_test 12.06 bleu_de_en_test 7.89
51 bleu_en_de_test 12.29 bleu_de_en_test 7.89
52 bleu_en_de_test 12.21 bleu_de_en_test 7.88

Is there any part of the setup that I misunderstood?
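
The per-epoch scores listed above can be summarised programmatically. This is a minimal sketch, assuming each row has exactly the layout shown (an integer epoch followed by metric-name and score pairs). It is not the format of the full training log, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_bleu_rows(text: str) -> List[Tuple[int, Dict[str, float]]]:
    """Parse rows such as '40 bleu_en_de_test 11.49 bleu_de_en_test 7.49'."""
    rows: List[Tuple[int, Dict[str, float]]] = []
    for raw in text.strip().splitlines():
        fields = raw.split()
        if len(fields) < 3 or len(fields) % 2 == 0:
            continue  # skip blank rows and rows without complete name/score pairs
        epoch = int(fields[0])
        # Odd positions hold metric names, the following even positions hold scores.
        scores = {name: float(value) for name, value in zip(fields[1::2], fields[2::2])}
        rows.append((epoch, scores))
    return rows


def best_epochs(rows: List[Tuple[int, Dict[str, float]]]) -> Dict[str, Tuple[int, float]]:
    """Map each metric name to the (epoch, score) pair with the highest score."""
    best: Dict[str, Tuple[int, float]] = {}
    for epoch, scores in rows:
        for name, score in scores.items():
            # Strict comparison keeps the earliest epoch when scores tie.
            if name not in best or score > best[name][1]:
                best[name] = (epoch, score)
    return best
```

Passing the rows above as one string to `parse_bleu_rows` and then calling `best_epochs` on the result gives the best epoch for each direction, which makes it easier to compare runs than reading the columns by eye.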


StillKeepTry commented on July 19, 2024

@tinyka Have you managed to reproduce the results on the en-de translation task?
