Comments (7)
Hi,
What datasets did you use for English-German? How many sentences do you have?
Can you provide your full training log?
from unsupervisedmt.
@glample
Here is the script I used to generate the dataset. I used the WMT14, WMT15, WMT16, and WMT17 monolingual datasets and randomly sampled 10M sentences:
wget -c http://www.statmt.org/wmt15/training-monolingual-news-crawl-v2/news.2014.en.shuffled.v2.gz
wget -c http://data.statmt.org/wmt16/translation-task/news.2015.en.shuffled.gz
wget -c http://data.statmt.org/wmt17/translation-task/news.2016.en.shuffled.gz
wget -c http://data.statmt.org/wmt18/translation-task/news.2017.en.shuffled.deduped.gz
wget -c http://www.statmt.org/wmt15/training-monolingual-news-crawl-v2/news.2014.de.shuffled.v2.gz
wget -c http://data.statmt.org/wmt16/translation-task/training-monolingual-news-crawl.tgz
wget -c http://data.statmt.org/wmt17/translation-task/news.2016.de.shuffled.gz
wget -c http://data.statmt.org/wmt17/translation-task/training-monolingual-news-crawl.tgz
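The sampling step itself is not shown above. A minimal sketch of one way to draw the random 10M-sentence sample with shuf (an assumption; the thread does not say how the sampling was done), run here on a tiny synthetic corpus so it is self-contained:

```shell
# Synthetic stand-in for the concatenated, decompressed news-crawl files
seq 1 100000 | sed 's/^/sentence /' > corpus.en
# Draw a random sample without replacement (on the real data: shuf -n 10000000)
shuf -n 10000 corpus.en > sample.en
wc -l < sample.en    # prints 10000
```

Because shuf samples without replacement, every line in sample.en is distinct.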
and here is my full training log:
https://gist.github.com/tinyka/4388e8859a7f613be4b72db00e248bf2
Hi,
Sorry for the delay on this; I'm not sure what is going on here. I was just able to reproduce the En-De results with my previous code. I didn't try the open-source one on the German-English data (I only checked that En-Fr was working properly), so I'll give it a try now and get back to you.
For now, the results seem consistent with what I had before. After epoch 43, on the test set I have: 19.99 BLEU on de->en and 15.64 on en->de.
Regarding your hyper-parameters, I would suggest using 5 layers, but especially sharing all of them:
--n_enc_layers 5 --n_dec_layers 5 --share_enc 5 --share_dec 5
The number of layers doesn't seem to matter much, but sharing all of them here seems to make a non-negligible difference.
Freezing the decoder embeddings (freeze_dec_emb=True) and not the encoder ones (freeze_enc_emb=False) seems to be a bit better than not freezing anything (but overall it doesn't make a big difference).
Maybe also use --word_blank 0.1 instead of 0.2. It's also a negligible difference, but apart from that I basically have the same hyper-parameters.
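Collected in one place, the suggested flag changes would look like this (a sketch only; the flag names are the ones already used elsewhere in this thread, and every other flag stays as in the original run):

```shell
# Sketch: only the flags suggested above; keep every other flag from the
# original command unchanged.
python main.py \
  --n_enc_layers 5 --n_dec_layers 5 --share_enc 5 --share_dec 5 \
  --freeze_dec_emb True --freeze_enc_emb False \
  --word_blank 0.1
  # ...plus the rest of the original flags...
```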
Also, it takes a few days for the model to get above 21 BLEU, even though the curve is pretty flat around 19/20. But in your case, the model clearly seems to have converged. I would suggest trying the hyper-parameter changes above. Also, I took the first 10M lines of:
wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2007.de.shuffled.gz
wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2008.de.shuffled.gz
wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2009.de.shuffled.gz
wget -c http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2010.de.shuffled.gz
but the domain should be almost identical (actually, your corpora should even have a domain closer to newstest2016), so I don't expect this to make a difference.
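Note that "first 10M lines" differs from the random 10M sample described earlier in the thread. A self-contained sketch of that prep, with toy gzipped files standing in for the real news-crawl archives:

```shell
# Toy stand-ins for the downloaded crawl archives (real ones: news.20xx.de.shuffled.gz)
seq 1 6000 | sed 's/^/zeile /' | gzip > toy.2007.de.gz
seq 1 6000 | sed 's/^/zeile /' | gzip > toy.2008.de.gz
# Concatenate in year order and keep the first N lines (real data: head -n 10000000)
gunzip -c toy.2007.de.gz toy.2008.de.gz | head -n 10000 > all.de
wc -l < all.de    # prints 10000
```

Unlike shuf, this keeps the corpus in its original order, so the earliest years dominate the sample.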
@glample Thank you so much for your reply.
I will rerun the code with your recommended settings.
Hi, I also ran into the same reproduction problem for de-en/en-de. I followed the settings and got 12.21 for en-de and 8.03 for de-en.
My command is:
python main.py --exp_name 'de-en-55' --transformer 'True' \
  --n_enc_layers '5' --n_dec_layers '5' --share_enc '5' --share_dec '5' \
  --share_lang_emb 'True' --share_output_emb 'True' --langs 'de,en' \
  --n_mono '-1' \
  --mono_dataset 'en:./data/mono-de-en/all.en.tok.60000.pth,,;de:./data/mono-de-en/all.de.tok.60000.pth,,' \
  --para_dataset 'de-en:,./data/para-de-en/dev/newstest2016-ende-src.XX.60000.pth,./data/para-de-en/dev/newstest2015-ende-ref.XX.60000.pth' \
  --mono_directions 'de,en' --word_shuffle '3' --word_dropout '0.1' --word_blank '0.1' \
  --pivo_directions 'de-en-de,en-de-en' \
  --pretrained_emb './data/mono-de-en/all.en-de.60000.vec' --pretrained_out 'True' \
  --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd '1' \
  --otf_num_processes '30' --otf_sync_params_every '1000' \
  --enc_optimizer 'adam,lr=0.0001' --epoch_size '500000' \
  --stopping_criterion 'bleu_de_en_valid,10' \
  --freeze_dec_emb 'True' --freeze_enc_emb 'False' --exp_id "7fvh95gae7"
The detailed results after 40 epochs are as follows:
epoch  bleu_en_de_test  bleu_de_en_test
40     11.49            7.49
41     12.09            8.03
42     12.26            7.25
43     11.78            7.64
44     11.76            7.57
45     12.19            7.76
46     12.19            7.69
47     12.09            7.74
48     11.72            7.81
49     12.23            7.47
50     12.06            7.89
51     12.29            7.89
52     12.21            7.88
Is there anything in the setup that I have misunderstood?
@tinyka Were you able to reproduce the results of the En-De translation task?