Comments (18)
Hi, can you try to use exactly the same parameters I gave here: #34 (comment)?
To be honest I'm not sure about the difference. Did you train your cross-lingual BPE embeddings using the same parameters?
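For reference, the embedding training in the get_data scripts is just fastText skipgram on the concatenation of the two BPE-ized corpora, roughly like this (values quoted from memory, so double-check against the script):

# train cross-lingual embeddings on the concatenated BPE corpora
cat all.en.tok.60000 all.de.tok.60000 > all.en-de.60000
fasttext skipgram -epoch 10 -minCount 0 -dim 512 -thread 48 -ws 5 -neg 10 \
  -input all.en-de.60000 -output all.en-de.60000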
Also, were you able to reproduce the results on English-French? I would like to understand why it seems easier to reproduce the En-Fr results than the En-De ones.
@glample Could you help me check my experimental setup for reproducing English-German? I have successfully reproduced the en-fr results but failed on the en-de task.
Here is the link to the data preprocessing script I used for English-German.
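In outline, it follows the usual pipeline for this repo (a sketch assuming a joint 60k BPE vocabulary; the file names are examples, and the exact commands are in the linked script):

# learn a joint BPE model with 60000 codes on the tokenized corpora
./fastBPE/fast learnbpe 60000 all.en.tok all.de.tok > bpe_codes
# apply the codes to each monolingual corpus
./fastBPE/fast applybpe all.en.tok.60000 all.en.tok bpe_codes
./fastBPE/fast applybpe all.de.tok.60000 all.de.tok bpe_codes
# extract the joint vocabulary and binarize into the .pth files loaded by main.py
./fastBPE/fast getvocab all.en.tok.60000 all.de.tok.60000 > vocab.en-de.60000
./preprocess.py vocab.en-de.60000 all.en.tok.60000
./preprocess.py vocab.en-de.60000 all.de.tok.60000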
And this is the script I used for training:
MODEL_PATH=models  # root directory for experiment dumps
TASK=English-German
EXP=baseline
PRETRAINED='./en-de/data/all.en-de.60000.vec'  # cross-lingual fastText embeddings
MONO_DATASET='en:./en-de/data/all.en.tok.60000.pth,,;de:./en-de/data/all.de.tok.60000.pth,,'  # binarized monolingual data per language
PARA_DATASET='de-en:,./en-de/English-German/dev/newstest2015-deen-ref.XX.60000.pth,./en-de/English-German/dev/newstest2016-ende-src.XX.60000.pth'  # parallel data for dev/test only
mkdir -p $MODEL_PATH/$TASK/test/$EXP
CUDA_VISIBLE_DEVICES=0 python main.py --exp_name test \
--dump_path $MODEL_PATH/$TASK \
--transformer True \
--n_enc_layers 4 \
--n_dec_layers 4 \
--share_enc 4 \
--share_dec 4 \
--share_lang_emb True \
--share_output_emb True \
--langs 'de,en' \
--n_mono -1 \
--mono_dataset $MONO_DATASET \
--para_dataset $PARA_DATASET \
--mono_directions 'de,en' \
--word_shuffle 3 \
--word_dropout 0.1 \
--word_blank 0.2 \
--pivo_directions 'de-en-de,en-de-en' \
--pretrained_emb $PRETRAINED \
--pretrained_out True \
--lambda_xe_mono '0:1,100000:0.1,300000:0' \
--lambda_xe_otfd 1 \
--otf_num_processes 30 \
--otf_sync_params_every 1000 \
--enc_optimizer adam,lr=0.0001 \
--epoch_size 500000 \
--freeze_enc_emb False \
--freeze_dec_emb False \
--batch_size 32 \
--group_by_size True \
--max_epoch 100 \
--exp_id $EXP
The number of layers is set to 4 to keep the setting consistent with the paper. However, after training, I only achieved 19.84 BLEU on de-en and 15.22 BLEU on en-de, which is still far from the reported numbers (21 on de-en and 17.1 on en-de). Could you help me figure out what is going wrong?
@glample The setup of the de-en/en-de was the same as #34 (as shown in my problem description) and I trained cross-lingual BPE embeddings using the same parameters.
The result of en-fr/fr-en has been successfully reproduced in my experiments.
I will try the setup of @StillKeepTry to make sure that my training dataset is consistent. I will report back in a few days.
@StillKeepTry I had a look at your preprocessing script and I couldn't find anything wrong.
Since reproducing the en-de results seems complicated, I uploaded the preprocessing script I used to reproduce the English-German experiments last time: https://github.com/facebookresearch/UnsupervisedMT/blob/master/NMT/get_data_deen.sh
Could you guys have a look at it and tell me how it compares to yours? And maybe use it for your next experiments, so that we can be sure the problem doesn't come from a difference in the data?
@glample Thank you for your reply. There really seems to be no difference between my data preprocessing script and yours. I will rerun the experiment with your script later to check again.
@glample I have compared the data generated by your script with my previous data, and I can confirm they are entirely identical. This is the vocab generated by my script; you can download it for comparison. I can also provide some of my training logs.
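(If anyone wants to run the same check, something along these lines is enough; the paths are just examples:)

# compare binarized data and vocabularies between the two preprocessing runs
md5sum mine/all.en.tok.60000.pth theirs/all.en.tok.60000.pth
md5sum mine/all.de.tok.60000.pth theirs/all.de.tok.60000.pth
diff mine/vocab.en-de.60000 theirs/vocab.en-de.60000 && echo 'vocabs identical'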
I uploaded a training log of an experiment I ran a couple of months ago: https://gist.github.com/glample/f69f3c742a9c435f510884d188401df5
The parameters are a bit different and I don't expect them to make a big difference, but just in case, could you try with my exact configuration? Then you can compare whether the loss / BLEU are the same after 1 epoch.
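A quick way to line the two logs up, assuming the log format hasn't changed (file names here are just examples):

# pull the evaluation lines out of both training logs
grep -i 'bleu' my_train.log | head -n 20
grep -i 'bleu' reference_train.log | head -n 20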
@glample Thank you for your reply. I quickly launched an experiment with your settings, and this is my training log after 2 epochs:
https://gist.github.com/StillKeepTry/8177e91a73ab4e90b66fdd032e081e0c.
It looks slightly better than my previous experiment (I think this improvement is expected, since I only used 4 layers before), but it still seems weaker than your run over the first two epochs. I will report the latest training log once the model has converged.
@StillKeepTry Have you reproduced the results of the en-de translation tasks? I followed the settings and got 15.05 for en-de and 19.74 for de-en, which is lower than the results in the paper.
@JiqiangLiu No. Your reproduction is probably similar to mine. With 10M en-de monolingual sentences, the model seems to top out at about 20.0 for de-en and 15.0 for en-de in my experiments. Raising the monolingual data to 50M gives roughly a 1.0 BLEU improvement, i.e. about 21.0 for de-en and 16.0 for en-de. So de-en can reach the reported result, but the en-de performance still troubles me.
@StillKeepTry Thanks for your reply. I will keep experimenting and report back if I get a comparable result.
@glample Hi, sorry to bother you again. Could you share some pre-trained German/Romanian embeddings for PBSMT? I downloaded pre-trained embeddings from the fastText website, but the results seem far from ideal.
Similar to @StillKeepTry, I still cannot fully reproduce the results. I get 15.4 for en-de and 20.04 for de-en.
My log is at https://gist.github.com/WinnieHAN/cc120d4ef31c5b6f9342cc0fc06a407b.
Hi,
I suggest you guys have a look at https://github.com/facebookresearch/XLM. This repo is much better: it provides multi-GPU support and obtains significantly better results. We also provide pretrained models that should let you reproduce the results very quickly.
@glample Thank you for your contribution; we will follow your new work.
Hi @glample, does XLM support an arbitrary number of languages during MT training?
Yes, you can have as many languages as you want in XLM.
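For example, a three-language unsupervised MT run looks roughly like this (a sketch based on the flags in the XLM README; only the core flags are shown, and the exact names should be verified against the repo):

# denoising auto-encoding on each language + back-translation between all pairs
python train.py \
  --exp_name unsupMT_multi \
  --dump_path ./dumped/ \
  --data_path ./data/processed/de-en-fr/ \
  --lgs 'de-en-fr' \
  --ae_steps 'de,en,fr' \
  --bt_steps 'de-en-de,en-de-en,de-fr-de,fr-de-fr,en-fr-en,fr-en-fr'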