
unsupervisedmt's Issues

Error when dimension of fastText embedding is changed

I changed the embedding dimension in the fastText command from 512 to 128 to adapt to my own corpora:

$FASTTEXT skipgram -epoch $N_EPOCHS -minCount 0 -dim 128 \
     -thread $N_THREADS -ws 5 -neg 10 -input $CONCAT_BPE -output $CONCAT_BPE

The result is as follows:

INFO - 09/16/18 15:31:04 - 0:00:05 - Reloading embeddings from ./mydata/mono/all.yy-xx.500.vec ...
Traceback (most recent call last):
  File "main.py", line 242, in <module>
    encoder, decoder, discriminator, lm = build_mt_model(params, data)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/__init__.py", line 98, in build_mt_model
    return build_attention_model(params, data, cuda=cuda)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/attention.py", line 813, in build_attention_model
    initialize_embeddings(encoder, decoder, params, data)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/pretrain_embeddings.py", line 90, in initialize_embeddings
    pretrained_0, word2id_0 = reload_embeddings(params.pretrained_emb, params.emb_dim)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/pretrain_embeddings.py", line 76, in reload_embeddings
    return reload_txt_emb(path, dim)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/pretrain_embeddings.py", line 48, in reload_txt_emb
    assert dim == int(split[1])
AssertionError
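
For reference, the assertion compares the dimension stored in the header of the .vec file with params.emb_dim. A minimal sketch (not code from the repo, and assuming the embedding dimension passed to main.py via --emb_dim was left at its default of 512) of what the check reads:

    # The first line of a fastText .vec file stores "<n_words> <dim>", and
    # reload_txt_emb asserts that this dim equals params.emb_dim. If the embeddings
    # were trained with -dim 128, main.py presumably has to be launched with a
    # matching --emb_dim 128.
    with open('./mydata/mono/all.yy-xx.500.vec') as f:
        n_words, dim = map(int, f.readline().split())
    print(n_words, dim)   # dim is 128 here, while params.emb_dim is still 512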

UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.

Hi, I'm trying to train your unsupervised NMT system (using my own data), but I'm getting these UserWarnings all the time:

UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.

I'm using PyTorch 0.4.1. I also tried 1.0 (compiled from source, master branch), but the same thing happened. Is this a known issue?

Kind regards,
Laurens
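
For what it's worth, on PyTorch 0.4.1 this warning is cosmetic: it comes from loss functions still being called with the older size_average/reduce arguments. A hedged sketch of one way to silence it (an assumption on my part, not something the repo does):

    import warnings

    # Filter the deprecation warning before building the model (purely cosmetic;
    # training behaviour is unchanged).
    warnings.filterwarnings(
        "ignore",
        message=".*size_average and reduce args will be deprecated.*",
    )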

What should I do if the languages are unrelated

Hi @glample

My training languages are zh-en, which, as we all know, are unrelated.
According to my understanding, I did the following things:

  1. First, apply BPE to the two languages separately.
  2. Second, run fastText on the two languages separately to get the embeddings zh.vec and en.vec.
  3. Third, train MUSE on zh.vec and en.vec to get vectors-zh.txt, vectors-en.txt, and vectors-zh-en.txt. vectors-zh.txt and vectors-en.txt are aligned by MUSE; vectors-zh-en.txt is formed by joining vectors-zh.txt and vectors-en.txt together (a sketch of this join is given after the two options below).

Is there any problem with what I did above?
Should I put the two languages' embeddings together when running the code? Which of the following sets of input arguments should I choose, 1 or 2?

  1. --share_lang_emb False --share_output_emb False --pretrained_emb 'vectors-zh.txt,vectors-en.txt'
  2. --share_lang_emb True --share_output_emb True --pretrained_emb 'vectors-zh-en.txt'
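
As referenced in step 3 above, here is a minimal sketch of how the joint file for option 2 could be assembled from the two MUSE-aligned files (file names as in the question; this is an illustration, not a script from the repo):

    def read_vec(path):
        # fastText/MUSE text format: first line is "<n_words> <dim>", then one word per line
        with open(path, encoding='utf-8') as f:
            n, d = map(int, f.readline().split())
            vectors = f.readlines()
        return n, d, vectors

    n_zh, d_zh, zh = read_vec('vectors-zh.txt')
    n_en, d_en, en = read_vec('vectors-en.txt')
    assert d_zh == d_en

    with open('vectors-zh-en.txt', 'w', encoding='utf-8') as f:
        f.write(f"{n_zh + n_en} {d_zh}\n")
        f.writelines(zh + en)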

Help me understand the Output/Parameters and inference.

I am training a Spanish-to-English NMT model. It prints the logs below after epoch 0 when I run NMT/main.py.

INFO - 09/24/18 12:38:38 - 1:47:10 - 600 - 12.01 sent/s - 377.00 words/s - XE-es-es: 4.5986 || XE-en-en: 4.9864 || XE-en-es-en: 5.7000 || XE-es-en-es: 5.5308 || ENC-L2-en: 4.8750 || ENC-L2-es: 4.7380 - LR enc=1.0000e-04,dec=1.0000e-04 - Sentences generation time: 225.13s (42.23%)

What confuses me is what XE-es-es and XE-en-en mean. Shouldn't it be XE-es-en and XE-en-es, if it is the translation loss? Also, if anyone could explain all the values printed during training, it would help me understand what is happening.

Raise EOFError

After the NMT code runs for a while, the following error is reported. My run command is as follows.
Is any parameter too large? How can I improve the stability of the program?

python main.py --exp_name base3 --transformer True --n_enc_layer 4 --n_dec_layer 4 --share_enc 3 --share_dec 3 \
  --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 \
  --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' \
  --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' \
  --mono_directions 'en,fr' --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' \
  --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True \
  --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 15 \
  --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 \
  --stopping_criterion bleu_en_fr_valid,10 \
  --max_len 100

Traceback (most recent call last):
  File "main.py", line 317, in <module>
    batches = next(otf_iterator)
  File "/mnt/disk-c/liujiqiang/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/trainer.py", line 561, in otf_bt_gen_async
    results = cache[0].gen()
  File "/mnt/disk-c/liujiqiang/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/multiprocessing_event_loop.py", line 203, in gen
    return next(self.generator)
  File "/mnt/disk-c/liujiqiang/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/multiprocessing_event_loop.py", line 73, in fetch_all_result_generator
    result_type, result = self.return_pipes[rank].recv()
  File "/home/liujq/Python-3.6.4/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/liujq/Python-3.6.4/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/liujq/Python-3.6.4/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Why do I get BLEU 1.01 on zh-en with PBSMT?

Hi,

I have been confused about this for several days.

I followed the steps of PBSMT/run.sh to do my work, and I think the most important step is "Running MUSE to generate cross-lingual embeddings". I aligned the 'zh' and 'en' pre-trained word vectors you provided on [https://fasttext.cc/docs/en/crawl-vectors.html] with MUSE, and got "Adv-NN P@1=21.3, Adv-CSLS P@1=26.9, Adv-Refine-NN P@1=18.5, Adv-Refine-CSLS P@1=24.0".

Then, I used the aligned embeddings to generate the phrase-table, but finally I got BLEU of 1.01. I don't think the result is right. Something must have gone wrong.

My MUSE command is:

python unsupervised.py --src_lang ch \
  --tgt_lang en \
  --src_emb /data/experiment/embeddings/wiki.ch.300.vec.20w \
  --tgt_emb /data/experiment/embeddings/wiki.en.300.vec.20w \
  --exp_name test \
  --exp_id 0 \
  --normalize_embeddings center \
  --emb_dim 300 \
  --dis_most_frequent 50000 \
  --epoch_size 500000 \
  --dico_eval /data/experiment/unsupervisedMT/fordict/zh-en.5000-6500.sim.txt \
  --n_refinement 5 \
  --export "pth"

My command for generating the phrase table is:

python create-phrase-table.py \
  --src_lang $SRC \
  --tgt_lang $TGT \
  --src_emb $ALIGNED_EMBEDDINGS_SRC \
  --tgt_emb $ALIGNED_EMBEDDINGS_TGT \
  --csls 1 \
  --max_rank 200 \
  --max_vocab 300000 \
  --inverse_score 1 \
  --temperature 45 \
  --phrase_table_path ${PHRASE_TABLE_PATH::-3}

Does the problem lie in the word embeddings? Should I use word embeddings trained on my own training data with fastText for MUSE? I have tried that (using word embeddings trained on my training data), but got "Adv-NN P@1=0.07, Adv-CSLS P@1=0.07, Adv-Refine-NN P@1=0.00, Adv-Refine-CSLS P@1=0.00". My command was: ./fasttext skipgram -epoch 10 -minCount 0 -dim 300 -thread 48 -ws 5 -neg 10 -input $SRC_TOK -output $EMB_SRC. So I did not use the word embeddings trained on my training data, because I think I did not align them well.

So, where is the fault?

fastText step taking a long time - alternatives?

Hi,
While running get_data.sh, the last step, which generates the cross-lingual embedding (.vec) file using fastText, is taking a long time (it has been running for a day on 4 CPUs and is still going).

As an alternative to this all.en-fr.60000.vec file, can I just download the crawl vectors for English and French from this page - https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md - and pass them as an argument to the main file, something like below?
--pretrained_emb './data/mono/cc.en.300.vec,./data/mono/cc.fr.300.vec'

Thanks !

Mohammed Ayub

[Question] Did you use Discriminator for recent results

Hi,

It is clear that you used a discriminator for the models in the paper "Unsupervised Machine Translation Using Monolingual Corpora Only".

But did you also use it to obtain the results for the NMT models presented in the "Phrase-Based & Neural Unsupervised Machine Translation" paper?

From the code, it seems that if the --n_dis parameter equals 0 (which is the default), the discriminator is not used. Did you find it to have no positive effect?

Thanks!
Maksym

Interpret the result of PBSMT

Could somebody explain to me what each of the numbers below means?

BLEU = 13.49, 51.9/21.1/10.2/5.2 (BP=0.869, ratio=0.877, hyp_len=71143, ref_len=81098)
End of training. Experiment is stored in: ./UnsupervisedMT/PBSMT/moses_train_en-fr

Thanks
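
For reference, the line appears to follow the standard multi-bleu.perl output format: the four numbers are the 1- to 4-gram precisions (in percent), BP is the brevity penalty, and ratio is hyp_len/ref_len. A quick check (an illustration, not repo output) that these components combine into roughly the reported score:

    import math

    precisions = [51.9, 21.1, 10.2, 5.2]             # modified n-gram precisions, in percent
    hyp_len, ref_len = 71143, 81098
    bp = min(1.0, math.exp(1 - ref_len / hyp_len))   # brevity penalty, ~0.869 here
    bleu = bp * math.exp(sum(math.log(p / 100) for p in precisions) / 4)
    print(round(100 * bleu, 2))   # ~13.5, i.e. the reported 13.49 up to rounding of the displayed precisions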

No module named torch

Hi @glample

I'm getting an error while replicating the steps. I did install PyTorch in my conda environment. It gives an error saying "No module named torch"; I see that torch is imported in /src/data/dictionary.py. Not sure if I'm missing something.


cannot reproduce the results of unsupervised NMT

I use the command python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

to run the code, but in the end I only get these BLEU scores:
bleu_en_fr_test -> 18.400000
bleu_fr_en_test -> 18.610000

I cannot get a score above 23.0 BLEU. This is the final log:
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - BLEU ./dumped/test/2w3b5142k5/hyp107.en-fr-en.test.txt ./dumped/test/2w3b5142k5/ref.fr-en.test.txt : 45.160000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - epoch -> 107.000000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_valid -> 65.701571
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_valid -> 15.980000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_valid -> 91.976447
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_valid -> 15.940000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_test -> 41.084037
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_test -> 18.400000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_test -> 59.131132
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_test -> 18.610000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_fr_valid -> 3.330721
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_fr_valid -> 44.190000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_fr_test -> 2.995039
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_fr_test -> 44.170000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_en_valid -> 3.518063
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_en_valid -> 45.580000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_en_test -> 3.443367
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_en_test -> 45.160000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - __log__:{"epoch": 107, "ppl_en_fr_valid": 65.70157069383522, "bleu_en_fr_valid": 15.98, "ppl_fr_en_valid": 91.9764473614227, "bleu_fr_en_valid": 15.94, "ppl_en_fr_test": 41.084036855000534, "bleu_en_fr_test": 18.4, "ppl_fr_en_test": 59.131132225283814, "bleu_fr_en_test": 18.61, "ppl_fr_en_fr_valid": 3.3307206599236614, "bleu_fr_en_fr_valid": 44.19, "ppl_fr_en_fr_test": 2.9950391883629113, "bleu_fr_en_fr_test": 44.17, "ppl_en_fr_en_valid": 3.518062580498395, "bleu_en_fr_en_valid": 45.58, "ppl_en_fr_en_test": 3.4433665669537183, "bleu_en_fr_en_test": 45.16}
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - Not a better validation score (10 / 10).
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - Stopping criterion has been below its best value more than 10 epochs. Ending the experiment...

I followed the README step by step. Is there anything that I missed?

Help understand the share* input arguments.

Thanks for releasing the code.

  1. Please help me understand these share* arguments in the input and their functions. Which of these arguments define sharing between forward and reverse translation models (src-tgt and tgt-src)?
    Here are the args:

share_lang_emb
share_encdec_emb
share_decpro_emb
share_output_emb
share_lstm_proj
share_enc
share_dec

  2. Also, can you provide some intuition on the scenarios when these should be changed from the default values? For example, when the languages are distant, or low-resource, etc.

  3. For share_enc, share_dec, I understand that if we have 4 encoder and 4 decoder layers and I set these to 2 and 2 respectively, I am sharing the first 2 encoder/decoder layers. Is that correct? What happens in the case of the reverse translation model (tgt-src): are all of these shared?

  4. For share_decpro_emb, following Press and Wolf (2016), I understand the input and output embeddings of the decoder are shared. Currently, they are also tied to the reverse model's decoder (tgt-src) because we have a joint vocabulary. How do I avoid sharing these decoder embeddings across languages (e.g. distant pairs like en-hi)?

  5. For share_output_emb, when you say "Share decoder output embeddings", sharing with what? (The forward and reverse models?)

  6. In your Unsupervised NMT + PBSMT paper, section 4.3.1 says "all lookup tables between encoder-decoder, and source-target language are shared". Isn't the latter (src-tgt) a consequence of the joint BPE vocabulary? Also, can you clarify how many different lookup tables you are using, and how that choice might be affected in the case of distant languages with different alphabets?

Thanks again,
Ashim

Why sort the two languages

In check_all_data_params in src/data/loader.py, I saw the following lines:

assert sorted(params.langs) == params.langs
...
assert lang1 < lang2 and lang1 in params.langs and lang2 in params.langs

I currently want to adapt UnsupervisedMT to my own corpora, which use the .yy suffix for the source language and the .xx suffix for the target language. However, these checks make it impossible to get to the training phase. I am wondering why the languages need to be sorted. If I delete these checks, will that affect the later training phase?

Thanks.
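
For reference, a minimal illustration of what those assertions require: the language codes simply have to be passed in lexicographic order (whether deleting the checks is safe for later training is a separate question):

    langs = ['yy', 'xx']
    print(sorted(langs) == langs)   # False: the check fails, since 'xx' < 'yy'

    langs = ['xx', 'yy']
    print(sorted(langs) == langs)   # True: passing --langs 'xx,yy' satisfies the assertion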

unable to understand the meaning of "otf_backprop_temperature".

Hi,
I was looking at the code for "otf_backprop_temperature". My understanding is that with the default setting (=-1), when back-translating lang1-lang2-lang3, the lossy lang1-lang2 translation is not backpropagated through, and only the lang2-lang3 translation is used for training. With the default value, is this understanding correct?

        if backprop_temperature == -1:
            # lang2 -> lang3
            encoded = self.encoder(sent2, len2, lang_id=lang2_id)
        else:
            # lang1 -> lang2
            encoded = self.encoder(sent1, len1, lang_id=lang1_id)
            scores = self.decoder(encoded, sent2[:-1], lang_id=lang2_id)
            assert scores.size() == (len2.max() - 1, bs, n_words2)

            # lang2 -> lang3
            bos = torch.cuda.FloatTensor(1, bs, n_words2).zero_()
            bos[0, :, params.bos_index[lang2_id]] = 1
            sent2_input = torch.cat([bos, F.softmax(scores / backprop_temperature, -1)], 0)
            encoded = self.encoder(sent2_input, len2, lang_id=lang2_id)

How to add para_dataset src and tgt into a single .pth file

Currently preprocess.py produces one .pth file per text file, so for src and tgt I will have two files, src.pth and tgt.pth. But when I pass the files to main.py, it expects the format

src-tgt:train12,val12,test12

How do I pass the src and tgt .pth files separately?
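
For reference, the loader appears to substitute each language code for the 'XX' placeholder in the path (see the load_binarized traceback further down this page, which calls path.replace('XX', lang1)). A small sketch with hypothetical file names:

    # The two binarized files stay separate on disk; the single path passed to
    # --para_dataset just contains 'XX' where the language code goes.
    path = './data/para/train.XX.60000.pth'
    src_path = path.replace('XX', 'en')   # -> ./data/para/train.en.60000.pth
    tgt_path = path.replace('XX', 'fr')   # -> ./data/para/train.fr.60000.pth
    print(src_path, tgt_path)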

RuntimeError: CUDA error: out of memory

When I run the unsupervised NMT code, the following error is reported. My run command is as follows. Is any parameter too large?

python main.py --exp_name mnzh --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 8 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10


GPU: GTX 1070
Python version: 3.6.2
Pytorch version: 0.4.1
CUDA Version 8.0.44

Look forward to your reply. Thanks!

"RuntimeError: received 0 items of ancdata" in otf_bt_gen_async

Hi, when I run the unsupervised NMT code, it raises the following error after running for a while:

Traceback (most recent call last):
  File "../main.py", line 317, in <module>
    batches = next(otf_iterator)
  File "/data/XXXX/nmt/unsupervised/UnsupervisedMT/NMT/src/trainer.py", line 561, in otf_bt_gen_async
    results = cache[0].gen()
  File "/data/XXXX/nmt/unsupervised/UnsupervisedMT/NMT/src/multiprocessing_event_loop.py", line 203, in gen
    return next(self.generator)
  File "/data/XXXX/nmt/unsupervised/UnsupervisedMT/NMT/src/multiprocessing_event_loop.py", line 73, in fetch_all_result_generator
    result_type, result = self.return_pipes[rank].recv()
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/home/XXXX/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 201, in rebuild_storage_fd
    fd = df.detach()
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

during the back translation process.

So what's the problem? Does it mean the workload of back translation is too heavy?

I put the full log here:
https://gist.github.com/tobyyouup/8426bb216d05482efd0bbdc8dcbd04e5

Look forward to your reply. Thanks!
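
For reference, "received 0 items of ancdata" is typically hit when the process runs out of file descriptors for sharing tensors between workers. A commonly suggested workaround (an assumption, not verified against this repo; raising the open-file limit with ulimit -n is another) is to switch the multiprocessing sharing strategy early in main.py:

    import torch.multiprocessing as mp

    # Use the file_system strategy instead of file descriptors for sharing tensors.
    mp.set_sharing_strategy('file_system')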

AttributeError: 'torch.dtype' object has no attribute 'type'

After completing preprocessing, I began training with the same configuration as specified in https://github.com/facebookresearch/UnsupervisedMT.

I meet the following problem:

INFO - 08/22/18 10:26:03 - 0:01:30 - Populating initial OTF generation cache ...
INFO - 08/22/18 10:26:03 - 0:01:30 - Creating new training otf,en iterator ...
INFO - 08/22/18 10:26:08 - 0:01:35 - Creating new training otf,fr iterator ...
Traceback (most recent call last):
  File "main.py", line 333, in <module>
    trainer.iter()
  File "/home/zhangruiqing01/MT/lrl/UnsupervisedMT/NMT/src/trainer.py", line 721, in iter
    self.print_stats()
  File "/home/zhangruiqing01/MT/lrl/UnsupervisedMT/NMT/src/trainer.py", line 749, in print_stats
    for k, l in mean_loss if len(self.stats[l]) > 0])
  File "/home/zhangruiqing01/MT/lrl/UnsupervisedMT/NMT/src/trainer.py", line 749, in <listcomp>
    for k, l in mean_loss if len(self.stats[l]) > 0])
  File "/home/zhangruiqing01/tools/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2909, in mean
    out=out, **kwargs)
  File "/home/zhangruiqing01/tools/anaconda3/lib/python3.6/site-packages/numpy/core/_methods.py", line 80, in _mean
    ret = ret.dtype.type(ret / rcount)
AttributeError: 'torch.dtype' object has no attribute 'type'

ENV:
centos 6.3
torch.__version__ : '0.5.0a0+9e75ec1'
numpy.__version__ : '1.13.1'
Cuda 8.0
k40
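
For reference, a sketch of the apparent incompatibility: print_stats calls np.mean on a list of losses, and if those losses are 0-dim torch tensors rather than Python floats, some torch/numpy version combinations fail exactly as above. Casting to float first (an assumption, not the repo's official fix) avoids it:

    import numpy as np
    import torch

    losses = [torch.tensor(1.5), torch.tensor(2.0)]
    # np.mean(losses) can raise "'torch.dtype' object has no attribute 'type'" on some
    # torch/numpy combinations; converting to plain floats first works everywhere.
    mean_loss = np.mean([float(l) for l in losses])
    print(mean_loss)   # 1.75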

No language model pretraining in these results?

Hi @glample ,
I was reading through the paper and the code and realized that although you mention (in the paper) that pretraining the language model is really important (otherwise back-translation wouldn't work well), you don't explicitly pretrain the LM in the code (especially in the snippet where you give the training command for NMT): the --lm_before flag is not set, and by default it is 0 (no LM pretraining). So are the results you report in the paper with or without the LM? I would expect a pretrained LM to increase performance. If the model achieves this performance without it, isn't that a bit strange?

Thanks for your time.
Pranay

Inference

Hello,
Can you tell me how to run inference with the saved checkpoint?
I am stuck at loading the model back into torch and calling the model.eval() function.
Thanks

Running the model completely on CPU gives -1 BLEU

Hi,

I'm trying to run the model training on CPU.
Changes Done:
I have removed all the references to CUDA, i.e. the .cuda() calls, and also changed torch.cuda to just torch in the evaluator.py and trainer.py files.
Changed SIGUSR1 to SIGTERM in multiprocessing_event_loop.py because Python multiprocessing on Windows is different from Linux.
I had to pass torch.cuda.is_available() in build_mt_model(params, data, torch.cuda.is_available()) at line 243 of main.py, since I was getting a CUDA error.

With all the above changes done, I'm getting very different results when training; there seems to be an error calculating the BLEU score, as shown below.


@glample

Any help appreciated.

Thanks !

Mohammed Ayub

Curious- is there a way to not do BPE and do one hot encoding instead?

Hi,
I was working with BPE and wasn't really sure what was happening, so I wanted to try one-hot encoding instead (I know some words will not be seen at training time, but that's fine; there aren't many for my dataset) to see how the model performs. Is there a way I can plug one-hot encoding into this code?

Thanks,
Pranay

*help* I am confused about why this function only uses cache[0]

def otf_bt_gen_async(self, init_cache_size=None):
    logger.info("Populating initial OTF generation cache ...")
    if init_cache_size is None:
        init_cache_size = self.num_replicas
    cache = [
        self.call_async(rank=i % self.num_replicas, action='_async_otf_bt_gen',
                        result_type='otf_gen', fetch_all=True,
                        batches=self.get_worker_batches())
        for i in range(init_cache_size)
    ]
    while True:
        results = cache[0].gen()
        for rank, _ in results:
            cache.pop(0)  # keep the cache a fixed size
            cache.append(
                self.call_async(rank=rank, action='_async_otf_bt_gen',
                                result_type='otf_gen', fetch_all=True,
                                batches=self.get_worker_batches())
            )
        for _, result in results:
            yield result

Thanks to the authors for releasing the code. I am confused about why this function only uses cache[0] to get results and never the other cache entries. Does that mean the others are useless, or are they used in some way I am not seeing?

Please help me figure this out, thanks a lot.
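
For reference, a simplified, self-contained sketch (with dummy "futures" standing in for call_async) of the rotating-cache pattern the snippet implements: the list is used as a FIFO queue, so every pending job eventually becomes cache[0]; each consumed entry is popped and a fresh job is submitted to the same rank, which keeps all workers busy even though only index 0 is ever read:

    num_replicas = 3
    make_job = lambda rank: (lambda: (rank, f"batch generated by worker {rank}"))

    cache = [make_job(i % num_replicas) for i in range(num_replicas)]
    for _ in range(6):
        rank, result = cache[0]()        # consume the oldest pending job
        cache.pop(0)                     # keep the cache a fixed size
        cache.append(make_job(rank))     # immediately re-submit to the same worker
        print(result)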

CUDA Out of Memory

Hi!
Is it normal that I run out of GPU memory when training on the en-fr dataset with a GTX 1080Ti (11 GB VRAM)?
The training failed after 1000 steps with the following message:

INFO - 09/08/18 11:28:33 - 0:29:30 -    1000 -   72.85 sent/s -  2105.00 words/s - XE-en-en:  3.7582 || XE-fr-fr:  3.5753 || XE-fr-en-fr:  5.0131 || XE-en-fr-en:  5.4994 || ENC-L2-en:  5.1436 || ENC-L2-fr:  5.1294 - LR dec=1.0000e-04,enc=1.0000e-04 - Sentences generation time:  56.93s (64.80%)
Traceback (most recent call last):
  File "main.py", line 328, in <module>
    trainer.otf_bt(batch, params.lambda_xe_otfd, params.otf_backprop_temperature)
  File "/data/UnsupervisedMT/NMT/src/trainer.py", line 706, in otf_bt
    loss.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: out of memory

Should I decrease the number of layers in the transformer model or the size of the dataset?

Thanks

reproduction problem with en-de/de-en translation

Hi, I met the same reproduction problem with de-en/en-de. I followed the settings and got 12.21 for en-de and 8.03 for de-en.

My command is:
python main.py --exp_name 'de-en-55' --transformer 'True' --n_enc_layers '5' --n_dec_layers '5' --share_enc '5' --share_dec '5' --share_lang_emb 'True' --share_output_emb 'True' --langs 'de,en' --n_mono '-1' --mono_dataset 'en:./data/mono-de-en/all.en.tok.60000.pth,,;de:./data/mono-de-en/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data/para-de-en/dev/newstest2016-ende-src.XX.60000.pth,./data/para-de-en/dev/newstest2015-ende-ref.XX.60000.pth' --mono_directions 'de,en' --word_shuffle '3' --word_dropout '0.1' --word_blank '0.1' --pivo_directions 'de-en-de,en-de-en' --pretrained_emb './data/mono-de-en/all.en-de.60000.vec' --pretrained_out 'True' --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd '1' --otf_num_processes '30' --otf_sync_params_every '1000' --enc_optimizer 'adam,lr=0.0001' --epoch_size '500000' --stopping_criterion 'bleu_de_en_valid,10' --freeze_dec_emb 'True' --freeze_enc_emb 'False' --exp_id "7fvh95gae7"

The detailed results after 40 epochs are as follows:

40 bleu_en_de_test 11.49 bleu_de_en_test 7.49
41 bleu_en_de_test 12.09 bleu_de_en_test 8.03
42 bleu_en_de_test 12.26 bleu_de_en_test 7.25
43 bleu_en_de_test 11.78 bleu_de_en_test 7.64
44 bleu_en_de_test 11.76 bleu_de_en_test 7.57
45 bleu_en_de_test 12.19 bleu_de_en_test 7.76
46 bleu_en_de_test 12.19 bleu_de_en_test 7.69
47 bleu_en_de_test 12.09 bleu_de_en_test 7.74
48 bleu_en_de_test 11.72 bleu_de_en_test 7.81
49 bleu_en_de_test 12.23 bleu_de_en_test 7.47
50 bleu_en_de_test 12.06 bleu_de_en_test 7.89
51 bleu_en_de_test 12.29 bleu_de_en_test 7.89
52 bleu_en_de_test 12.21 bleu_de_en_test 7.88

Is there any setting that I have misunderstood?

How long does the NMT model run on GPUs?

Hi,

I'm trying to estimate the cost of developing different models using this repo. Has anyone run these experiments on multiple GPUs? How long should I expect the NMT model to run for 1-GPU, 4-GPU and 8-GPU configs?

Any help appreciated
Cheers.

Mohammed Ayub

How to set otf_num_processes

Hi, it is not realistic to set otf_num_processes to 30, which is the default, as many people are using the server. How do you set this parameter, and what does it affect? Thank you.

Cannot initialize CUDA without ATen_cuda library

While trying to train an NMT model, I got the following error message. I built PyTorch 0.5 from source on Ubuntu 16.04 running Python 3.5.

Traceback (most recent call last):
  File "main.py", line 242, in <module>
    encoder, decoder, discriminator, lm = build_mt_model(params, data)
  File "/home/gezmu/UnsupervisedMT/NMT/src/model/__init__.py", line 98, in build_mt_model
    return build_attention_model(params, data, cuda=cuda)
  File "/home/gezmu/UnsupervisedMT/NMT/src/model/attention.py", line 801, in build_attention_model
    encoder.cuda()
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 191, in _apply
    param.data = fn(param.data)
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library.

About decoding

Hi, I noticed that generation is done greedily at decoding time. Is there a big improvement in performance if we use beam search at training time? Thank you.

RuntimeError: torch.cuda.FloatTensor is not enabled.

Hi,

I get the following error when I run the code on my Mac machine (no GPU):


INFO - 09/19/18 10:23:37 - 0:00:16 - ============ Building transformer attention model - Decoder ...
INFO - 09/19/18 10:23:37 - 0:00:16 - Sharing decoder input embeddings
INFO - 09/19/18 10:23:38 - 0:00:17 - Sharing decoder transformer parameters for layer 0
INFO - 09/19/18 10:23:38 - 0:00:17 - Sharing decoder transformer parameters for layer 1
INFO - 09/19/18 10:23:38 - 0:00:17 - Sharing decoder transformer parameters for layer 2
INFO - 09/19/18 10:23:38 - 0:00:18 - Sharing decoder projection matrices

Traceback (most recent call last):
  File "main.py", line 242, in <module>
    encoder, decoder, discriminator, lm = build_mt_model(params, data)
  File "/Users/Ehsan/Documents/Ehsan_General/HMQ/HMQ_Projects/UnSup_MT/UnsupervisedMT/NMT/src/model/__init__.py", line 98, in build_mt_model
    return build_attention_model(params, data, cuda=cuda)
  File "/Users/Ehsan/Documents/Ehsan_General/HMQ/HMQ_Projects/UnSup_MT/UnsupervisedMT/NMT/src/model/attention.py", line 801, in build_attention_model
    encoder.cuda()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 249, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 182, in _apply
    param.data = fn(param.data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 249, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: torch.cuda.FloatTensor is not enabled.


Is there any way that I can run this code on CPU?

Regards

Inference again

Hi Glample,

Lately I have been trying to run UnsupervisedMT in inference mode. Below is the command that at least works for me temporarily, after many rounds of failed attempts.

python main.py \
  --exp_name mzpf \
  --exp_id infer \
  --reload_model './dumped/mzpf/infer/checkpoint.pth' \
  --eval_only 1 \
  --share_lang_emb True \
  --share_output_emb True \
  --langs 'en,fr' \
  --n_mono -1 \
  --reload_enc 1 \
  --reload_dec 1 \
  --reload_dis 0 \
  --mono_dataset 'en:./mzpf/mono/tok.unicode.shuffled.en.500.pth,,;fr:./mzpf/mono/tok.shuffled.fr.500.pth,,' \
  --para_dataset 'en-fr:,./mzpf/para/tok-para.shuffled.stripped.valid.XX.500.pth,./mzpf/para/infer.XX.500.pth' \
  --mono_directions 'en,fr' \
  --lambda_xe_mono '0:1,100000:0.1,300000:0'

In addition to some other parameters, it seems that the most important one is --para_dataset, in which I have to trick the model by passing a parallel version of the inference set as the test set. Currently I happen to have parallel data to run inference on, but in case I do not have parallel data, how should inference be run? And is the above way of running inference, inspired by #15, correct? Thanks.

Problem with xe_costs_en_fr_en loss when reloading checkpoint

Hello,
I have a problem when I stop and then restart training. When I stop training, the xe_costs_en_fr_en loss value is about 0.8, but when I restart training, the value is 6; even though the checkpoint has been successfully reloaded, this back-translation loss seems to start from scratch again (the other losses are OK). Would you know why? (I am asking because I'd like to add some complementary losses at certain points during training.)
Thank you very much!

What does "pretrained_out=True" mean?

I cannot figure out what you mean by "pretrained_out=True". I didn't find where you use it. Do you mean the output projection matrix is also initialized from the pretrained word embeddings? Could you please show where you use it? Thank you. Actually, my TensorFlow code is now almost the same as yours, but I cannot get results comparable to your code's.

RuntimeError: cuda runtime error (2)

While trying to train an NMT model, I got the following error message. What is important is that these errors appear after a period of successful training. I built PyTorch 0.4.0 from source on CentOS running Python 3.6. How can I fix it?

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
data_type: valid lang1: en lang2:fr
data_type: test lang1: en lang2:fr
Traceback (most recent call last):
  File "main.py", line 328, in <module>
    trainer.otf_bt(batch, params.lambda_xe_otfd, params.otf_backprop_temperature)
  File "/home/liujq/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/trainer.py", line 706, in otf_bt
    loss.backward()
  File "/home/liujq/env_python3.6/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/liujq/env_python3.6/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

ModuleNotFoundError: No module named 'fb' when running NMT training script

While trying to run the training script python3 main.py I receive the following error:

  File "<my_project_directory>/NMT/src/data/loader.py", line 164, in load_para_data
    data1 = load_binarized(path.replace('XX', lang1), params)
  File "<my_project_directory>/NMT/src/data/loader.py", line 32, in load_binarized
    data = torch.load(path)
  File "<my_virtualenv_directory>/lib/python3.6/site-packages/torch/serialization.py", line 358, in load
    return _load(f, map_location, pickle_module)
  File "<my_virtualenv_directory>/lib/python3.6/site-packages/torch/serialization.py", line 542, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'fb'

It seems that there is some dependency missing that I'm not aware of.
I use Ubuntu 18.04.1, Python 3.6.6 and PyTorch 0.4.1

unsupervised model selection criterion on en-fr

Hi,
I use the unsupervised criterion based on the BLEU score of a "round-trip" translation, as in Lample et al. (2018), but the validation set is newstest2013, which is not extracted from the monolingual training corpora. My training corpus is the same as in "Unsupervised Machine Translation Using Monolingual Corpora Only". The training stopped at epoch 54 and got BLEU 18.70 on newstest2014. I believe that if training ran longer, the BLEU would be higher. How should I interpret the training stopping so soon?
My command is

python main.py --exp_name 'base' --transformer 'True' --n_enc_layer '3' --n_dec_layer '3' --share_enc '3' --share_dec '3' --share_lang_emb 'True' --share_output_emb 'True' --langs 'en,fr' --n_mono '-1' --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_dropout '0.1' --word_blank '0.2' --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out 'True' --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd '1' --otf_num_processes '12' --otf_sync_params_every '1000' --enc_optimizer 'adam,lr=0.0001' --epoch_size '500000' --stopping_criterion 'bleu_unsupervised_criterion,10'

So, where is the fault?

cannot reproduce the results of unsupervised NMT on ende/deen translation task

I can reproduce the results of unsupervised NMT on the enfr/fren translation task after 125 epochs:

epoch -> 125.000000
ppl_en_fr_valid -> 36.457879
bleu_en_fr_valid -> 21.250000
ppl_fr_en_valid -> 49.971469
bleu_fr_en_valid -> 20.360000
ppl_en_fr_test -> 20.280857
bleu_en_fr_test -> 24.520000
ppl_fr_en_test -> 28.081766
bleu_fr_en_test -> 24.040000
ppl_fr_en_fr_valid -> 1.869078
bleu_fr_en_fr_valid -> 60.330000
ppl_fr_en_fr_test -> 1.725349
bleu_fr_en_fr_test -> 61.530000
ppl_en_fr_en_valid -> 1.827019
bleu_en_fr_en_valid -> 62.540000
ppl_en_fr_en_test -> 1.849352
bleu_en_fr_en_test -> 62.170000

But I cannot reproduce the results of unsupervised NMT on the ende/deen translation task.

I followed the settings in the paper and extracted the en/de monolingual data from WMT14 to WMT17, used newstest2015 as the validation set and newstest2016 as the test set. The other settings are the same as for the enfr translation task.

My command is

python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'de,en' --n_mono -1 --mono_dataset 'en:./data_ende/mono/all.en.tok.60000.pth,,;de:./data_ende/mono/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data_ende/para/dev/newstest2015-ende-ref.XX.60000.pth,./data_ende/para/dev/newstest2016-ende-src.XX.60000.pth' --mono_directions 'de,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-de-en,de-en-de' --pretrained_emb './data_ende/mono/all.en-de.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_de_en_valid,100

After 122 epochs, the BLEU on the test set is still lower than the reported results:

epoch -> 122.000000
ppl_de_en_valid -> 54.630438
bleu_de_en_valid -> 16.020000
ppl_en_de_valid -> 57.778759
bleu_en_de_valid -> 12.210000
ppl_de_en_test -> 41.127902
bleu_de_en_test -> 17.890000
ppl_en_de_test -> 42.743846
bleu_en_de_test -> 13.520000
ppl_de_en_de_valid -> 2.874830
bleu_de_en_de_valid -> 38.840000
ppl_de_en_de_test -> 2.799033
bleu_de_en_de_test -> 38.160000
ppl_en_de_en_valid -> 2.646685
bleu_en_de_en_valid -> 46.750000
ppl_en_de_en_test -> 2.542617
bleu_en_de_en_test -> 47.100000

I also tried to share all encoder layers and decoder layers following the settings in the paper:

python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 4 --share_dec 4 --share_lang_emb True --share_output_emb True --langs 'de,en' --n_mono -1 --mono_dataset 'en:./data_ende/mono/all.en.tok.60000.pth,,;de:./data_ende/mono/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data_ende/para/dev/newstest2015-ende-ref.XX.60000.pth,./data_ende/para/dev/newstest2016-ende-src.XX.60000.pth' --mono_directions 'de,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-de-en,de-en-de' --pretrained_emb './data_ende/mono/all.en-de.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_de_en_valid,100

But I got similar results to sharing only 3 encoder layers and 3 decoder layers.

Has anyone been able to reproduce the results on the ende/deen translation task? Is there anything that I missed?

out of memory

My memory usage increases by 0.2 MB every time when decoding; does anyone have the same problem? I mean host memory, not CUDA memory.

Just use the language model without back translation

Hello, I just want to use the language model without back-translation; what should I do? I observed two parameters: --otf_update_enc and --otf_update_dec. Is there a parameter to control the back-translation step?

I am looking forward to your reply.
