
unsupervisedmt's Introduction

Unsupervised Machine Translation

This repository contains the original implementation of the unsupervised PBSMT and NMT models presented in
Phrase-Based & Neural Unsupervised Machine Translation (EMNLP 2018).

Note: for the NMT approach, we recommend you have a look at Cross-lingual Language Model Pretraining and the associated GitHub repository https://github.com/facebookresearch/XLM, which contains a better model and a more efficient implementation of unsupervised machine translation.

Model

The NMT implementation supports:

  • Three machine translation architectures (seq2seq, biLSTM + attention, Transformer)
  • Ability to share an arbitrary number of parameters across models / languages
  • Denoising auto-encoder training
  • Parallel data training
  • Back-parallel data training
  • On-the-fly multithreaded generation of back-parallel data

As well as other features not used in the original paper (and left for future work):

  • Arbitrary number of languages during training
  • Language model pre-training / co-training with shared parameters
  • Adversarial training

The PBSMT implementation supports:

  • Unsupervised phrase-table generation scripts
  • Automated Moses training

Dependencies

  • Python 3
  • NumPy
  • PyTorch (currently tested on version 0.5)
  • Moses (clean and tokenize text / train PBSMT model)
  • fastBPE (generate and apply BPE codes)
  • fastText (generate embeddings)
  • MUSE (generate cross-lingual embeddings)

For the NMT implementation, the NMT/get_data_enfr.sh script will take care of installing everything (except PyTorch). The same script is also provided for English-German: NMT/get_data_deen.sh. The NMT implementation only requires the Moses preprocessing scripts, which do not require installing Moses itself.

The PBSMT implementation requires a working installation of Moses, which you will have to install yourself. Compiling Moses is not always straightforward; a good alternative is to download the binary executables.

Unsupervised NMT

Download / preprocess data

The first thing to do to run the NMT model is to download and preprocess data. To do so, just run:

git clone https://github.com/facebookresearch/UnsupervisedMT.git
cd UnsupervisedMT/NMT
./get_data_enfr.sh

The script will successively:

  • Install tools
    • Download Moses scripts
    • Download and compile fastBPE
    • Download and compile fastText
  • Download and prepare monolingual data
    • Download / extract / tokenize monolingual data
    • Generate and apply BPE codes on monolingual data (see the sketch after this list)
    • Extract training vocabulary
    • Binarize monolingual data
  • Download and prepare parallel data (for evaluation)
    • Download / extract / tokenize parallel data
    • Apply BPE codes on parallel data with training vocabulary
    • Binarize parallel data
  • Train cross-lingual embeddings
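
For reference, the BPE application step can also be reproduced outside the script. Below is a minimal sketch using fastBPE's optional Python bindings (the script itself calls the compiled fast binary); the codes/vocabulary file names here are placeholders, not the script's actual outputs:

import fastBPE

# Placeholder paths: BPE codes learned on the monolingual data and the
# training vocabulary extracted afterwards.
bpe = fastBPE.fastBPE("bpe_codes", "vocab.en.60000")

# Apply the codes to already-tokenized sentences; rare words are split into subword units.
print(bpe.apply(["the quick brown fox jumps over the lazy dog"]))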

get_data_enfr.sh contains a few parameters defined at the beginning of the file:

  • N_MONO number of monolingual sentences for each language (default 10000000)
  • CODES number of BPE codes (default 60000)
  • N_THREADS number of threads in data preprocessing (default 48)
  • N_EPOCHS number of fastText epochs (default 10)

Adding more monolingual data will improve the performance, but will take longer to preprocess and train (10 million sentences is what was used in the paper for NMT). The script should output a data summary that contains the location of all files required to start experiments:

Monolingual training data:
    EN: ./data/mono/all.en.tok.60000.pth
    FR: ./data/mono/all.fr.tok.60000.pth
Parallel validation data:
    EN: ./data/para/dev/newstest2013-ref.en.60000.pth
    FR: ./data/para/dev/newstest2013-ref.fr.60000.pth
Parallel test data:
    EN: ./data/para/dev/newstest2014-fren-src.en.60000.pth
    FR: ./data/para/dev/newstest2014-fren-src.fr.60000.pth

Concatenated data in: ./data/mono/all.en-fr.60000
Cross-lingual embeddings in: ./data/mono/all.en-fr.60000.vec

Note that there are several ways to train cross-lingual embeddings:

  • Train monolingual embeddings separately for each language, and align them with MUSE (please refer to the original paper for more details).
  • Concatenate the source and target monolingual corpora in a single file, and train embeddings with fastText on that generated file (this is what is implemented in the get_data_enfr.sh script).

The second method works better when the source and target languages are similar and share a lot of common words (such as French and English). However, when the overlap between the source and target vocabulary is too small, the alignment will be very poor and you should opt for the first method using MUSE to generate your cross-lingual embeddings.
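
As a rough illustration of the second method, the sketch below concatenates the two BPE-processed corpora and trains skip-gram embeddings with the fastText Python bindings, then dumps them in the .vec text format expected by --pretrained_emb. The script itself uses the fastText command-line tool; the file names, dimension and epoch count here are assumptions rather than the script's exact settings:

import fasttext  # pip install fasttext

# Concatenate the BPE-tokenized English and French corpora into a single file
# (placeholder paths; get_data_enfr.sh produces its own concatenated file).
with open("all.en-fr.60000", "w") as out:
    for path in ("all.en.tok.60000", "all.fr.tok.60000"):
        with open(path) as f:
            for line in f:
                out.write(line)

# Train skip-gram embeddings on the joint corpus so that shared BPE tokens
# anchor the two languages in the same space.
model = fasttext.train_unsupervised(
    "all.en-fr.60000", model="skipgram",
    dim=512, epoch=10, minCount=0, thread=48,  # assumed values, not read from the script
)

# Dump the vectors in .vec (word2vec text) format for --pretrained_emb.
words = model.get_words()
with open("all.en-fr.60000.vec", "w") as out:
    out.write(f"{len(words)} {model.get_dimension()}\n")
    for w in words:
        vec = " ".join(f"{x:.5f}" for x in model.get_word_vector(w))
        out.write(f"{w} {vec}\n")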

Train the NMT model

Given binarized monolingual training data, parallel evaluation data, and pretrained cross-lingual embeddings, you can train the model using the following command:

python main.py 

## main parameters
--exp_name test                             # experiment name

## network architecture
--transformer True                          # use a transformer architecture
--n_enc_layers 4                            # use 4 layers in the encoder
--n_dec_layers 4                            # use 4 layers in the decoder

## parameters sharing
--share_enc 3                               # share 3 out of the 4 encoder layers
--share_dec 3                               # share 3 out of the 4 decoder layers
--share_lang_emb True                       # share lookup tables
--share_output_emb True                     # share projection output layers

## datasets location
--langs 'en,fr'                             # training languages (English, French)
--n_mono -1                                 # number of monolingual sentences (-1 for everything)
--mono_dataset $MONO_DATASET                # monolingual dataset
--para_dataset $PARA_DATASET                # parallel dataset

## denoising auto-encoder parameters
--mono_directions 'en,fr'                   # train the auto-encoder on English and French
--word_shuffle 3                            # shuffle words
--word_dropout 0.1                          # randomly remove words
--word_blank 0.2                            # randomly blank out words

## back-translation directions
--pivo_directions 'en-fr-en,fr-en-fr'       # back-translation directions (en->fr->en and fr->en->fr)

## pretrained embeddings
--pretrained_emb $PRETRAINED                # cross-lingual embeddings path
--pretrained_out True                       # also pretrain output layers

## dynamic loss coefficients
--lambda_xe_mono '0:1,100000:0.1,300000:0'  # auto-encoder loss coefficient
--lambda_xe_otfd 1                          # back-translation loss coefficient

## CPU on-the-fly generation
--otf_num_processes 30                      # number of CPU jobs for back-parallel data generation
--otf_sync_params_every 1000                # CPU parameters synchronization frequency

## optimization
--enc_optimizer adam,lr=0.0001              # model optimizer
--group_by_size True                        # group sentences by length inside batches
--batch_size 32                             # batch size
--epoch_size 500000                         # epoch size
--stopping_criterion bleu_en_fr_valid,10    # stopping criterion
--freeze_enc_emb False                      # freeze encoder embeddings
--freeze_dec_emb False                      # freeze decoder embeddings


## With
MONO_DATASET='en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,'
PARA_DATASET='en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth'
PRETRAINED='./data/mono/all.en-fr.60000.vec'

Some parameters must respect a particular format:

  • langs
    • A list of languages, sorted by language ID.
    • en,fr for "English and French"
    • de,en,es,fr for "German, English, Spanish and French"
  • mono_dataset
    • A dictionary that maps a language to train, validation and test files.
    • Validation and test files are optional (usually only the training file is needed).
    • en:train.en,valid.en,test.en;fr:train.fr,valid.fr,test.fr
  • para_dataset
    • A dictionary that maps a language pair to train, validation and test files.
    • Training file is optional (in unsupervised MT we only use parallel data for evaluation).
    • en-fr:train.en-fr.XX,valid.en-fr.XX,test.en-fr.XX to indicate the training, validation and test paths.
  • mono_directions
    • A list of languages on which we want to train the denoising auto-encoder.
    • en,fr to train the auto-encoder both on English and French.
  • para_directions
    • A list of tuples on which we want to train the MT system in a standard supervised way.
    • en-fr,fr-de will train the model in both the en->fr and fr->de directions.
    • Requires providing the model with parallel data.
  • pivo_directions
    • A list of triplets on which we want to perform back-translation.
    • fr-en-fr,en-fr-en will train the model on the fr->en->fr and en->fr->en directions.
    • en-fr-de,de-fr-en will train the model on the en->fr->de and de->fr->en directions (assuming that fr is the unknown language, and that English-German parallel data is provided).
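
To make these formats more concrete, here is a small parsing sketch (an illustration only, not the repository's actual argument parser):

# Minimal illustration of the dataset string formats; the real parsing and
# validation live in the repository's parameter checks.
def parse_mono_dataset(s):
    # "en:train.en,valid.en,test.en;fr:train.fr,," -> {'en': ('train.en', 'valid.en', 'test.en'), 'fr': ('train.fr', '', '')}
    out = {}
    for part in s.split(';'):
        lang, paths = part.split(':', 1)
        train, valid, test = paths.split(',')
        out[lang] = (train, valid, test)  # empty strings mean "not provided"
    return out

def parse_para_dataset(s):
    # "en-fr:,valid.XX,test.XX" -> {('en', 'fr'): ('', 'valid.XX', 'test.XX')}
    out = {}
    for part in s.split(';'):
        pair, paths = part.split(':', 1)
        lang1, lang2 = pair.split('-')
        train, valid, test = paths.split(',')
        out[(lang1, lang2)] = (train, valid, test)  # 'XX' stands for each language code
    return out

print(parse_mono_dataset('en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,'))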

Other parameters:

  • --otf_num_processes 30 indicates that 30 CPU threads will be generating back-translation data on the fly, using the current model parameters
  • --otf_sync_params_every 1000 indicates that models on CPU threads will be synchronized every 1000 training steps
  • --lambda_xe_otfd 1 means that the coefficient associated to the back-translation loss is fixed to a constant of 1
  • --lambda_xe_mono '0:1,100000:0.1,300000:0' means that the coefficient associated with the denoising auto-encoder loss is initially set to 1, linearly decreases to 0.1 over the first 100000 steps, then to 0 over the following 200000 steps, and is finally equal to 0 for the remainder of the experiment (i.e. we train with back-translation only)
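
These dynamic coefficients are lists of step:value anchor points with linear interpolation in between. A minimal sketch of how such a schedule can be evaluated (an illustration, not the repository's implementation):

def lambda_at_step(schedule, step):
    """Piecewise-linear coefficient, e.g. for the schedule '0:1,100000:0.1,300000:0'."""
    points = [(int(s), float(v)) for s, v in
              (pair.split(':') for pair in schedule.split(','))]
    if step >= points[-1][0]:
        return points[-1][1]
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        if s0 <= step < s1:
            # linear interpolation between the two surrounding anchor points
            return v0 + (v1 - v0) * (step - s0) / (s1 - s0)
    return points[0][1]

print(lambda_at_step('0:1,100000:0.1,300000:0', 50000))   # 0.55
print(lambda_at_step('0:1,100000:0.1,300000:0', 200000))  # 0.05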

Putting all this together, the training command becomes:

python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

On newstest2014 en-fr, the above command should give above 23.0 BLEU after 25 epochs (i.e. after one day of training on a V100).

Unsupervised PBSMT

Running the PBSMT approach requires a working version of Moses. On some systems Moses is not very straightforward to compile, and it is sometimes much simpler to download the binaries directly.

Once you have a working version of Moses, edit the MOSES_PATH variable inside the PBSMT/run.sh script to indicate the location of the Moses directory. Then, simply run:

cd PBSMT
./run.sh

The script will successively:

  • Install tools
    • Check Moses files
    • Download MUSE and download evaluation files
  • Download pretrained word embeddings
  • Download and prepare monolingual data
    • Download / extract / tokenize monolingual data
    • Learn truecasers and apply them on monolingual data
    • Learn and binarize language models for Moses decoding
  • Download and prepare parallel data (for evaluation):
    • Download / extract / tokenize parallel data
    • Truecase parallel data
  • Run MUSE to generate cross-lingual embeddings
  • Generate an unsupervised phrase-table using MUSE alignments (a simplified sketch of this step follows the list)
  • Run Moses
    • Create Moses configuration file
    • Run Moses on test sentences
    • Detruecase translations
  • Evaluate translations
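
To give an idea of what the unsupervised phrase-table step does, here is a heavily simplified sketch: for each source word, the nearest target words in the MUSE-aligned embedding space are kept as translation candidates, and their similarities are turned into scores with a temperature. The real create-phrase-table.py additionally handles phrases, CSLS scoring and Moses-format output; everything below (the function name, plain cosine scoring, numpy arrays of already-aligned embeddings as inputs) is illustrative only:

import numpy as np

def toy_phrase_table(src_words, src_vecs, tgt_words, tgt_vecs, k=10, temperature=45.0):
    """For each source word, keep the k nearest target words in the aligned
    space and turn their cosine similarities into translation scores."""
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    table = {}
    for i, w in enumerate(src_words):
        scores = tgt @ src[i]                      # cosine similarity to every target word
        top = np.argsort(-scores)[:k]              # k best candidates
        probs = np.exp(temperature * scores[top])  # sharpen with a temperature
        probs /= probs.sum()
        table[w] = [(tgt_words[j], float(p)) for j, p in zip(top, probs)]
    return table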

run.sh contains a few parameters defined at the beginning of the file:

  • MOSES_PATH folder containing Moses installation
  • N_MONO number of monolingual sentences for each language (default 10000000)
  • N_THREADS number of threads in data preprocessing (default 48)
  • SRC source language (default English)
  • TGT target language (default French)

The script should return something like this:

BLEU = 13.49, 51.9/21.1/10.2/5.2 (BP=0.869, ratio=0.877, hyp_len=71143, ref_len=81098)
End of training. Experiment is stored in: ./UnsupervisedMT/PBSMT/moses_train_en-fr

If you use 50M instead of 10M sentences in your language model, you should get BLEU = 15.66, 52.9/23.2/12.3/7.0. Using a bigger language model, as well as phrases instead of words, will improve the results even further.

References

Please cite [1] and [2] if you found the resources in this repository useful.

Phrase-Based & Neural Unsupervised Machine Translation

[1] G. Lample, M. Ott, A. Conneau, L. Denoyer, MA. Ranzato, Phrase-Based & Neural Unsupervised Machine Translation

@inproceedings{lample2018phrase,
  title={Phrase-Based \& Neural Unsupervised Machine Translation},
  author={Lample, Guillaume and Ott, Myle and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

Unsupervised Machine Translation With Monolingual Data Only

[2] G. Lample, A. Conneau, L. Denoyer, MA. Ranzato, Unsupervised Machine Translation With Monolingual Data Only

@inproceedings{lample2017unsupervised,
  title = {Unsupervised machine translation using monolingual corpora only},
  author = {Lample, Guillaume and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2018}
}

Word Translation Without Parallel Data

[3] A. Conneau*, G. Lample*, L. Denoyer, MA. Ranzato, H. Jégou, Word Translation Without Parallel Data

* Equal contribution. Order has been determined with a coin flip.

@inproceedings{conneau2017word,
  title = {Word Translation Without Parallel Data},
  author = {Conneau, Alexis and Lample, Guillaume and Ranzato, Marc'Aurelio and Denoyer, Ludovic and J\'egou, Herv\'e},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2018}
}

License

See the LICENSE file for more details.

unsupervisedmt's People

Contributors

glample, louismartin, sedflix


unsupervisedmt's Issues

Fast text step taking long time - alternatives ?

Hi,
While running get_data.sh, the last step, which generates the cross-lingual embeddings (.vec) file with fastText, is taking a long time (I ran it for a day on 4 CPUs and it is still running).

As an alternative to this all.en-fr.60000.vec file, can I just download the crawl vectors for English and French from this site - https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md - and pass them as an argument to the main file, something like below:
--pretrained_emb './data/mono/cc.en.300.vec,./data/mono/cc.fr.300.vec'

Thanks !

Mohammed Ayub

*help* I am confused about why this function just uses cache[0]

def otf_bt_gen_async(self, init_cache_size=None):
    logger.info("Populating initial OTF generation cache ...")
    if init_cache_size is None:
        init_cache_size = self.num_replicas
    cache = [
        self.call_async(rank=i % self.num_replicas, action='_async_otf_bt_gen',
                        result_type='otf_gen', fetch_all=True,
                        batches=self.get_worker_batches())
        for i in range(init_cache_size)
    ]
    while True:
        results = cache[0].gen()
        for rank, _ in results:
            cache.pop(0)  # keep the cache a fixed size
            cache.append(
                self.call_async(rank=rank, action='_async_otf_bt_gen',
                                result_type='otf_gen', fetch_all=True,
                                batches=self.get_worker_batches())
            )
        for _, result in results:
            yield result

Thanks to the authors for releasing the code. I am confused about why this function only uses cache[0] to get results and not the other cache entries. Does that mean the others are useless, or are the other cache entries used in a way I am not seeing?

Please help me figure this out, thanks a lot.

Why do I get BLEU 1.01 on zh-en with PBSMT?

Hi,

I was confused for several days.

I followed the steps of PBSMT/run.sh to do my work, and I think the most important step is "Running MUSE to generate cross-lingual embeddings". I aligned the 'zh' and 'en' pre-trained word vectors you provide at https://fasttext.cc/docs/en/crawl-vectors.html with MUSE, and got "Adv-NN P@1=21.3, Adv-CSLS P@1=26.9, Adv-Refine-NN P@1=18.5, Adv-Refine-CSLS P@1=24.0".

Then, I used the aligned embeddings to generate the phrase-table, but I finally got a BLEU of 1.01. I don't think the result is right; something must have gone wrong.

My MUSE command is:

python unsupervised.py \
  --src_lang ch \
  --tgt_lang en \
  --src_emb /data/experiment/embeddings/wiki.ch.300.vec.20w \
  --tgt_emb /data/experiment/embeddings/wiki.en.300.vec.20w \
  --exp_name test \
  --exp_id 0 \
  --normalize_embeddings center \
  --emb_dim 300 \
  --dis_most_frequent 50000 \
  --epoch_size 500000 \
  --dico_eval /data/experiment/unsupervisedMT/fordict/zh-en.5000-6500.sim.txt \
  --n_refinement 5 \
  --export "pth"

My command for generating the phrase table is:

python create-phrase-table.py \
  --src_lang $SRC \
  --tgt_lang $TGT \
  --src_emb $ALIGNED_EMBEDDINGS_SRC \
  --tgt_emb $ALIGNED_EMBEDDINGS_TGT \
  --csls 1 \
  --max_rank 200 \
  --max_vocab 300000 \
  --inverse_score 1 \
  --temperature 45 \
  --phrase_table_path ${PHRASE_TABLE_PATH::-3}

Does the problem lie in the word embeddings? Should I use word embeddings trained on my own training data with fastText for MUSE? I have tried that (using word embeddings trained on my training data), but got "Adv-NN P@1=0.07, Adv-CSLS P@1=0.07, Adv-Refine-NN P@1=0.00, Adv-Refine-CSLS P@1=0.00". My command is: ./fasttext skipgram -epoch 10 -minCount 0 -dim 300 -thread 48 -ws 5 -neg 10 -input $SRC_TOK -output $EMB_SRC. So I didn't use the word embeddings generated on the training data, because I think I didn't align them well.

So, where is the fault?

cannot reproduce the results of unsupervised NMT

I use the command python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

to run the code, but I only get the following BLEU scores:
bleu_en_fr_test -> 18.400000
bleu_fr_en_test -> 18.610000

I cannot get a score above 23.0 BLEU. This is the final log:

INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - BLEU ./dumped/test/2w3b5142k5/hyp107.en-fr-en.test.txt ./dumped/test/2w3b5142k5/ref.fr-en.test.txt : 45.160000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - epoch -> 107.000000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_valid -> 65.701571
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_valid -> 15.980000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_valid -> 91.976447
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_valid -> 15.940000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_test -> 41.084037
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_test -> 18.400000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_test -> 59.131132
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_test -> 18.610000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_fr_valid -> 3.330721
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_fr_valid -> 44.190000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_fr_en_fr_test -> 2.995039
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_fr_en_fr_test -> 44.170000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_en_valid -> 3.518063
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_en_valid -> 45.580000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - ppl_en_fr_en_test -> 3.443367
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - bleu_en_fr_en_test -> 45.160000
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - __log__:{"epoch": 107, "ppl_en_fr_valid": 65.70157069383522, "bleu_en_fr_valid": 15.98, "ppl_fr_en_valid": 91.9764473614227, "bleu_fr_en_valid": 15.94, "ppl_en_fr_test": 41.084036855000534, "bleu_en_fr_test": 18.4, "ppl_fr_en_test": 59.131132225283814, "bleu_fr_en_test": 18.61, "ppl_fr_en_fr_valid": 3.3307206599236614, "bleu_fr_en_fr_valid": 44.19, "ppl_fr_en_fr_test": 2.9950391883629113, "bleu_fr_en_fr_test": 44.17, "ppl_en_fr_en_valid": 3.518062580498395, "bleu_en_fr_en_valid": 45.58, "ppl_en_fr_en_test": 3.4433665669537183, "bleu_en_fr_en_test": 45.16}
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - Not a better validation score (10 / 10).
INFO - 09/04/18 17:44:09 - 1 day, 21:24:48 - Stopping criterion has been below its best value more than 10 epochs. Ending the experiment...

I followed the README step by step. Is there anything I missed?

AttributeError: 'torch.dtype' object has no attribute 'type'

After completing preprocessing, I begin to train with the same configuration as specified in https://github.com/facebookresearch/UnsupervisedMT.

I run into the following problem:

INFO - 08/22/18 10:26:03 - 0:01:30 - Populating initial OTF generation cache ...
INFO - 08/22/18 10:26:03 - 0:01:30 - Creating new training otf,en iterator ...
INFO - 08/22/18 10:26:08 - 0:01:35 - Creating new training otf,fr iterator ...
Traceback (most recent call last):
  File "main.py", line 333, in <module>
    trainer.iter()
  File "/home/zhangruiqing01/MT/lrl/UnsupervisedMT/NMT/src/trainer.py", line 721, in iter
    self.print_stats()
  File "/home/zhangruiqing01/MT/lrl/UnsupervisedMT/NMT/src/trainer.py", line 749, in print_stats
    for k, l in mean_loss if len(self.stats[l]) > 0])
  File "/home/zhangruiqing01/MT/lrl/UnsupervisedMT/NMT/src/trainer.py", line 749, in <listcomp>
    for k, l in mean_loss if len(self.stats[l]) > 0])
  File "/home/zhangruiqing01/tools/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2909, in mean
    out=out, **kwargs)
  File "/home/zhangruiqing01/tools/anaconda3/lib/python3.6/site-packages/numpy/core/_methods.py", line 80, in _mean
    ret = ret.dtype.type(ret / rcount)
AttributeError: 'torch.dtype' object has no attribute 'type'

ENV:
centos 6.3
torch.__version__ : '0.5.0a0+9e75ec1'
numpy.__version__ : '1.13.1'
Cuda 8.0
k40

RuntimeError: torch.cuda.FloatTensor is not enabled.

Hi,

I get the following error when I ran the code on my Mac machine (no GPU):


INFO - 09/19/18 10:23:37 - 0:00:16 - ============ Building transformer attention model - Decoder ...
INFO - 09/19/18 10:23:37 - 0:00:16 - Sharing decoder input embeddings
INFO - 09/19/18 10:23:38 - 0:00:17 - Sharing decoder transformer parameters for layer 0
INFO - 09/19/18 10:23:38 - 0:00:17 - Sharing decoder transformer parameters for layer 1
INFO - 09/19/18 10:23:38 - 0:00:17 - Sharing decoder transformer parameters for layer 2
INFO - 09/19/18 10:23:38 - 0:00:18 - Sharing decoder projection matrices

Traceback (most recent call last):
  File "main.py", line 242, in <module>
    encoder, decoder, discriminator, lm = build_mt_model(params, data)
  File "/Users/Ehsan/Documents/Ehsan_General/HMQ/HMQ_Projects/UnSup_MT/UnsupervisedMT/NMT/src/model/__init__.py", line 98, in build_mt_model
    return build_attention_model(params, data, cuda=cuda)
  File "/Users/Ehsan/Documents/Ehsan_General/HMQ/HMQ_Projects/UnSup_MT/UnsupervisedMT/NMT/src/model/attention.py", line 801, in build_attention_model
    encoder.cuda()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 249, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 182, in _apply
    param.data = fn(param.data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 249, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: torch.cuda.FloatTensor is not enabled.


Is there any way that I can run this code on CPU?

Regards

Inference

Hello,
Can you help me understand how to run inference with the saved checkpoint?
I am stuck at loading the model back into torch and calling the model.eval() function.
Thanks

Error when dimension of fastText embedding is changed

I changed the dimension of the fastText utility from 512 to 128 to adapt to my own corpora:

$FASTTEXT skipgram -epoch $N_EPOCHS -minCount 0 -dim 128 \
     -thread $N_THREADS -ws 5 -neg 10 -input $CONCAT_BPE -output $CONCAT_BPE

The result is as follows:

INFO - 09/16/18 15:31:04 - 0:00:05 - Reloading embeddings from ./mydata/mono/all.yy-xx.500.vec ...
Traceback (most recent call last):
  File "main.py", line 242, in <module>
    encoder, decoder, discriminator, lm = build_mt_model(params, data)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/__init__.py", line 98, in build_mt_model
    return build_attention_model(params, data, cuda=cuda)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/attention.py", line 813, in build_attention_model
    initialize_embeddings(encoder, decoder, params, data)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/pretrain_embeddings.py", line 90, in initialize_embeddings
    pretrained_0, word2id_0 = reload_embeddings(params.pretrained_emb, params.emb_dim)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/pretrain_embeddings.py", line 76, in reload_embeddings
    return reload_txt_emb(path, dim)
  File "/local/home/qwang/UnsupervisedMT/NMT/src/model/pretrain_embeddings.py", line 48, in reload_txt_emb
    assert dim == int(split[1])
AssertionError

No module named torch

Hi @glample

I'm getting an error while replicating the steps. I did install PyTorch in my conda environment, but it gives an error saying "No module named torch". I see that torch is imported in /src/data/dictionary.py. Not sure if I'm missing something.


"RuntimeError: received 0 items of ancdata" in otf_bt_gen_async

Hi, when I run the unsupervised NMT code, it raises this error after running for a while:

Traceback (most recent call last):
  File "../main.py", line 317, in <module>
    batches = next(otf_iterator)
  File "/data/XXXX/nmt/unsupervised/UnsupervisedMT/NMT/src/trainer.py", line 561, in otf_bt_gen_async
    results = cache[0].gen()
  File "/data/XXXX/nmt/unsupervised/UnsupervisedMT/NMT/src/multiprocessing_event_loop.py", line 203, in gen
    return next(self.generator)
  File "/data/XXXX/nmt/unsupervised/UnsupervisedMT/NMT/src/multiprocessing_event_loop.py", line 73, in fetch_all_result_generator
    result_type, result = self.return_pipes[rank].recv()
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/home/XXXX/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 201, in rebuild_storage_fd
    fd = df.detach()
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/XXXX/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

This happens during the back-translation process.

So what is the problem? Does it mean the workload of back-translation is too heavy?

I put the full log here:
https://gist.github.com/tobyyouup/8426bb216d05482efd0bbdc8dcbd04e5

Look forward to your reply. Thanks!

Help understand the share* input arguments.

Thanks for releasing the code.

  1. Please help me understand these share* arguments in the input and their functions. Which of these arguments define sharing between forward and reverse translation models (src-tgt and tgt-src)?
    Here are the args:

share_lang_emb
share_encdec_emb
share_decpro_emb
share_output_emb
share_lstm_proj
share_enc
share_dec

  2. Also, can you provide some intuition on the scenarios when these should be changed from the default values? For example, when the languages are distant, or low-resource, etc.

  3. For share_enc, share_dec, I understand that if we have 4 encoder and 4 decoder layers and I set these to 2 and 2 respectively, I am sharing the first 2 encoder/decoder layers. Is that correct? What happens in the case of the reverse translation model (tgt-src), are all of these shared?

  4. For share_decpro_emb, following Press and Wolf (2016), I understand the input and output embeddings for the decoder are shared. Currently, they are also tied to the reverse model decoder (tgt-src) because we have a joint vocabulary. How do I avoid sharing these decoder embeddings across languages (e.g. distant pairs like en-hi)?

  5. For share_output_emb, when you say "Share decoder output embeddings", sharing with what? (forward and reverse models?)

  6. In your Unsupervised NMT + PBSMT paper, in section 4.3.1, it is said that 'all lookup tables between encoder-decoder, and source-target language are shared'. Isn't the latter (src-tgt) a consequence of the joint BPE vocabulary model? Also, can you clarify how many different look-up tables you are using and how that choice might be affected by the case of distant languages with different alphabets?

Thanks again,
Ashim

Help me understand the Output/Parameters and inference.

I am training a Spanish-to-English NMT model. It prints the logs below after epoch 0 when I run NMT/main.py.

INFO - 09/24/18 12:38:38 - 1:47:10 - 600 - 12.01 sent/s - 377.00 words/s - XE-es-es: 4.5986 || XE-en-en: 4.9864 || XE-en-es-en: 5.7000 || XE-es-en-es: 5.5308 || ENC-L2-en: 4.8750 || ENC-L2-es: 4.7380 - LR enc=1.0000e-04,dec=1.0000e-04 - Sentences generation time: 225.13s (42.23%)

What confuses me is what XE-es-es and XE-en-en mean. Shouldn't it be XE-es-en and XE-en-es, if it is the loss? Also, if anyone could explain all the parameters printed during training, it would help in understanding what is happening.

what does "pretrained_out=True" mean?

I cannot figure out what you mean by "pretrained_out=True". I didn't find how you use it. Do you mean the output projection matrix is also initialized from the pretrained word embeddings? Could you please show where you use it? Thank you. Actually, my TensorFlow code is now almost the same as yours, but I cannot get results comparable to your code.

Interpret the result of PBSMT

Could somebody explain to me what each of the numbers below means?

BLEU = 13.49, 51.9/21.1/10.2/5.2 (BP=0.869, ratio=0.877, hyp_len=71143, ref_len=81098)
End of training. Experiment is stored in: ./UnsupervisedMT/PBSMT/moses_train_en-fr

Thanks

Cannot initialize CUDA without ATen_cuda library

While trying to train an NMT model, I got the following error message. I built PyTorch 0.5 from source on Ubuntu 16.04 running Python 3.5.

Traceback (most recent call last):
  File "main.py", line 242, in <module>
    encoder, decoder, discriminator, lm = build_mt_model(params, data)
  File "/home/gezmu/UnsupervisedMT/NMT/src/model/__init__.py", line 98, in build_mt_model
    return build_attention_model(params, data, cuda=cuda)
  File "/home/gezmu/UnsupervisedMT/NMT/src/model/attention.py", line 801, in build_attention_model
    encoder.cuda()
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 191, in _apply
    param.data = fn(param.data)
  File "/home/gezmu/umt/lib/python3.5/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library.

unsupervised model selection criterion on en-fr

Hi,
I use the unsupervised criterion based on the BLEU score of a "round-trip" translation as in Lample et al. (2018), but the validation set is newstest2013, which is not extracted from the monolingual training corpora. My training corpus is the same as in Unsupervised Machine Translation Using Monolingual Corpora Only. Training stopped at epoch 54 with BLEU 18.70 on newstest2014. I believe that if training had run longer, the BLEU would be higher. How should I interpret training stopping so soon?
My command is

python main.py --exp_name 'base' --transformer 'True' --n_enc_layer '3' --n_dec_layer '3' --share_enc '3' --share_dec '3' --share_lang_emb 'True' --share_output_emb 'True' --langs 'en,fr' --n_mono '-1' --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_dropout '0.1' --word_blank '0.2' --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out 'True' --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd '1' --otf_num_processes '12' --otf_sync_params_every '1000' --enc_optimizer 'adam,lr=0.0001' --epoch_size '500000' --stopping_criterion 'bleu_unsupervised_criterion,10'

So, where is the fault?

how long does the NMT model run on GPUs?

Hi,

I'm trying to estimate the cost of developing different models using this repo. Has anyone run these experiments on multiple GPUs? How long should I expect the NMT model to run with 1-GPU, 4-GPU and 8-GPU configurations?

Any help appreciated
Cheers.

Mohammed Ayub

Problem with xe_costs_en_fr_en loss when reloading checkpoint

Hello,
I have a problem when I stop and then restart training. When I stop training, I have a xe_costs_en_fr_en loss value of about 0.8, but when I restart training, the value is 6; even though the checkpoint has been successfully reloaded, this back-translation loss seems to start from scratch again (the other ones are OK). Would you know why? (I am asking this because I'd like to add some complementary losses at certain points of training.)
Thank you very much!

ModuleNotFoundError: No module named 'fb' when running NMT training script

While trying to run the training script python3 main.py I receive the following error:

File "<my_project_directory>/NMT/src/data/loader.py", line 164, in load_para_data
data1 = load_binarized(path.replace('XX', lang1), params)
File "<my_project_directory>/NMT/src/data/loader.py", line 32, in load_binarized
data = torch.load(path)
File "<my_virtualenv_directory>/lib/python3.6/site-packages/torch/serialization.py", line 358, in load
return _load(f, map_location, pickle_module)
File "<my_virtualenv_directory>/lib/python3.6/site-packages/torch/serialization.py", line 542, in _load
result = unpickler.load()
ModuleNotFoundError: No module named 'fb'

It seems that there is some dependency missing that I'm not aware of.
I use Ubuntu 18.04.1, Python 3.6.6 and PyTorch 0.4.1

About decoding

Hi, I noticed that generation is done greedily at decoding time. Is there a big improvement in performance if we use beam search at training time? Thank you.

RuntimeError: CUDA error: out of memory

When I run the unsupervised NMT code, the following error is reported. My run command is as follows. Is there any parameter that is too large?

python main.py --exp_name mnzh --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 8 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10

(screenshot: RuntimeError: CUDA error: out of memory)

GPU: GTX 1070
Python version: 3.6.2
Pytorch version: 0.4.1
CUDA Version 8.0.44

Look forward to your reply. Thanks!

Raise EOFError

When the NMT code has run for a while, the following error is reported. My run command is as follows.
Is there any parameter that is too large? How can I improve the stability of the program?

python main.py --exp_name base3 --transformer True --n_enc_layer 4 --n_dec_layer 4 --share_enc 3 --share_dec 3 \
  --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 \
  --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' \
  --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' \
  --mono_directions 'en,fr' --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' \
  --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True \
  --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 15 \
  --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 \
  --stopping_criterion bleu_en_fr_valid,10 \
  --max_len 100

Traceback (most recent call last):
  File "main.py", line 317, in <module>
    batches = next(otf_iterator)
  File "/mnt/disk-c/liujiqiang/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/trainer.py", line 561, in otf_bt_gen_async
    results = cache[0].gen()
  File "/mnt/disk-c/liujiqiang/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/multiprocessing_event_loop.py", line 203, in gen
    return next(self.generator)
  File "/mnt/disk-c/liujiqiang/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/multiprocessing_event_loop.py", line 73, in fetch_all_result_generator
    result_type, result = self.return_pipes[rank].recv()
  File "/home/liujq/Python-3.6.4/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/liujq/Python-3.6.4/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/liujq/Python-3.6.4/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

cannot reproduce the results of unsupervised NMT on the en-de/de-en translation task

I can reproduce the results of unsupervised NMT on the en-fr/fr-en translation task after 125 epochs:

epoch -> 125.000000
ppl_en_fr_valid -> 36.457879
bleu_en_fr_valid -> 21.250000
ppl_fr_en_valid -> 49.971469
bleu_fr_en_valid -> 20.360000
ppl_en_fr_test -> 20.280857
bleu_en_fr_test -> 24.520000
ppl_fr_en_test -> 28.081766
bleu_fr_en_test -> 24.040000
ppl_fr_en_fr_valid -> 1.869078
bleu_fr_en_fr_valid -> 60.330000
ppl_fr_en_fr_test -> 1.725349
bleu_fr_en_fr_test -> 61.530000
ppl_en_fr_en_valid -> 1.827019
bleu_en_fr_en_valid -> 62.540000
ppl_en_fr_en_test -> 1.849352
bleu_en_fr_en_test -> 62.170000

But I cannot reproduce the results of unsupervised NMT on the en-de/de-en translation task.

I followed the settings in the paper and extracted the en/de monolingual datasets from WMT14 to WMT17, used newstest2015 as the validation set and newstest2016 as the test set. The other settings are the same as for the en-fr translation task.

My command is

python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'de,en' --n_mono -1 --mono_dataset 'en:./data_ende/mono/all.en.tok.60000.pth,,;de:./data_ende/mono/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data_ende/para/dev/newstest2015-ende-ref.XX.60000.pth,./data_ende/para/dev/newstest2016-ende-src.XX.60000.pth' --mono_directions 'de,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-de-en,de-en-de' --pretrained_emb './data_ende/mono/all.en-de.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_de_en_valid,100

After 122 epochs, the BLEU on the test set is still lower than the reported results:

epoch -> 122.000000
ppl_de_en_valid -> 54.630438
bleu_de_en_valid -> 16.020000
ppl_en_de_valid -> 57.778759
bleu_en_de_valid -> 12.210000
ppl_de_en_test -> 41.127902
bleu_de_en_test -> 17.890000
ppl_en_de_test -> 42.743846
bleu_en_de_test -> 13.520000
ppl_de_en_de_valid -> 2.874830
bleu_de_en_de_valid -> 38.840000
ppl_de_en_de_test -> 2.799033
bleu_de_en_de_test -> 38.160000
ppl_en_de_en_valid -> 2.646685
bleu_en_de_en_valid -> 46.750000
ppl_en_de_en_test -> 2.542617
bleu_en_de_en_test -> 47.100000

I also tried to share all encoder layers and decoder layers following the settings in the paper:

python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 4 --share_dec 4 --share_lang_emb True --share_output_emb True --langs 'de,en' --n_mono -1 --mono_dataset 'en:./data_ende/mono/all.en.tok.60000.pth,,;de:./data_ende/mono/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data_ende/para/dev/newstest2015-ende-ref.XX.60000.pth,./data_ende/para/dev/newstest2016-ende-src.XX.60000.pth' --mono_directions 'de,en' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'en-de-en,de-en-de' --pretrained_emb './data_ende/mono/all.en-de.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_de_en_valid,100

But I got results similar to sharing only 3 encoder layers and 3 decoder layers.

Has anyone been able to reproduce the results on the en-de/de-en translation task? Is there anything I missed?

How to add para_dataset src and tgt into a single .pth file

Currently process.py produces a .pth file for a single text file, so for src and tgt I will have two files, src.pth and tgt.pth, but when I pass the files to main.py it expects the format

src-tgt:train12,val12,test12

How do I pass the src and tgt .pth files separately?

UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.

Hi, I'm trying to train your unsupervised NMT system (using my own data), but I'm getting these UserWarnings all the time:

UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.

I'm using PyTorch 0.4.1. I also tried 1.0 (compiled from source, master branch), but the same thing happened. Is this a known issue?

Kind regards,
Laurens

RuntimeError: cuda runtime error (2)

While trying to train an NMT model, I got the following error message. What is important is that these error messages appear after a period of successful training. I built PyTorch 0.4.0 from source on CentOS running Python 3.6. How can I fix it?

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
data_type: valid lang1: en lang2:fr
data_type: test lang1: en lang2:fr
Traceback (most recent call last):
  File "main.py", line 328, in <module>
    trainer.otf_bt(batch, params.lambda_xe_otfd, params.otf_backprop_temperature)
  File "/home/liujq/low-resourceMT/UnsupervisedMT/en2fr/baseline/NMT/src/trainer.py", line 706, in otf_bt
    loss.backward()
  File "/home/liujq/env_python3.6/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/liujq/env_python3.6/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

CUDA Out of Memory

Hi!
Is it normal that I run out of GPU memory when training on the en-fr dataset on a GTX 1080 Ti (11 GB VRAM)?
The training failed after 1000 steps with the following message:

INFO - 09/08/18 11:28:33 - 0:29:30 -    1000 -   72.85 sent/s -  2105.00 words/s - XE-en-en:  3.7582 || XE-fr-fr:  3.5753 || XE-fr-en-fr:  5.0131 || XE-en-fr-en:  5.4994 || ENC-L2-en:  5.1436 || ENC-L2-fr:  5.1294 - LR dec=1.0000e-04,enc=1.0000e-04 - Sentences generation time:  56.93s (64.80%)
Traceback (most recent call last):
  File "main.py", line 328, in <module>
    trainer.otf_bt(batch, params.lambda_xe_otfd, params.otf_backprop_temperature)
  File "/data/UnsupervisedMT/NMT/src/trainer.py", line 706, in otf_bt
    loss.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: out of memory

Should I decrease the number of layers of the transformer model or the size of the dataset?

Thanks

unable to understand the meaning of "otf_backprop_temperature".

Hi,
I was looking at the code for "otf_backprop_temperature". With the default setting (=-1), when back-translating lang1 -> lang2 -> lang3, the lossy lang1 -> lang2 translations are not backpropagated through, and only the lang2 -> lang3 translations are used for training. With the default values, is this understanding correct?

        if backprop_temperature == -1:
            # lang2 -> lang3: encode the generated sentences as regular inputs,
            # so no gradient flows back through the lang1 -> lang2 generation
            encoded = self.encoder(sent2, len2, lang_id=lang2_id)
        else:
            # lang1 -> lang2: re-score the generated sentences to keep this step differentiable
            encoded = self.encoder(sent1, len1, lang_id=lang1_id)
            scores = self.decoder(encoded, sent2[:-1], lang_id=lang2_id)
            assert scores.size() == (len2.max() - 1, bs, n_words2)

            # lang2 -> lang3: feed a temperature-softened softmax over the scores
            # (plus a one-hot BOS vector) to the encoder instead of hard word indices
            bos = torch.cuda.FloatTensor(1, bs, n_words2).zero_()
            bos[0, :, params.bos_index[lang2_id]] = 1
            sent2_input = torch.cat([bos, F.softmax(scores / backprop_temperature, -1)], 0)
            encoded = self.encoder(sent2_input, len2, lang_id=lang2_id)

How to set otf_num_processes

Hi, it is not realistic to set otf_num_processes to the default of 30, as many people are sharing the server. How do you set this parameter and what does it affect? Thank you.

No language model pretraining in these results?

Hi @glample ,
I was reading through the paper and the code and realized that although you mention (in the paper) that pretraining the language model is really important (otherwise back-translation would not work well), you do not explicitly pretrain the LM in the code, especially in the snippet where you give the training command for NMT: the --lm_before flag is not set, and by default it is 0 (no LM pretraining). So are the results you report in the paper with or without the LM? I would expect a pretrained LM to increase performance. If it achieves this performance without it, isn't that a bit strange?

Thanks for your time.
Pranay

Why sort the two languages

In check_all_data_params@src/data/loader.py I saw the following lines

assert sorted(params.langs) == params.langs
...
assert lang1 < lang2 and lang1 in params.langs and lang2 in params.langs

I currently want to adapt UnsupervisedMT to my own corpora, which use the .yy suffix for the source language and the .xx suffix for the target language. However, these checks make it impossible to get to the training phase. I am wondering why the languages need to be sorted. If I delete these checks, will that affect the later training phase?

Thanks.

Just use the language model without back translation

Hello, I just want to use the language model but not back-translation; what should I do? I observed two parameters: --otf_update_enc and --otf_update_dec. Is there a parameter to control the back-translation step?

I am looking forward to your reply.

Running the model completely on CPU gives -1 BLEU

Hi,

I'm trying to run the model training on CPU.
Changes Done:
I have removed all the references to CUDA, i.e. the .cuda() calls, and also changed torch.cuda to just torch in the evaluator.py and trainer.py files.
Changed SIGUSR1 to SIGTERM in multiprocessing_event_loop.py because Python multiprocessing on Windows is different than on Linux.
I had to put torch.cuda.is_available() in build_mt_model(params, data, torch.cuda.is_available()) on line 243 of main.py since I was getting a CUDA error.

With all the above changes done, I'm getting very different results when training; there seems to be an error calculating the BLEU score, as shown below.

(screenshot: training log showing a BLEU score of -1)

@glample

Any help appreciated.

Thanks !

Mohammed Ayub

out of memory

My memory usage increases by 0.2 MB every time I decode. Does anyone have the same problem? I mean host memory, not CUDA memory.

reproduction problem with en-de/de-en translation

Hi, I met the same reproduction problem for de-en/en-de. I followed the settings and got 12.21 for en-de and 8.03 for de-en.

My command is:
python main.py --exp_name 'de-en-55' --transformer 'True' --n_enc_layers '5' --n_dec_layers '5' --share_enc '5' --share_dec '5' --share_lang_emb 'True' --share_output_emb 'True' --langs 'de,en' --n_mono '-1' --mono_dataset 'en:./data/mono-de-en/all.en.tok.60000.pth,,;de:./data/mono-de-en/all.de.tok.60000.pth,,' --para_dataset 'de-en:,./data/para-de-en/dev/newstest2016-ende-src.XX.60000.pth,./data/para-de-en/dev/newstest2015-ende-ref.XX.60000.pth' --mono_directions 'de,en' --word_shuffle '3' --word_dropout '0.1' --word_blank '0.1' --pivo_directions 'de-en-de,en-de-en' --pretrained_emb './data/mono-de-en/all.en-de.60000.vec' --pretrained_out 'True' --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd '1' --otf_num_processes '30' --otf_sync_params_every '1000' --enc_optimizer 'adam,lr=0.0001' --epoch_size '500000' --stopping_criterion 'bleu_de_en_valid,10' --freeze_dec_emb 'True' --freeze_enc_emb 'False' --exp_id "7fvh95gae7"

The detailed results after 40 epochs are as follows:

40 bleu_en_de_test 11.49 bleu_de_en_test 7.49
41 bleu_en_de_test 12.09 bleu_de_en_test 8.03
42 bleu_en_de_test 12.26 bleu_de_en_test 7.25
43 bleu_en_de_test 11.78 bleu_de_en_test 7.64
44 bleu_en_de_test 11.76 bleu_de_en_test 7.57
45 bleu_en_de_test 12.19 bleu_de_en_test 7.76
46 bleu_en_de_test 12.19 bleu_de_en_test 7.69
47 bleu_en_de_test 12.09 bleu_de_en_test 7.74
48 bleu_en_de_test 11.72 bleu_de_en_test 7.81
49 bleu_en_de_test 12.23 bleu_de_en_test 7.47
50 bleu_en_de_test 12.06 bleu_de_en_test 7.89
51 bleu_en_de_test 12.29 bleu_de_en_test 7.89
52 bleu_en_de_test 12.21 bleu_de_en_test 7.88

Is there any part of the setup that I misunderstood?

Curious- is there a way to not do BPE and do one hot encoding instead?

Hi,
I was working with BPE and wasn't really sure what was happening, so I wanted to try one-hot (word-level) encoding instead (I know some words will be unseen at training time, but that's fine; there aren't many for my dataset) to see how the model performs. Is there a way I can plug one-hot encoding into this code?

Thanks,
Pranay

Inference again

Hi Glample,

Lately I have been trying to run UnsupervisedMT in inference mode. Below is the command that at least works for me temporarily, after many rounds of failed attempts.

python main.py \
  --exp_name mzpf \
  --exp_id infer \
  --reload_model './dumped/mzpf/infer/checkpoint.pth' \
  --eval_only 1 \
  --share_lang_emb True \
  --share_output_emb True \
  --langs 'en,fr' \
  --n_mono -1 \
  --reload_enc 1 \
  --reload_dec 1 \
  --reload_dis 0 \
  --mono_dataset 'en:./mzpf/mono/tok.unicode.shuffled.en.500.pth,,;fr:./mzpf/mono/tok.shuffled.fr.500.pth,,' \
  --para_dataset 'en-fr:,./mzpf/para/tok-para.shuffled.stripped.valid.XX.500.pth,./mzpf/para/infer.XX.500.pth' \
  --mono_directions 'en,fr' \
  --lambda_xe_mono '0:1,100000:0.1,300000:0'

In addition to some other parameters, it seems that the most important one is --para_dataset, in which I have to trick the model by passing a parallel inference set as the test set. Currently I happen to have parallel data to run inference on, but in case I do not have parallel data, I am wondering how inference mode should be run. And is the above way of running inference, as inspired by #15, correct or not? Thanks.

[Question] Did you use Discriminator for recent results

Hi,

It is clear that you used a discriminator for the models in the paper Unsupervised Machine Translation With Monolingual Data Only.

But do you also use it to obtain the results for the NMT models presented in the Phrase-Based & Neural Unsupervised Machine Translation paper?

From the code it seems that if the --n_dis parameter equals 0 (which is the default), the discriminator is not used. Did you find it to have no positive effect?

Thanks!
Maksym

What should I do if the languages are unrelated

Hi @glample

Now my training languages are zh-en; as we all know, they are unrelated.
According to my understanding, I did the following things:

  1. First, apply BPE on the two languages separately .
  2. Sencond, use fastText on two languages separately to get embeddings zh.vec and en.vec.
  3. Third, train MUSE on zh.vec and en.vec to get vectors-zh.txt, vectors-en.txt, and vectors-zh-en.txt. vectors-zh.txt and vectors-en.txt are aligned by MUSE; vectors-zh-en.txt is formed by joining vectors-zh.txt and vectors-en.txt together.

Is there any problem with what I did above?
Should I put the two languages' embeddings together when running the code? Which of the following input arguments should I choose, 1 or 2?

  1. --share_lang_emb False --share_output_emb False --pretrained_emb 'vectors-zh.txt,vectors-en.txt'
  2. --share_lang_emb True --share_output_emb True --pretrained_emb 'vectors-zh-en.txt'
