Comments (15)
@akanshajainn for now the simplest way would be to take the sentences you want to translate and define them as a test set. If you tokenize these sentences, split them into BPE pieces, and binarize them into a .pth file, you can then load this data as if it were your evaluation set.
Then you can translate your new test set with the exact same command you used for training, plus these few extra arguments: `--reload_model $MODEL_PATH --reload_enc 1 --reload_dec 1 --eval_only 1 --para_dataset $PARA_DATASET`, where `MODEL_PATH` is the path of your trained model, and `PARA_DATASET` can be the same as before, but with the original test set replaced by your own. The model will then dump the translations of your new test set in the new `dump_path` folder.
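The steps above can be sketched as a shell sequence. This is only a sketch under assumptions: the tool paths follow the conventions of the repo's data scripts (Moses `tokenizer.perl`, fastBPE's `fast applybpe`, `preprocess.py`), and the file names, BPE codes, and vocabulary are placeholders you must adapt to your own setup.

```shell
# 1. Tokenize the sentences you want to translate (Moses tokenizer)
cat my_test.en | ./tools/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en > my_test.en.tok

# 2. Split into BPE pieces using the codes learned during training
./tools/fastBPE/fast applybpe my_test.en.tok.bpe my_test.en.tok bpe_codes

# 3. Binarize into a .pth file with the training vocabulary
python preprocess.py ./data/mono/vocab.en.60000 my_test.en.tok.bpe

# 4. Translate: same command as training, plus the reload/eval flags,
#    with your new test set plugged into --para_dataset
python main.py ... \
  --reload_model $MODEL_PATH --reload_enc 1 --reload_dec 1 --eval_only 1 \
  --para_dataset 'en-fr:,./data/para/dev/valid.XX.pth,my_test.XX.pth'
```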
from unsupervisedmt.
@glample thank you for this detailed explanation; this is the best and most well-documented repo for NMT. It would be very useful to have inference from stdin, like Marian NMT does, or like StarSpace and fastText do. That would make it easy to test newly trained models live.
glample/fastBPE@fea9a45 should help for inference. For the translation script, please check the new repo where I added this functionality: facebookresearch/XLM@0b193eb
XE-en-en is the reconstruction loss of the adversarial auto-encoder for English
XE-es-es is the reconstruction loss of the adversarial auto-encoder for Spanish
It's noted XE-xx-xx where XE means cross-entropy (because we use a cross-entropy reconstruction loss).
XE-en-es-en is the back-translation loss when taking an English sentence, translating it to Spanish, and back to English
You will not have an XE-es-en loss unless you provide Spanish-English parallel data.
ENC-L2-en is the average norm of the encoded English latent vectors. This is not very relevant; you can ignore this metric.
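As a reading aid, the naming convention and the loss behind it can be sketched in a few lines of Python. This is illustrative only: `loss_name` is a hypothetical helper (not part of the repo), and the cross-entropy is shown on a toy single-token distribution rather than the model's real batched loss.

```python
import math

def loss_name(*langs):
    """Hypothetical helper: build a metric name in the XE-xx-... scheme,
    e.g. XE-en-en (auto-encoding) or XE-en-es-en (back-translation)."""
    return "XE-" + "-".join(langs)

def cross_entropy(probs, target):
    """Toy token-level cross-entropy: negative log-probability of the target."""
    return -math.log(probs[target])

print(loss_name("en", "en"))        # English auto-encoder reconstruction loss
print(loss_name("en", "es", "en"))  # back-translation loss: en -> es -> en
print(cross_entropy([0.1, 0.7, 0.2], 1))
```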
No, the order does not matter for `--pivo_directions`. If you swap the order, it just means that at each iteration the model will perform one direction before the other, but this has no impact in practice.
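A minimal sketch of how such an argument might be parsed (a hypothetical parser, not the repo's actual code) shows why the order is cosmetic: each comma-separated triplet is handled independently, so swapping them only changes which direction runs first within an iteration.

```python
def parse_pivo_directions(arg):
    """Parse a string like 'en-fr-en,fr-en-fr' into a list of
    (source, pivot, target) triplets, one per back-translation direction."""
    return [tuple(d.split("-")) for d in arg.split(",")]

a = parse_pivo_directions("en-fr-en,fr-en-fr")
b = parse_pivo_directions("fr-en-fr,en-fr-en")
# Same triplets either way; only the iteration order differs.
assert set(a) == set(b) and a == b[::-1]
```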
Hey! @mohammedayub44 I was interrupted by a CUDA out-of-memory error. I managed to reach a BLEU of 17. Now I am stuck on how to run inference with the trained model. @glample
I opened #65 for interactive mode and/or run-time querying specifically.
Okay thanks, now it makes sense. There is one more point of confusion about the --pivo_directions parameter of NMT/main.py. We have to give the back-translation directions; does the order matter? If it does, which one is correct: tgt-src-tgt,src-tgt-src or src-tgt-src,tgt-src-tgt? And the same question for three languages.
The confusion arises (assuming src = en and tgt = fr) because in the complete description of the main.py parameters it is given as (src-tgt-src, tgt-src-tgt):
## back-translation directions
--pivo_directions 'en-fr-en,fr-en-fr' # back-translation directions (en->fr->en and fr->en->fr)
while below, in the section "Some parameters must respect a particular format:", it is written as (tgt-src-tgt, src-tgt-src):
pivo_directions
A list of triplets on which we want to perform back-translation.
fr-en-fr,en-fr-en will train the model on the fr->en->fr and en->fr->en directions.
en-fr-de,de-fr-en will train the model on the en->fr->de and de->fr->en directions (assuming that fr is the unknown language, and that English-German parallel data is provided).
and in the combined command, the directions given are (tgt-src-tgt, src-tgt-src):
python main.py --exp_name test --transformer True --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 --share_lang_emb True --share_output_emb True --langs 'en,fr' --n_mono -1 --mono_dataset 'en:./data/mono/all.en.tok.60000.pth,,;fr:./data/mono/all.fr.tok.60000.pth,,' --para_dataset 'en-fr:,./data/para/dev/newstest2013-ref.XX.60000.pth,./data/para/dev/newstest2014-fren-src.XX.60000.pth' --mono_directions 'en,fr' --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 --pivo_directions 'fr-en-fr,en-fr-en' --pretrained_emb './data/mono/all.en-fr.60000.vec' --pretrained_out True --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 --otf_num_processes 30 --otf_sync_params_every 1000 --enc_optimizer adam,lr=0.0001 --epoch_size 500000 --stopping_criterion bleu_en_fr_valid,10
Kindly help me clear this up.
@akanshajainn did you happen to get any benchmarks with the Spanish data?
@akanshajainn Thanks for that.
Hi @glample, are there benchmark results you have run for Spanish data?
Hi,
No, I have not tried Spanish, but I would expect it to perform similarly to En-Fr. The BLEU might be different, though that may be due to your test set being more difficult. Even on En-Fr, the BLEU varies quite a lot depending on the version of newstest.
Hey @glample, thanks for the information. Could you please also explain how to run inference with the trained model? Another thing I am exploring is inference on CPU; is that also available?
@glample I'm guessing an inference tutorial would be really helpful for everyone.
@mohammedayub44 Yes sorry about that. I'm very busy these days, but I will try to make a script to execute all these steps soon.
@glample no worries at all. Thanks for your active help.