cindyxinyiwang / multidds
Code for the paper "Balancing Training for Multilingual Neural Machine Translation" (ACL 2020)
License: MIT License
Hi,
Thank you for sharing the code.
I got an error `ImportError: cannot import name 'libbleu' from 'fairseq'`
when I run train.py.
I think it might be due to an old version of fairseq.
Could you tell me all the dependencies for this repo?
Thank you
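For what it's worth, in stock fairseq `libbleu` is a C extension compiled at install time (`pip install --editable .` from the repo root builds it), so this error often means the extension was never built for the current environment rather than a pure version mismatch. A small standard-library helper (hypothetical utility, not part of the repo) to check whether a compiled submodule is even locatable:

```python
import importlib.util

def can_locate(modname: str) -> bool:
    """True if the module/extension can be found without importing it."""
    try:
        return importlib.util.find_spec(modname) is not None
    except ModuleNotFoundError:
        # the parent package itself is missing
        return False

# e.g. can_locate("fairseq.libbleu") returning False suggests the
# C extension was never compiled for this environment
```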
Hi Cindy,
I was studying your code in trainer.py, and it seems that you update the RL scorer (the data actor) in both the update_language_sampler function and the train_step function. Initially I thought you only update the RL scorer in update_language_sampler(), where you compute the cosine similarity of the two gradients, but then I saw this block of code (which seems to only update the ave_emb actor, so I wonder if you actually use it?):
# optimize data actor
for k in cached_loss.keys():
    reward = 1./eta * (cur_loss[k] - cached_loss[k])
    if self.args.out_score_type == 'sigmoid':
        #loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
        loss = -(data_actor_out[k] * reward.data)
    elif self.args.out_score_type == 'exp':
        loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
    if cur_loss[k].size(0) > 0:
        loss.div_(cur_loss[k].size(0))
    loss.sum().backward()
if self.args.data_actor == 'ave_emb':
    self.data_optimizer.step()
    self.data_optimizer.zero_grad()
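For readers of this thread: the quoted block follows a REINFORCE-style pattern, where the reward is the scaled change in per-example loss between the cached and current evaluations, and the actor's score is pushed in the reward's direction. A scalar pure-Python sketch of the per-key objective (hypothetical function name; the repo operates on torch tensors and calls backward()):

```python
import math

def actor_loss(cur_loss, cached_loss, score, eta=1.0, out_score_type='sigmoid'):
    """Scalar version of the per-key actor objective in the quoted block."""
    # reward: scaled change in loss between the cached and current evaluations
    reward = (1.0 / eta) * (cur_loss - cached_loss)
    if out_score_type == 'sigmoid':
        # score acts as a probability-like weight on the reward
        return -(score * reward)
    elif out_score_type == 'exp':
        # log keeps the objective well-scaled for exponentiated scores
        return -(math.log(1e-20 + score) * reward)
    raise ValueError(out_score_type)
```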
Thank you for your help and clarification!
Hi, I found that these two files in databin are empty:
ted_8_related/combined-train.spm8000.src
ted_8_related/combined-train.spm8000.eng
Could you re-upload them?
Hi,
I succeeded in reproducing the result in the diverse_ted8_m2o setting. However, I failed to reproduce the result in diverse_ted8_o2m: the score for each language is about 4 BLEU lower than the result reported in the paper (Table 11).
Below is my result in diverse_ted8_o2m:
Have you encountered this situation?
Hello, I'm Steven, from Johns Hopkins University. I'm currently working on a research project studying different methods to denoise the training data for low-resource languages. I came across your papers (DDS, TCS, and MultiDDS) and I'm very interested in your implementation. I started checking this code repo very carefully and found some issues (I sort of "fixed" them in my own way in a forked repo; if you think it's useful to incorporate them, I can submit a pull request for you to review my changes). Here are the issues:
The code in fairseq/search.py (torch.div) is deprecated, so I updated it using the most recent fairseq beam-search code.
I think this is the most important part of the code, since you calculate the gradient similarity between the training set and the dev set to get the similarity score that updates the language distribution. There are some undefined or unused variables, like self.optimizers and all_sim_list. I changed the code so that it only uses one vector, sim_list, though theoretically there should be an N*N sim_list (N is the number of language pairs), and that's why you need all_sim_list to append the different sim_lists, right? My change only helps me run my own code, since I'm using just one pair of languages instead of a multilingual setting, but I think it shouldn't be hard to fix; you might just have left those variables there by accident.
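For context, the similarity in question is just a cosine between the flattened training gradient and the dev gradient, yielding one scalar per language pair. A self-contained sketch (hypothetical helper; plain Python lists stand in for flattened tensors):

```python
import math

def grad_cosine_sim(g_train, g_dev, eps=1e-12):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_train, g_dev))
    norm = (math.sqrt(sum(a * a for a in g_train))
            * math.sqrt(sum(b * b for b in g_dev)))
    # eps guards against a zero gradient
    return dot / (norm + eps)
```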
It seems that if I use --sacrebleu to generate, the result is not a string but | Generate test with beam=5: <sacrebleu.metrics.bleu.BLEUScore object at 0x7fec308a75b0>
I'm not sure what causes the object to be printed.
Ave type data_actor
Since I'm more interested in a one-pair setting instead of multilingual input, I want the scorer to work directly on src_tokens and trg_tokens, which is the method you proposed in the DDS paper. If I interpret your code correctly, this block should never be run, right?
# data selection: reset epoch iter to filter out unselected data
if epoch_itr.epoch == args.select_by_dds_epoch and args.select_by_dds_epoch > 0:
    epoch_itr, _ = trainer.get_filtered_train_iterator(epoch_itr.epoch, filtered_maxpos_indices=filtered_maxpos_indices)
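The quoted branch swaps in a new iterator over a reduced index set once training reaches select_by_dds_epoch. A minimal sketch of the score-based filtering this implies (hypothetical helper; the repo builds a fairseq epoch iterator rather than a plain list):

```python
def select_by_scores(indices, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of training example indices."""
    k = max(1, int(len(indices) * keep_ratio))
    # rank examples by their data-actor score, highest first
    ranked = sorted(zip(indices, scores), key=lambda p: p[1], reverse=True)
    return [i for i, _ in ranked[:k]]
```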
Since I want to work with data filtering, and I realized the base data actor only sees language IDs instead of real tokens, I have to use the ave type. To make it work, I changed your initialization steps (basically I added elif self.args.data_actor == 'ave': and an Adam optimizer for it in your trainer.py). I'm not sure if this modification is correct, but select_by_dds_epoch works after this change. So I just want some confirmation/help from you that this is indeed the correct way to implement data filtering with the ave data actor.
I'm just curious what the usage of --utility-type in the args is. I didn't find where it's triggered when I debugged through my console. Also, could you share the training script/hyperparameters you used for DDS (Optimizing Data Usage via Differentiable Rewards)? I want to train directly on one pair of languages and replicate your result.
I'm really impressed by how well you modified the fairseq toolkit and incorporated the reinforcement optimization to change the data loading. If I have any misunderstanding about your methods or code implementation, please let me know. Also, please let me know if you want me to submit a pull request so you can better view my changes. Thank you in advance for your help!