cindyxinyiwang / multidds

Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

License: MIT License

PLSQL 0.01% Python 93.95% C++ 0.49% Cuda 1.72% Shell 3.26% Perl 0.29% Lua 0.28%

multidds's People

Contributors

alexeib, alvations, cindyxinyiwang, cndn, davidecaroselli, edunov, freewym, halilakin, hitvoice, hmc-cs-mdrissi, huihuifan, jhcross, jingfeidu, kartikayk, lematt1991, liezl200, louismartin, michaelauli, myleott, ngoyal2707, nng555, pipibjc, rutyrinott, skritika, stephenroller, taineleau, taylanbil, theweiho, xianxl, zhiqwang


multidds's Issues

Dependency

Hi

Thank you for sharing all the codes.

I got the error `ImportError: cannot import name 'libbleu' from 'fairseq'` when I run train.py.

I think it might be due to an old version of fairseq.

Could you tell me all the dependencies for this repo?

Thank you
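For reference, here is a minimal diagnostic sketch for this symptom. It assumes the usual fairseq layout, where libbleu is a C extension compiled at install time, so an import failure typically means the repo's bundled fairseq was not built in place.

```python
import importlib.util

# Check whether fairseq and its compiled libbleu extension are importable.
if importlib.util.find_spec("fairseq") is None:
    status = "fairseq not installed; try: pip install --editable ."
else:
    try:
        from fairseq import libbleu  # compiled C extension behind fairseq's BLEU scorer
        status = "libbleu OK"
    except ImportError:
        status = "extension missing; rebuild with: pip install --editable ."
print(status)
```

Running `pip install --editable .` from the repo root rebuilds the bundled fairseq, including its C extensions, which usually resolves this error.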

Trainer Gradient Update For Scorer in both train_step and update_language_sampler

Hi Cindy,

I was studying your code in trainer.py, and it seems that you update the RL scorer (the data actor) in both the update_language_sampler function and the train_step function. Initially I thought you only update it in update_language_sampler(), where you compute the cosine similarity of two gradients, but then I saw this block of code (which seems to update only the ave_emb actor, so I wonder whether this block is actually used):

    # optimize data actor
    for k in cached_loss.keys():
        reward = 1. / eta * (cur_loss[k] - cached_loss[k])
        if self.args.out_score_type == 'sigmoid':
            # loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
            loss = -(data_actor_out[k] * reward.data)
        elif self.args.out_score_type == 'exp':
            loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
        if cur_loss[k].size(0) > 0:
            loss.div_(cur_loss[k].size(0))
        loss.sum().backward()
    if self.args.data_actor == 'ave_emb':
        self.data_optimizer.step()
        self.data_optimizer.zero_grad()
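For concreteness, the arithmetic in the block above can be sketched in pure Python (tensor ops replaced by floats; eta, the losses, and the scorer output below are made-up numbers, not values from the repo):

```python
import math

# The scorer's reward is the per-pair change in loss since the cached step,
# scaled by 1/eta; the 'exp' branch then uses a REINFORCE-style surrogate
# loss of -log(score) * reward.
eta = 0.1
cached_loss = {"en-fr": 2.0}     # loss when the scorer output was cached
cur_loss = {"en-fr": 1.8}        # loss after the model update
data_actor_out = {"en-fr": 0.6}  # scorer's score for this language pair

surrogate = {}
for k in cached_loss:
    reward = (1.0 / eta) * (cur_loss[k] - cached_loss[k])
    # 'exp' branch: surrogate loss -log(p) * reward, with the same 1e-20
    # epsilon as the original code to keep the log finite
    surrogate[k] = -(math.log(1e-20 + data_actor_out[k]) * reward)
print({k: round(v, 4) for k, v in surrogate.items()})
```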

Thank you for your help and clarification!

Dataset (Ted-8-Related) is missing

Hi, I found that these two files in databin are empty:
ted_8_related/combined-train.spm8000.src
ted_8_related/combined-train.spm8000.eng

Could you re-upload the above files?
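A quick way to reproduce this kind of report is to list files that exist but are zero bytes. The sketch below runs on a throwaway temp directory for illustration; pointing `root` at the local databin/ted_8_related directory would check the real files.

```python
import os
import tempfile

# Demo directory standing in for databin/ted_8_related (assumption for the demo).
root = tempfile.mkdtemp()
open(os.path.join(root, "combined-train.spm8000.src"), "w").close()  # empty file
with open(os.path.join(root, "combined-train.spm8000.eng"), "w") as f:
    f.write("hello\n")  # non-empty file

# Collect the names of all zero-byte files in the directory.
empty = sorted(
    name for name in os.listdir(root)
    if os.path.getsize(os.path.join(root, name)) == 0
)
print(empty)
```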

Couldn't reproduce the result in the diverse_ted8_o2m setting

Hi,
I succeeded in reproducing the result in the diverse_ted8_m2o setting. However, I failed to reproduce the result in diverse_ted8_o2m: the score for each language is about 4 BLEU lower than the result shown in the paper (Table 11).

Below is my result in diverse_ted8_o2m:
[screenshot of per-language BLEU scores omitted]

Have you encountered this situation?

Some code fix and questions about the modifications on fairseq source code

Hello, I'm Steven, from Johns Hopkins University. I'm currently working on a research project studying different methods to denoise training data for low-resource languages. I came across your papers (DDS, TCS, and MultiDDS) and I'm very interested in your implementation. I started checking this code repo very carefully and found some issues (I sort of "fixed" them in my own way in a forked repo; if you think it's useful to incorporate them, I can submit a pull request for you to review my changes). Here are the issues:

fairseq beamsearch is out of date:

The torch.div usage in fairseq/search.py is deprecated, so I updated it using the most recent fairseq beamsearch code.
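For context, here is the arithmetic the deprecated torch.div call performs in beam search, written in plain Python (the variable names and numbers are illustrative, not from the repo). Newer PyTorch requires spelling out the rounding, e.g. torch.div(..., rounding_mode='trunc'), instead of relying on implicit integer division.

```python
# Beam search scores candidates in a flattened (beam * vocab) matrix; the
# deprecated integer torch.div recovered the beam index from a flat index.
vocab_size = 8
flat_index = 21                     # index into the flattened candidate scores
beam_id = flat_index // vocab_size  # was: torch.div(flat_index, vocab_size)
token_id = flat_index % vocab_size  # position of the token within the vocab
print(beam_id, token_id)
```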

undefined variable in trainer.py/update_language_sampler()

I think this is the most important part of the code, since you calculate the gradient similarity between the training set and the dev set to get the similarity score that updates the language distribution. There are some undefined or unused variables, like self.optimizers and all_sim_list. I changed the code so that it only uses one vector, sim_list, though theoretically there should be an N*N sim_list (N being the number of language pairs), and that's why you need all_sim_list to collect the per-pair sim_list values, right? My change only works for my own setup, since I'm using just one language pair instead of a multilingual setting, but I think it shouldn't be hard to fix; you might just have left those variables there by accident.
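The similarity score discussed here can be sketched as a cosine similarity between a flattened training-set gradient and a dev-set gradient (a pure-Python sketch with made-up gradient values; in the multilingual setting this would be computed once per language pair, giving the per-pair sim_list mentioned above):

```python
import math

def cosine(u, v):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

train_grad = [0.2, -0.1, 0.4]  # illustrative flattened training gradient
dev_grad = [0.1, -0.2, 0.3]    # illustrative flattened dev gradient
sim = cosine(train_grad, dev_grad)
print(round(sim, 4))
```

A positive score means the training-pair gradient points in roughly the same direction as the dev gradient, so upweighting that pair should help dev loss.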

generator is not reporting score properly

It seems that if I generate with --sacrebleu, the reported result is not a string but | Generate test with beam=5: <sacrebleu.metrics.bleu.BLEUScore object at 0x7fec308a75b0>. I'm not sure what causes the object to be printed instead of its formatted score.

The code is not working with Ave type data_actor

Since I'm more interested in a one-pair setting than in multilingual input, I want the scorer to work directly on src_tokens and trg_tokens, which is the method you proposed in the DDS paper. If I interpret your code correctly, this block should never run, right?

    # data selection: reset epoch iter to filter out unselected data
    if epoch_itr.epoch == args.select_by_dds_epoch and args.select_by_dds_epoch > 0:
        epoch_itr, _ = trainer.get_filtered_train_iterator(epoch_itr.epoch, filtered_maxpos_indices=filtered_maxpos_indices)

Since I want to work with data filtering, and I realized the base data actor only sees language IDs instead of real tokens, I have to use the ave type. To make it work, I changed your initialization steps (basically I added elif self.args.data_actor == 'ave': and an Adam optimizer for it in trainer.py). I'm not sure this modification is correct, but select_by_dds_epoch works after this change, so I'd just like confirmation from you that this is indeed the correct way to implement data filtering with the ave data actor.

Last question

I'm also curious what --utility-type in the args is for; I didn't find where it's triggered when debugging through my console. Also, could you share the training script/hyperparameters you used for DDS (Optimizing Data Usage via Differentiable Rewards)? I want to train directly on one pair of languages and replicate your result.

I'm really impressed by how well you modified the fairseq toolkit and incorporated reinforcement-based optimization into the data loading. If I have any misunderstanding about your methods or code implementation, please let me know. Also, let me know if you'd like me to submit a pull request so you can view my changes more easily. Thank you in advance for your help!
