cindyxinyiwang / multidds

Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

License: MIT License

PLSQL 0.01% Python 93.95% C++ 0.49% Cuda 1.72% Shell 3.26% Perl 0.29% Lua 0.28%

multidds's People

Contributors

alexeib, alvations, cindyxinyiwang, cndn, davidecaroselli, edunov, freewym, halilakin, hitvoice, hmc-cs-mdrissi, huihuifan, jhcross, jingfeidu, kartikayk, lematt1991, liezl200, louismartin, michaelauli, myleott, ngoyal2707, nng555, pipibjc, rutyrinott, skritika, stephenroller, taineleau, taylanbil, theweiho, xianxl, zhiqwang


multidds's Issues

Dependency

Hi

Thank you for sharing all the codes.

I got the error `ImportError: cannot import name 'libbleu' from 'fairseq'` when I run train.py.

I think it might be due to an old version of fairseq.

Could you tell me all the dependencies for this repo?

Thank you
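For reference, here is a minimal diagnostic sketch for this symptom. It assumes the usual fairseq layout, where libbleu is a C extension compiled at install time, so an import failure typically means the repo's bundled fairseq was not built in place.

```python
import importlib.util

# Check whether fairseq and its compiled libbleu extension are importable.
if importlib.util.find_spec("fairseq") is None:
    status = "fairseq not installed; try: pip install --editable ."
else:
    try:
        from fairseq import libbleu  # compiled C extension behind fairseq's BLEU scorer
        status = "libbleu OK"
    except ImportError:
        status = "extension missing; rebuild with: pip install --editable ."
print(status)
```

Running `pip install --editable .` from the repo root rebuilds the bundled fairseq, including its C extensions, which usually resolves this error.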

Trainer Gradient Update For Scorer in both train_step and update_language_sampler

Hi Cindy,

I was studying your code in trainer.py, and it seems that you update the RL scorer (the data actor) in both the update_language_sampler function and the train_step function. Initially I thought you only update it in update_language_sampler(), where you compute the cosine similarity of two gradients, but then I saw this block of code (which seems to update only the ave_emb actor, so I wonder whether this block is actually used):

    # optimize data actor
    for k in cached_loss.keys():
        reward = 1. / eta * (cur_loss[k] - cached_loss[k])
        if self.args.out_score_type == 'sigmoid':
            # loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
            loss = -(data_actor_out[k] * reward.data)
        elif self.args.out_score_type == 'exp':
            loss = -(torch.log(1e-20 + data_actor_out[k]) * reward.data)
        if cur_loss[k].size(0) > 0:
            loss.div_(cur_loss[k].size(0))
        loss.sum().backward()
    if self.args.data_actor == 'ave_emb':
        self.data_optimizer.step()
        self.data_optimizer.zero_grad()
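For concreteness, the arithmetic in the block above can be sketched in pure Python (tensor ops replaced by floats; eta, the losses, and the scorer output below are made-up numbers, not values from the repo):

```python
import math

# The scorer's reward is the per-pair change in loss since the cached step,
# scaled by 1/eta; the 'exp' branch then uses a REINFORCE-style surrogate
# loss of -log(score) * reward.
eta = 0.1
cached_loss = {"en-fr": 2.0}     # loss when the scorer output was cached
cur_loss = {"en-fr": 1.8}        # loss after the model update
data_actor_out = {"en-fr": 0.6}  # scorer's score for this language pair

surrogate = {}
for k in cached_loss:
    reward = (1.0 / eta) * (cur_loss[k] - cached_loss[k])
    # 'exp' branch: surrogate loss -log(p) * reward, with the same 1e-20
    # epsilon as the original code to keep the log finite
    surrogate[k] = -(math.log(1e-20 + data_actor_out[k]) * reward)
print({k: round(v, 4) for k, v in surrogate.items()})
```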

Thank you for your help and clarification!

Dataset (Ted-8-Related) is missing

Hi, I found that these two files in databin are empty:
ted_8_related/combined-train.spm8000.src
ted_8_related/combined-train.spm8000.eng

Could you re-upload the above files?
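A quick way to reproduce this kind of report is to list files that exist but are zero bytes. The sketch below runs on a throwaway temp directory for illustration; pointing `root` at the local databin/ted_8_related directory would check the real files.

```python
import os
import tempfile

# Demo directory standing in for databin/ted_8_related (assumption for the demo).
root = tempfile.mkdtemp()
open(os.path.join(root, "combined-train.spm8000.src"), "w").close()  # empty file
with open(os.path.join(root, "combined-train.spm8000.eng"), "w") as f:
    f.write("hello\n")  # non-empty file

# Collect the names of all zero-byte files in the directory.
empty = sorted(
    name for name in os.listdir(root)
    if os.path.getsize(os.path.join(root, name)) == 0
)
print(empty)
```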

Couldn't reproduce the result in the diverse_ted8_o2m setting

Hi,
I succeeded in reproducing the result in the diverse_ted8_m2o setting. However, I failed to reproduce the result in diverse_ted8_o2m: the score for each language is about 4 BLEU lower than the result shown in the paper (Table 11).

Below is my result in diverse_ted8_o2m:
[screenshot of per-language BLEU scores omitted]

Have you encountered this situation?

Some code fix and questions about the modifications on fairseq source code

Hello, I'm Steven, from Johns Hopkins University. I'm currently working on a research project studying different methods to denoise training data for low-resource languages. I came across your papers (DDS, TCS, and MultiDDS) and I'm very interested in your implementation. I started checking this code repo very carefully and found some issues (I sort of "fixed" them in my own way in a forked repo; if you think it's useful to incorporate them, I can submit a pull request for you to review my changes). Here are the issues:

fairseq beamsearch is out of date:

The torch.div usage in fairseq/search.py is deprecated, so I updated it using the most recent fairseq beamsearch code.
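For context, here is the arithmetic the deprecated torch.div call performs in beam search, written in plain Python (the variable names and numbers are illustrative, not from the repo). Newer PyTorch requires spelling out the rounding, e.g. torch.div(..., rounding_mode='trunc'), instead of relying on implicit integer division.

```python
# Beam search scores candidates in a flattened (beam * vocab) matrix; the
# deprecated integer torch.div recovered the beam index from a flat index.
vocab_size = 8
flat_index = 21                     # index into the flattened candidate scores
beam_id = flat_index // vocab_size  # was: torch.div(flat_index, vocab_size)
token_id = flat_index % vocab_size  # position of the token within the vocab
print(beam_id, token_id)
```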

undefined variable in trainer.py/update_language_sampler()

I think this is the most important part of the code, since you calculate the gradient similarity between the training set and the dev set to get the similarity score that updates the language distribution. There are some undefined or unused variables, like self.optimizers and all_sim_list. I changed the code so that it only uses one vector, sim_list, though theoretically there should be an N*N sim_list (N being the number of language pairs), and that's why you need all_sim_list to collect the per-pair sim_list values, right? My change only works for my own setup, since I'm using just one language pair instead of a multilingual setting, but I think it shouldn't be hard to fix; you might just have left those variables there by accident.
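The similarity score discussed here can be sketched as a cosine similarity between a flattened training-set gradient and a dev-set gradient (a pure-Python sketch with made-up gradient values; in the multilingual setting this would be computed once per language pair, giving the per-pair sim_list mentioned above):

```python
import math

def cosine(u, v):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

train_grad = [0.2, -0.1, 0.4]  # illustrative flattened training gradient
dev_grad = [0.1, -0.2, 0.3]    # illustrative flattened dev gradient
sim = cosine(train_grad, dev_grad)
print(round(sim, 4))
```

A positive score means the training-pair gradient points in roughly the same direction as the dev gradient, so upweighting that pair should help dev loss.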

generator is not reporting score properly

It seems that if I generate with --sacrebleu, the reported result is not a string but | Generate test with beam=5: <sacrebleu.metrics.bleu.BLEUScore object at 0x7fec308a75b0>. I'm not sure what causes the object to be printed instead of its formatted score.

The code is not working with Ave type data_actor

Since I'm more interested in a one-pair setting than in multilingual input, I want the scorer to work directly on src_tokens and trg_tokens, which is the method you proposed in the DDS paper. If I interpret your code correctly, this block should never run, right?

    # data selection: reset epoch iter to filter out unselected data
    if epoch_itr.epoch == args.select_by_dds_epoch and args.select_by_dds_epoch > 0:
        epoch_itr, _ = trainer.get_filtered_train_iterator(epoch_itr.epoch, filtered_maxpos_indices=filtered_maxpos_indices)

Since I want to work with data filtering, and I realized the base data actor only sees language IDs instead of real tokens, I have to use the ave type. To make it work, I changed your initialization steps (basically I added elif self.args.data_actor == 'ave': and an Adam optimizer for it in trainer.py). I'm not sure this modification is correct, but select_by_dds_epoch works after this change, so I'd just like confirmation from you that this is indeed the correct way to implement data filtering with the ave data actor.

Last question

I'm also curious what --utility-type in the args is for; I didn't find where it's triggered when debugging through my console. Also, could you share the training script/hyperparameters you used for DDS (Optimizing Data Usage via Differentiable Rewards)? I want to train directly on one pair of languages and replicate your result.

I'm really impressed by how well you modified the fairseq toolkit and incorporated reinforcement-based optimization into the data loading. If I have any misunderstanding about your methods or code implementation, please let me know. Also, let me know if you'd like me to submit a pull request so you can view my changes more easily. Thank you in advance for your help!
