
vnmt's Introduction

VNMT

The current implementation of VNMT supports only single-layer NMT; deeper layers are not supported.

Source code for our paper on variational neural machine translation.

If you use this code, please cite our paper:

@InProceedings{zhang-EtAl:2016:EMNLP20162,
  author    = {Zhang, Biao  and  Xiong, Deyi  and  Su, Jinsong  and  Duan, Hong  and  Zhang, Min},
  title     = {Variational Neural Machine Translation},
  booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing},
  month     = {November},
  year      = {2016},
  address   = {Austin, Texas},
  publisher = {Association for Computational Linguistics},
  pages     = {521--530},
  url       = {https://aclweb.org/anthology/D16-1050}
}

Basic Requirement

Our source code is based on GroundHog. Please install it before using our code.

How to Run?

To train a good VNMT model, you need to follow two steps.

Step 1. Pretraining

Pretrain a base NMT model using GroundHog.

Step 2. Retraining

Go to the work directory and copy the pretrained model into it; the pretrained model is used to initialize the parameters of VNMT.

Simply run the script below (of course, before that you need to reconfigure the chinese.py file for your own dataset; a sketch of such a configuration follows further down :)):

run.sh

That's it!
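As for the chinese.py reconfiguration mentioned above, the details depend on GroundHog's state-dictionary conventions. The sketch below is purely illustrative; every key name and path in it is an assumption, so map them onto whatever options the actual chinese.py defines:

```python
# Hypothetical sketch of a GroundHog-style state dictionary for your own data.
# None of these key names or paths are taken from this repository; adjust them
# to the options actually defined in chinese.py.
state = {}

# Parallel training corpora (source and target sides).
state['source'] = ['work/data/train/source.zh.h5']    # assumed path
state['target'] = ['work/data/train/target.en.h5']    # assumed path

# Vocabulary files (word -> index and index -> word).
state['word_indx'] = 'work/data/train/vocab.zh.pkl'   # assumed path
state['indx_word'] = 'work/data/train/ivocab.zh.pkl'  # assumed path

# Point the retraining step at the pretrained GroundHog model you copied
# into the work directory, so VNMT's parameters are initialized from it.
state['prefix'] = 'pretrained_model_'                 # assumed prefix
```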

Note that our test and development sets are from the NIST dataset, which follows the SGM format. Please see work/data/dev for an example.
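
For readers unfamiliar with the SGM format: it wraps each sentence in a <seg id="..."> element inside <doc> blocks. The following is a minimal sketch, not part of this repository, of how one might extract the plain-text segments from such a file:

```python
import re

# Matches NIST-style segments such as: <seg id="1"> a sentence </seg>
SEG_RE = re.compile(r'<seg[^>]*>(.*?)</seg>', re.IGNORECASE | re.DOTALL)

def read_sgm(path):
    """Return the plain-text sentences of an .sgm file, in document order."""
    with open(path, encoding='utf-8') as f:
        return [m.group(1).strip() for m in SEG_RE.finditer(f.read())]

# Example (hypothetical path):
# sentences = read_sgm('work/data/dev/nist02.src.sgm')
```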

For any comments or questions, please email Biao Zhang.

vnmt's People

Contributors

bzhanggo

vnmt's Issues

How do you compute q(z | x, y) at test time?

Hello,

I am reading your paper Variational Neural Machine Translation.

The paper is great! I was wondering one thing: how do you compute q(z | x, y) at test time?
I cannot find an obvious approach, since at test time we do not have y, and so cannot compute its representation and then q(z | x, y).

Thanks!

Best,
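
(A note for readers of this issue: in conditional variational models of this kind, the approximate posterior q(z | x, y) is typically used only during training, where y is available. At decoding time, z is instead taken from the prior p(z | x), e.g. by sampling z ~ p(z | x) or by deterministically using the prior mean mu_prior(x); both depend on the source sentence alone, so q(z | x, y) never needs to be computed at test time. This is the standard conditional-VAE reading, not a statement about this repository's exact implementation.)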

Have you tried to train VNMT from scratch?

I found that training VNMT from scratch leads to low performance. I then monitored the training loss and figured out why: the algorithm converges too slowly and has not fully converged by the time I stop it.
Does this suggest a buggy implementation, or are these training dynamics normal?
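
(A note for readers: slow convergence when training a variational model from scratch is a commonly reported symptom. One widely used remedy, which this repository does not appear to implement, is KL-cost annealing: multiply the KL term by a weight that ramps from 0 to 1 early in training. A minimal sketch with hypothetical names:)

```python
def kl_weight(step, warmup_steps=20000):
    """Linear KL-annealing schedule: ramp the KL coefficient from 0 to 1.

    warmup_steps is a hypothetical hyperparameter, not taken from this repo.
    """
    return min(1.0, step / float(warmup_steps))

# Hypothetical use inside a training loop:
#     loss = reconstruction_loss + kl_weight(step) * kl_term
```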

Missing 0.5 in your KL Divergence?

Hi,

I noticed that for computing the KL divergence, the full formula should be:
0.5*(log sigma_prior - log sigma_post + (sigma_post^2 + (mu_post - mu_prior)^2) / (2*sigma_prior^2)) - 0.5

instead of

log sigma_prior - log sigma_post + (sigma_post^2 + (mu_post - mu_prior)^2) / (2*sigma_prior^2) - 0.5, as in your code at https://github.com/DeepLearnXMU/VNMT/blob/master/src/encdec.py:

```python
# log sigma_prior - log sigma_post + (sigma_post^2 + (mu_post - mu_prior)^2) / (2*sigma_prior^2) - 1/2
kl_der_q_p = (variation_log_sigma_layers[level] - variation_log_sigma_post_layers[level]) + \
    (UnaryOp('lambda x: x**2')(variation_sigma_post_layers[level]) +
     UnaryOp('lambda x: x**2')(variation_mu_post_layers[level] - variation_mu_layers[level])) / \
    (UnaryOp('lambda x: 2.*x**2')(variation_sigma_layers[level])) - 0.5
```

Am I correct, or am I missing something?

Many thanks!

Best
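
(For reference, independent of this repository: the closed-form KL divergence between two univariate Gaussians is KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) = log sigma_p - log sigma_q + (sigma_q^2 + (mu_q - mu_p)^2) / (2 * sigma_p^2) - 0.5. The short NumPy check below, not part of the repo, compares this closed form against a Monte-Carlo estimate:)

```python
import numpy as np

def kl_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) )."""
    return (np.log(sigma_p) - np.log(sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

def log_normal_pdf(z, mu, sigma):
    """Log-density of N(mu, sigma^2) at z."""
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

# Monte-Carlo estimate of E_q[log q(z) - log p(z)]; should match the closed form.
rng = np.random.default_rng(0)
mu_q, sigma_q, mu_p, sigma_p = 0.3, 0.8, -0.1, 1.2
z = rng.normal(mu_q, sigma_q, size=1_000_000)
print(np.mean(log_normal_pdf(z, mu_q, sigma_q) - log_normal_pdf(z, mu_p, sigma_p)))
print(kl_gauss(mu_q, sigma_q, mu_p, sigma_p))
```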

Problems with GroundHog

Hi,

When I used the bleeding-edge version of GroundHog, I ran into this problem. Have you ever encountered this issue?

My Theano version is 0.10.0beta3+129.ga44a3f8; I don't know whether the problem is due to this particular Theano version.
Alternatively, could you please share your pretrained and/or finetuned model?
Many thanks!

Is Equation 5 the *proper* ELBO of the likelihood P(Y|X)?

Hello,

This is my last question about your paper: http://www.aclweb.org/anthology/D16-1050. I tried to derive the ELBO of P(Y|X) myself, but I couldn't come up with the formula. Meanwhile, I found it quite straightforward to derive the ELBO of the joint P(Y, X), but not of P(Y|X).

So I was wondering whether it is possible to derive the ELBO of P(Y|X), or whether your formula (Equation 5) is instead an approximation of the ELBO?

Many thanks!
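
(A note for readers wondering the same thing: a proper ELBO of P(Y|X) does fall out of the standard Jensen argument, assuming a variational posterior q(z | X, Y) and a prior P(z | X) as in the paper. A minimal sketch of the derivation:)

```latex
\log P(Y \mid X)
  = \log \int P(Y, z \mid X)\, dz
  = \log \mathbb{E}_{q(z \mid X, Y)}\!\left[
      \frac{P(Y \mid X, z)\, P(z \mid X)}{q(z \mid X, Y)} \right]
  \ge \mathbb{E}_{q(z \mid X, Y)}\!\left[ \log P(Y \mid X, z) \right]
    - \mathrm{KL}\!\left( q(z \mid X, Y) \,\big\|\, P(z \mid X) \right)
```

The inequality is Jensen's, and the resulting bound has the reconstruction-minus-KL shape of Equation 5, so the conditional ELBO is derivable as a true lower bound rather than an approximation (assuming the factorization P(Y, z | X) = P(Y | X, z) P(z | X)).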
