xrosliang / var-attn

This project forked from harvardnlp/var-attn

Latent Alignment and Variational Attention

Home Page: https://arxiv.org/abs/1807.03756

License: MIT License

Python 84.20% Shell 4.17% Perl 7.07% Smalltalk 0.37% Emacs Lisp 3.29% JavaScript 0.16% NewLisp 0.31% Ruby 0.32% Slash 0.05% SystemVerilog 0.04% Dockerfile 0.03%

Latent Alignment and Variational Attention

This is a PyTorch implementation of the paper Latent Alignment and Variational Attention, built on a fork of OpenNMT.

Dependencies

The code was tested with Python 3.6 and PyTorch 0.4. To install the dependencies, run

pip install -r requirements.txt

Running the code

All commands are in the script va.sh.

Preprocessing the data

To preprocess the data, run

source va.sh && preprocess_bpe

The raw data in data/iwslt14-de-en was obtained from the fairseq repo with BPE_TOKENS=14000.

Training the model

To train a model, run one of the following commands:

  • Soft attention
source va.sh && CUDA_VISIBLE_DEVICES=0 train_soft_b6
  • Categorical attention with exact evidence
source va.sh && CUDA_VISIBLE_DEVICES=0 train_exact_b6
  • Variational categorical attention with exact ELBO
source va.sh && CUDA_VISIBLE_DEVICES=0 train_cat_enum_b6
  • Variational categorical attention with REINFORCE
source va.sh && CUDA_VISIBLE_DEVICES=0 train_cat_sample_b6
  • Variational categorical attention with Gumbel-Softmax
source va.sh && CUDA_VISIBLE_DEVICES=0 train_cat_gumbel_b6
  • Variational categorical attention using the Wake-Sleep algorithm (Ba et al., 2015)
source va.sh && CUDA_VISIBLE_DEVICES=0 train_cat_wsram_b6

Checkpoints will be saved to the project's root directory.
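
The training variants above differ in how they treat the attention alignment as a latent variable. As a rough illustration only (this is not the repository's code, and all numbers and function names below are made up for the example), the following sketch computes, for a single target word, the exact log evidence obtained by summing over every source position and the exact ELBO obtained by enumerating the same positions under a variational distribution q. The ELBO never exceeds the log evidence, and the two coincide when q equals the true posterior.

```python
import math

def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_evidence(prior, likelihoods):
    """log p(y|x) = log sum_z p(z|x) * p(y|x,z), summing over all positions z."""
    return math.log(sum(p * l for p, l in zip(prior, likelihoods)))

def elbo(q, prior, likelihoods):
    """E_q[log p(y|x,z)] - KL(q || p(z|x)), computed exactly by enumeration."""
    expected_ll = sum(qz * math.log(l) for qz, l in zip(q, likelihoods))
    kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, prior) if qz > 0)
    return expected_ll - kl

# Toy values (assumptions): three source positions, one target word.
prior = softmax([0.5, 1.5, -0.2])   # p(z|x): attention prior over positions
likelihoods = [0.10, 0.60, 0.05]    # p(y|x,z): word probability given each alignment
q = softmax([0.0, 2.5, -1.0])       # q(z): variational (inference network) distribution

exact = log_evidence(prior, likelihoods)
bound = elbo(q, prior, likelihoods)

# The true posterior p(z|x,y) makes the bound tight.
evidence = math.exp(exact)
posterior = [p * l / evidence for p, l in zip(prior, likelihoods)]
tight = elbo(posterior, prior, likelihoods)

print(f"log evidence:          {exact:.4f}")
print(f"ELBO with q:           {bound:.4f}")
print(f"ELBO with posterior:   {tight:.4f}")
```

Enumeration costs one likelihood evaluation per source position; the sampling-based variants (REINFORCE, Gumbel-Softmax) trade that cost for a noisier estimate of the same bound.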

Evaluating on test

To compute the exact perplexity of the generative model, run the following command, replacing $model with a saved checkpoint.

source va.sh && CUDA_VISIBLE_DEVICES=0 eval_cat $model
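
For reference, perplexity is conventionally the exponential of the negative average log-likelihood per target token. The sketch below shows that definition with made-up inputs; the function name and the way eval_cat aggregates sentences internally are assumptions, not taken from the script.

```python
import math

def perplexity(sentence_log_likelihoods, sentence_lengths):
    """exp of the negative total log-likelihood divided by the total token count."""
    total_ll = sum(sentence_log_likelihoods)
    total_tokens = sum(sentence_lengths)
    return math.exp(-total_ll / total_tokens)

# Toy check: four tokens, each assigned probability 0.25, give a perplexity of 4.
ppl = perplexity([4 * math.log(0.25)], [4])
print(round(ppl, 6))
```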

The model can also be used to generate translations of the test data:

source va.sh && CUDA_VISIBLE_DEVICES=0 gen_cat $model
sed -e "s/@@ //g" $model.out | perl tools/multi-bleu.perl data/iwslt14-de-en/test.en
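
The sed expression above undoes the BPE segmentation before scoring: each non-final subword piece ends in "@@" followed by a space, and deleting every "@@ " joins the pieces back into words. A minimal Python equivalent of that one step, with a hypothetical function name and an invented example sentence:

```python
def undo_bpe(line):
    """Remove every '@@ ' continuation marker, like sed 's/@@ //g'."""
    return line.replace("@@ ", "")

restored = undo_bpe("die Katz@@ en schla@@ fen")
print(restored)
```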

Trained Models

Models with the lowest validation PPL were selected for evaluation on test. Numbers are slightly different from those reported in the paper since this is a re-implementation.

Model                                 Test PPL   Test BLEU
Soft Attention                        7.17       32.77
Exact Marginalization                 6.34       33.29
Variational Attention + Enumeration   6.08       33.69
Variational Attention + Sampling      6.17       33.30

Contributors

adamlerer, apaszke, askender, bmccann, bpopeters, chenbeh, colesbury, da03, guillaumekln, gwenniger, helson73, henry-e, irshadbhat, jianyuzhan, jingxil, jsenellart, justinchiu, mattiadg, orina1123, playma, pltrdy, scarletpan, sebastiangehrmann, smartkiwi, soumith, srush, taolei87, thammegowda, wjbianjason, xutaima
