This project forked from soskek/attention_is_all_you_need

[WIP] Attention Is All You Need (Vaswani et al. 2017) by Chainer.

License: BSD 3-Clause "New" or "Revised" License

[WIP] Transformer - Attention Is All You Need

A Chainer-based Python implementation of the Transformer, an attention-based seq2seq model without convolution or recurrence. This is a work in progress because the performance of trained models has not yet been confirmed.
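
For orientation, the core operation of the Transformer, scaled dot-product attention, can be sketched in plain Python. This is an illustrative sketch only, not code from this repository (which uses Chainer). The function names are hypothetical, and multi-head projections, masking, and batching are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    queries: list of query vectors, each of length d_k
    keys:    list of key vectors, each of length d_k
    values:  list of value vectors, one per key
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Each output vector is a weighted average of the value vectors, with weights determined by how well the query matches each key.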

This repository does not aim to fully reproduce the results in the paper, so I have not rigorously validated performance. However, I expect my implementation to be nearly compatible with the model described in the paper. The differences I am aware of are as follows:

  • Label smoothing. This is not yet added, but will be added soon.
  • Optimization/training strategy. Detailed information about batch size, parameter initialization, etc. is unclear in the paper.
  • Vocabulary set, dataset, preprocessing and evaluation. This repo uses common word-based tokenization, whereas the paper uses byte-pair encoding. The size of the token set also differs. Evaluation (validation) is somewhat unfair and not comparable with that in the paper; e.g., even in the validation set, unknown words are replaced with a single "unk" token.
  • Beam search. This is not yet added.
  • Model size. The model configuration in this repo corresponds to the "base model" in the paper, although you can modify a few lines to use the "big model".
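
To make the vocabulary point above concrete, here is a minimal sketch of word-based tokenization with a capped vocabulary and unknown-word replacement. It is an assumption about the general technique, not the repository's actual preprocessing code; `build_vocab`, `replace_unknown`, and the `<unk>` spelling are hypothetical.

```python
from collections import Counter

UNK = "<unk>"  # assumed spelling of the unknown-word token


def build_vocab(sentences, max_size):
    """Return the set of the max_size most frequent whitespace-separated words."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    return {word for word, _ in counts.most_common(max_size)}


def replace_unknown(sentence, vocab):
    """Replace every word outside the vocabulary with the UNK token."""
    return " ".join(word if word in vocab else UNK for word in sentence.split())
```

Applying `replace_unknown` to validation references as well as training data is what makes the resulting scores hard to compare with the paper: the references themselves are altered.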

This is derived from my convolutional seq2seq repo, which is derived from Chainer's official seq2seq example.

See "Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017.

Requirements

  • Python 3.6.0+
  • Chainer 2.0.0+ (this version is strictly required)
  • numpy 1.12.1+
  • cupy 1.0.0+ (if using gpu)
  • and their dependencies

Prepare Dataset

You can use any parallel corpus.
For example, run download_wmt.sh, which downloads and decompresses the training and development datasets from WMT/Europarl into your current directory. These files and their paths are set as defaults in the training script seq2seq.py.

How to Run

PYTHONIOENCODING=utf-8 python -u seq2seq.py -g=0 -i DATA_DIR -o SAVE_DIR

During training, logs for loss, perplexity, word accuracy and time are printed at a regular interval, in addition to validation tests (perplexity and BLEU for generation) every half epoch. A generation test is also run and printed to check training progress.
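
As a reminder of what two of the logged metrics mean, the sketch below computes them under their standard definitions. Whether seq2seq.py computes them exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per target token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def word_accuracy(predicted, reference):
    """Fraction of reference positions where the predicted token matches."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)
```

A model that assigns probability 0.25 to every correct token has perplexity 4, i.e. it is as uncertain as a uniform choice among four words.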

Arguments

  • -g: GPU ID. Set to -1 to run on CPU.
  • -i DATA_DIR, -s SOURCE, -t TARGET, -svalid SVALID, -tvalid TVALID:
    The DATA_DIR directory needs to contain a pair of training files, SOURCE and TARGET, and a pair of validation files, SVALID and TVALID. Each pair should be a parallel corpus with line-by-line sentence alignment.
  • -o SAVE_DIR: a JSON log report file and a model snapshot will be saved in the SAVE_DIR directory (it is created automatically if it does not exist).
  • -e: maximum number of epochs over the training corpus.
  • -b: minibatch size.
  • -u: size of units and word embeddings.
  • -l: number of layers in both the encoder and the decoder.
  • --source-vocab: maximum vocabulary size for the source language.
  • --target-vocab: maximum vocabulary size for the target language.
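
The line-by-line alignment requirement for SOURCE/TARGET and SVALID/TVALID can be checked before training with a small helper like the one below. This is a sketch under the assumption of whitespace tokenization; `pair_sentences` is a hypothetical name and not part of seq2seq.py.

```python
def pair_sentences(source_lines, target_lines):
    """Pair up aligned sentences from two iterables of text lines.

    Raises ValueError if the two sides have different numbers of lines,
    which means the corpus is not aligned line by line.
    """
    source = [line.split() for line in source_lines]
    target = [line.split() for line in target_lines]
    if len(source) != len(target):
        raise ValueError(
            "corpus is not aligned: %d source lines vs %d target lines"
            % (len(source), len(target))
        )
    return list(zip(source, target))
```

Open file handles can be passed directly, since iterating over a text file yields one line at a time.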
