Chainer-based Python implementation of Transformer, an attention-based seq2seq model without convolution and recurrence.
This is derived from my other repository, which is derived from Chainer's official seq2seq example.
See Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017.
- Python 3.6.0+
- Chainer 2.0.0+ (this version is strictly required)
- numpy 1.12.1+
- cupy 1.0.0+ (if using GPU)
- and their dependencies
You can use any parallel corpus.
For example, run download_wmt.sh, which downloads and decompresses the training and development datasets from WMT/Europarl into your current directory. These files and their paths are set as the defaults in the training script seq2seq.py.
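If you bring your own corpus, it may be worth checking the line-by-line alignment that the training data is expected to have before starting a long run. The following is a minimal sketch, not part of this repository; the function name `check_parallel` and the UTF-8 default are assumptions made for illustration.

```python
from itertools import zip_longest


def check_parallel(source_path, target_path, encoding="utf-8"):
    """Return the number of sentence pairs, or raise if line counts differ.

    Hypothetical helper: it only compares line counts, it cannot tell
    whether the sentences on matching lines are real translations.
    """
    n_pairs = 0
    with open(source_path, encoding=encoding) as src, \
            open(target_path, encoding=encoding) as tgt:
        # zip_longest pads the shorter file with None, which exposes a mismatch.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    "line counts differ: one file ends after line %d"
                    % (line_no - 1))
            n_pairs += 1
    return n_pairs
```

Calling `check_parallel(source_file, target_file)` on a training or validation pair returns the number of sentence pairs when both files have the same number of lines.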
PYTHONIOENCODING=utf-8 python -u seq2seq.py -g=0 -i DATA_DIR -o SAVE_DIR
During training, loss, perplexity, word accuracy, and elapsed time are logged at regular intervals, and validation (perplexity, plus BLEU on generated translations) runs every half epoch. Sample translations are also generated and printed so that you can check training progress.
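To help read the logged numbers, here is a small sketch of how loss and perplexity usually relate. It assumes the common definition, perplexity as the exponential of the mean per-token negative log-likelihood in nats; the exact normalization used by seq2seq.py is not stated here, so treat the function below as illustrative only.

```python
import math


def perplexity(total_nll, n_tokens):
    """Perplexity from a summed negative log-likelihood (in nats).

    Assumption: the loss is averaged per target token before exponentiation.
    """
    return math.exp(total_nll / n_tokens)


# A model that assigns probability 1/4 to every one of 10 target tokens
# has a per-token loss of log(4) and therefore a perplexity of about 4.
uniform_ppl = perplexity(math.log(4) * 10, 10)
```

Under this definition, a perplexity of k roughly means the model is as uncertain as if it were choosing uniformly among k words at each step.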
- -g: GPU id. To run on CPU, set -1.
- -i DATA_DIR, -s SOURCE, -t TARGET, -svalid SVALID, -tvalid TVALID: the DATA_DIR directory must contain a training pair, SOURCE and TARGET, and a validation pair, SVALID and TVALID. Each pair should be a parallel corpus with line-by-line sentence alignment.
- -o SAVE_DIR: a JSON log report file and a model snapshot are saved in the SAVE_DIR directory (it is created automatically if it does not exist).
- -e: max epochs over the training corpus.
- -b: minibatch size.
- -u: size of units and word embeddings.
- -l: number of layers in both the encoder and the decoder.
- --source-vocab: max vocabulary size for the source language.
- --target-vocab: max vocabulary size for the target language.
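To illustrate what a maximum vocabulary size typically means, the sketch below keeps only the most frequent words and maps everything else to an unknown token. This is an assumption about the general technique, not a copy of what seq2seq.py does: the names `build_vocab` and `encode`, the special tokens `<eos>` and `<unk>`, and the choice to count them toward the limit are all hypothetical.

```python
from collections import Counter


def build_vocab(lines, max_size, specials=("<eos>", "<unk>")):
    """Map the most frequent whitespace-separated words to integer ids.

    Assumption: special tokens take the first ids and count toward max_size.
    """
    counts = Counter(word for line in lines for word in line.split())
    vocab = {token: i for i, token in enumerate(specials)}
    n_words = max(0, max_size - len(specials))
    for word, _ in counts.most_common(n_words):
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(line, vocab, unk="<unk>"):
    """Convert a sentence to ids, sending out-of-vocabulary words to unk."""
    return [vocab.get(word, vocab[unk]) for word in line.split()]


vocab = build_vocab(["a a a b b c"], max_size=4)
ids = encode("a b c", vocab)  # "c" falls outside the 4-entry vocabulary
```

With a limit of 4, only the two most frequent words survive next to the two special tokens, so the rare word "c" is encoded with the id of `<unk>`.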