
HNMT: the Helsinki Neural Machine Translation system

This is a neural network-based machine translation system developed at the University of Helsinki.

It is currently rather experimental, but the user interface and setup procedure should be simple enough for people to try out.

Features

  • biLSTM encoder which can be either character-based or hybrid word/character (Luong & Manning 2016)
  • LSTM decoder which can be either character-based or word-based
  • Variational dropout (Gal 2015) and Layer Normalization (Ba et al. 2016)
  • Bayesian word alignment model for guiding attention mechanism (experimental)

Requirements

  • A GPU if you plan to train your own models
  • Python 3.4 or higher
  • Theano (use the development version)
  • BNAS
  • NLTK for tokenization
  • efmaral if you want to try the experimental supervised attention feature (see below)

Quick start

If Theano and BNAS are installed, you should be able to simply run hnmt.py. Run with the --help argument to see the available command-line options.
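
For example, to list the available options:

python3 hnmt.py --help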

Training a model on the Europarl corpus can be done like this:

python3 hnmt.py --source europarl-v7.sv-en.en \
                --target europarl-v7.sv-en.sv \
                --source-tokenizer word \
                --target-tokenizer char \
                --source-vocabulary 50000 \
                --max-source-length 30 \
                --max-target-length 180 \
                --batch-size 32 \
                --training-time 24 \
                --log en-sv.log \
                --save-model en-sv.model

This will create a model with a hybrid encoder (50k word vocabulary, with character-level encoding for the rest) and a character-based decoder, filtering out sentences longer than 30 words (source) or 180 characters (target) and training for 24 hours. Development set cross-entropy and some other statistics are appended to the log file (en-sv.log above), which is usually the best way of monitoring training. Training loss and development set translations are written to stdout, so redirecting this or using tee is recommended.
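
For example, one way to do this (en-sv.stdout.txt is just an arbitrary file name) is to pipe the training command above through tee, and follow the log file from another terminal:

python3 hnmt.py [training options as above] | tee en-sv.stdout.txt

tail -f en-sv.log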

The resulting model can be used like this:

python3 hnmt.py --load-model en-sv.model \
                --translate test.en --output test.sv \
                --beam-size 10

Note that when training a model from scratch, parameters can be set on the command line; otherwise the hard-coded defaults are used. When continuing training or doing translation (i.e. whenever the --load-model argument is used), the defaults are taken from the given model file, although some of them (those that do not change the network structure) can still be overridden by command-line arguments.

For instance, the model above will assume that input files need to be tokenized, but you can pass pre-tokenized (space-separated) input as follows:

python3 hnmt.py --load-model en-sv.model \
                --translate test.en --output test.sv \
                --source-tokenizer space \
                --beam-size 10

Resuming training

You can resume training by using the --load-model argument without --translate (which disables training). For instance, to keep training the model above for another 48 hours on the same data:

python3 hnmt.py --load-model en-sv.model \
                --training-time 48 \
                --save-model en-sv-72h.model

Using efmaral for attention supervision

Install the Python bindings for efmaral (i.e. run python3 setup.py install in the efmaral directory).

Then you can simply add --alignment-loss 1.0 when training to activate this feature (the number specifies the contribution of the alignment/attention cross-entropy to the loss function). By default this contribution decays exponentially per batch; the decay rate can be set with e.g. --alignment-decay 0.9999.
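
For example, to train with attention supervision you would add these options to a training command such as the one in the quick start (the decay value is only illustrative):

python3 hnmt.py [training options as above] \
                --alignment-loss 1.0 \
                --alignment-decay 0.9999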
