moses-smt's Introduction

Dock You a Moses

Want to play with the Moses Statistical Machine Translation system, but...

  • You don't have time to get a PhD in Setting Up Moses?

  • You have TMX files (or structured bilingual text files easily convertible to TMX) and want to use them with Moses without doing all the munging yourself?

Well now you don't have to, because I stuffed Moses in a Docker container for you.

What is this?

  • A full Moses + MGIZA installation in a Docker image: amake/moses-smt:base on Docker Hub

  • A make-based set of commands for easily

    • Converting TMX files into Moses-ready corpus files: make corpus

    • Training and tuning Moses: make train

    • Building Docker images of trained Moses instances: make build

    • Deploying trained Moses instances to Docker Hub/Amazon Elastic Beanstalk: make deploy-hub

  • Some peripheral tools:

    • A simple REPL for querying Moses over XML-RPC: mosesxmlrpcrepl.py or make repl
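To give a feel for what the corpus step has to do, here is a minimal sketch of pulling aligned sentence pairs out of a TMX document. It is not the tmx2corpus implementation that `make corpus` actually uses; the function name `extract_pairs`, the sample data, and the choice to reduce codes such as `en-US` to `en` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data, not taken from the project.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Thank you</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Merci</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def extract_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for the given language codes."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        by_lang = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            # Assumption: keep only the language part, so "en-US" becomes "en".
            lang = tuv.get(XML_LANG, "").split("-")[0].lower()
            # Flatten inline markup and collapse whitespace to one line.
            by_lang[lang] = " ".join("".join(seg.itertext()).split())
        src, trg = by_lang.get(source_lang), by_lang.get(target_lang)
        if src and trg:
            pairs.append((src, trg))
    return pairs


pairs = extract_pairs(SAMPLE_TMX, "en", "fr")
```

Writing the first element of each pair to one file and the second to another, one segment per line, gives the kind of line-aligned parallel text that Moses training expects.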

Requirements

  • make

  • Docker

  • Python 3 with pip and virtualenv

  • OS X? (not tested elsewhere)

  • Some TMX files (Okapi Rainbow is a good tool for converting structured bilingual files to TMX)

Usage

First, if trying to build the base image, you might need to re-balance the number of cores vs memory available to Docker: e.g. 8 cores but only 2 GB of memory results in compilation failures. 4 cores with 4 GB seems to work better.
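As a rough way to check that balance before starting a long build, the sketch below reads the CPU count and total memory that Docker reports and compares memory per core against a threshold. The 0.9 GiB-per-core threshold is purely an assumption extrapolated from the two data points above (8 cores with 2 GB failing, 4 cores with 4 GB working), and the function names are made up for this example.

```python
import subprocess


def parse_docker_resources(info_line):
    """Parse 'NCPU MemTotal' output into (cores, memory in GiB)."""
    cpus, mem_bytes = info_line.split()
    return int(cpus), int(mem_bytes) / 2**30


def looks_balanced(cpus, mem_gib, min_gib_per_core=0.9):
    """Heuristic only: the default threshold is an assumption, not a tested limit."""
    return mem_gib / cpus >= min_gib_per_core


def docker_resources():
    """Ask the local Docker daemon for its CPU count and total memory."""
    out = subprocess.run(
        ["docker", "info", "--format", "{{.NCPU}} {{.MemTotal}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_docker_resources(out)


# Example use (requires a running Docker daemon):
#   cpus, mem_gib = docker_resources()
#   if not looks_balanced(cpus, mem_gib):
#       print("Consider fewer cores or more memory before building the base image")
```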

  1. Put most of your TMXs in tmx-train, and the rest in tmx-tune.

  2. Run make SOURCE_LANG=<src> TARGET_LANG=<trg> [LABEL=<lbl>].

  • src and trg (required) are the language codes (not language + country) for your source and target languages, e.g. en and fr.

  • lbl is an optional label for the resulting image; myinstance by default.

  3. Wait forever.

  4. When done, you will have a Docker image tagged moses-smt:<lbl>-<src>-<trg>.

  • Run make server SOURCE_LANG=<src> TARGET_LANG=<trg> [PORT=<port>] to start mosesserver which you can query over XML-RPC.

  • Optionally specify a port; the default is 8080.
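If you script several language pairs, the invocations and resulting image tags described above can be assembled programmatically. This is a small sketch only; `make_command` and `image_tag` are hypothetical helpers, not part of the repository, and they simply mirror the variable names and tag pattern given in the steps.

```python
def make_command(source_lang, target_lang, label=None, target=None, port=None):
    """Build the argument list for a make invocation as described in Usage."""
    cmd = ["make"]
    if target:
        cmd.append(target)  # e.g. "server"; omit for the default build
    cmd += [f"SOURCE_LANG={source_lang}", f"TARGET_LANG={target_lang}"]
    if label:
        cmd.append(f"LABEL={label}")
    if port:
        cmd.append(f"PORT={port}")
    return cmd


def image_tag(source_lang, target_lang, label="myinstance"):
    """Return the tag of the image the build is documented to produce."""
    return f"moses-smt:{label}-{source_lang}-{target_lang}"


build_cmd = make_command("en", "fr")
server_cmd = make_command("en", "fr", target="server", port=9090)
tag = image_tag("en", "fr")
```

The lists can be passed to `subprocess.run(cmd, check=True)` from the repository root.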

What then?

  • Train a new image with swapped languages or with a new set of TMXs.

  • Use a trained instance for translation in OmegaT with the omegat-moses-mt plugin:

    • Run make server to run the server locally; the moses.server.url value is then http://localhost:8080/RPC2

    • Run make deploy-hub and then upload the .zip that's produced as a new EB environment
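Outside OmegaT, a locally running server can also be queried directly from Python. The following is a minimal sketch using the standard library, assuming the server was started with `make server` on the default port and exposes mosesserver's `translate` method, which takes a struct with a `text` field and returns one. It is not the code of mosesxmlrpcrepl.py; the helper names `connect` and `translate` are made up here.

```python
import xmlrpc.client


def connect(url="http://localhost:8080/RPC2"):
    """Create an XML-RPC proxy; no network traffic happens until a call is made."""
    return xmlrpc.client.ServerProxy(url)


def translate(proxy, text):
    """Send one sentence to the server and return the translated text."""
    result = proxy.translate({"text": text})
    return result["text"]


# Example use (requires a running server):
#   proxy = connect()
#   print(translate(proxy, "hello world"))
```

Depending on how the model was trained, the input may need the same preprocessing (for example tokenization) that was applied to the training corpus.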

moses-smt's Issues

Installation problem

Hi Aaron,

I'd like to set up and train a local instance of moses-smt and then use it through OmegaT to see if it would help with my translation work (patents Ja>En), but I'm receiving the following error when executing "make SOURCE_LANG=JA TARGET_LANG=EN".

4ka0-laptop:~ Jage$ git clone https://github.com/amake/moses-smt.git
Cloning into 'moses-smt'...
remote: Counting objects: 322, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 322 (delta 4), reused 9 (delta 4), pack-reused 312
Receiving objects: 100% (322/322), 43.60 KiB | 579.00 KiB/s, done.
Resolving deltas: 100% (178/178), done.
...
4ka0-laptop:moses-smt Jage$ ls
Dockerfile			makefile
Dockerrun.aws.image.json	mosesxmlrpcrepl.py
Dockerrun.aws.noimage.json	test.sh
LICENSE				tmx-train
README.md			tmx-tune
base				work
4ka0-laptop:moses-smt Jage$ make SOURCE_LANG=JA TARGET_LANG=EN
LC_ALL=C virtualenv env
New python executable in /Volumes/Untitled/mac local jobs/archive/moses-smt/env/bin/python
Installing setuptools, pip, wheel...done.
env/bin/pip install git+https://github.com/amake/tmx2corpus.git
make: env/bin/pip: No such file or directory
make: *** [env/bin/tmx2corpus] Error 1

I've tried manually downloading tmx2corpus from your repository on GitHub and placing the whole folder in env/bin, but I'm guessing that's not what it's looking for.

I have Homebrew 1.6.4, Docker 18.03.1-ce-mac65, Python 2.7.10, pip 10.0.1, and virtualenv 15.10.1 installed on High Sierra 10.13.4.

Any advice you could provide here would be a big help.

Thanks,
Jon

Segmentation fault. Decoder died

Hi, amake. Great docker! Thanks for it.

I have some issues with the decoder.

I've finally isolated the problem as follows.
Using the phrase model from http://www.statmt.org/moses/?n=Moses.Tutorial:

wget http://www.statmt.org/moses/download/sample-models.tgz
tar zxvf sample-models.tgz
/opt/bin/moses/bin/moses -f phrase-model/moses.ini

I get:

Reading phrase-model/phrase-table
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Created input-output object : [0.692] seconds
hi
Translating: hi
Line 0: Initialize search took 0.000 seconds total
Line 0: Collecting options took 0.000 seconds at moses/Manager.cpp Line 141
Line 0: Search took 0.000 seconds
Segmentation fault

Actually, I get the same error with my own trained model too. Could you please advise?

docker pull not working

Just a heads up: I got an error when pulling from docker hub:

docker pull amake/moses-smt
Using default tag: latest
Error response from daemon: manifest for amake/moses-smt:latest not found

I just rebuilt locally, works like a dream! Thanks for this repo, you saved me a million hours of tedium 🥇
