
papers's Introduction

This project is considered obsolete as the Torch framework is no longer maintained. If you are starting a new project, please use an alternative in the OpenNMT family: OpenNMT-tf (TensorFlow) or OpenNMT-py (PyTorch) depending on your requirements.


OpenNMT: Open-Source Neural Machine Translation

OpenNMT is a full-featured, open-source (MIT) neural machine translation system utilizing the Torch mathematical toolkit.

The system is designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art translation accuracy. Features include:

  • Speed and memory optimizations for high-performance GPU training.
  • Simple general-purpose interface that only requires source/target data files.
  • C++ implementation of the translator for easy deployment.
  • Extensions to allow other sequence generation tasks such as summarization and image captioning.

Installation

OpenNMT only requires a Torch installation and a few dependencies.

  1. Install Torch
  2. Install additional packages:
luarocks install tds
luarocks install bit32 # if using LuaJIT

For other installation methods including Docker, visit the documentation.

Quickstart

OpenNMT consists of three commands:

  1. Preprocess the data.
th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
  2. Train the model.
th train.lua -data data/demo-train.t7 -save_model model
  3. Translate sentences.
th translate.lua -model model_final.t7 -src data/src-test.txt -output pred.txt

For more details, visit the documentation.
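
The three Quickstart steps can also be collected in a small driver script. The following is a minimal sketch, not part of OpenNMT: the function name `quickstart_commands` and its parameters are hypothetical, and it only assembles the command lines shown above (it prints them rather than running them).

```python
import shlex


def quickstart_commands(data_dir="data", model_prefix="model"):
    """Return the three Quickstart commands as argument lists.

    The file names mirror the Quickstart above; data_dir and
    model_prefix are illustrative parameters, not OpenNMT options.
    """
    preprocess = [
        "th", "preprocess.lua",
        "-train_src", f"{data_dir}/src-train.txt",
        "-train_tgt", f"{data_dir}/tgt-train.txt",
        "-valid_src", f"{data_dir}/src-val.txt",
        "-valid_tgt", f"{data_dir}/tgt-val.txt",
        "-save_data", f"{data_dir}/demo",
    ]
    train = [
        "th", "train.lua",
        "-data", f"{data_dir}/demo-train.t7",
        "-save_model", model_prefix,
    ]
    translate = [
        "th", "translate.lua",
        "-model", f"{model_prefix}_final.t7",
        "-src", f"{data_dir}/src-test.txt",
        "-output", "pred.txt",
    ]
    return [preprocess, train, translate]


if __name__ == "__main__":
    # Print each command; subprocess.run(cmd, check=True) could be used
    # instead to execute them in order on a machine with Torch installed.
    for cmd in quickstart_commands():
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if a data path contains spaces.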

Citation

A technical report on OpenNMT is available. If you use the system for academic work, please cite:

@ARTICLE{2017opennmt,
  author = {{Klein}, G. and {Kim}, Y. and {Deng}, Y. and {Senellart}, J. and {Rush}, A.~M.},
  title = "{OpenNMT: Open-Source Toolkit for Neural Machine Translation}",
  journal = {ArXiv e-prints},
  eprint = {1701.02810}
}

Acknowledgments

Our implementation utilizes code from the following:

Additional resources

papers's People

Contributors

guillaumekln, monsieurzhang, srush


papers's Issues

[vmap] Need an example of how to use the `build-pt` command

I'm trying to build a phrase table using the following command:

docker run --rm -v $PWD/mycorpus:/root/corpus build-pt corpus src_short tgt_short 3 > phrase-table.gz

I'm running into an error: IOError: [Errno 2] No such file or directory: 'corpus.src_short'. I'm sure I'm making a silly mistake, but I don't know how to fix it, even after hours of fiddling with paths and making sure that my corpus files are present in the Docker volume.

I have two files under the mycorpus directory:

  • corp.src_short
  • corp.tgt_short

So can anyone provide a working example (paths and file names included) of how to correctly generate a phrase table?
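
The error message names 'corpus.src_short', while the files on disk start with "corp". That suggests the first argument is a file prefix joined to each extension. Below is a minimal sketch for checking the names before launching the container. It assumes a '<prefix>.<extension>' naming scheme, which is inferred from the error message and not confirmed by the tool's documentation; the function names are hypothetical.

```python
from pathlib import Path


def expected_corpus_files(corpus_dir, prefix, src_ext, tgt_ext):
    """Paths the tool would look for, ASSUMING '<prefix>.<ext>' naming."""
    root = Path(corpus_dir)
    return [root / f"{prefix}.{src_ext}", root / f"{prefix}.{tgt_ext}"]


def missing_corpus_files(corpus_dir, prefix, src_ext, tgt_ext):
    """Return the expected files that do not exist in corpus_dir."""
    return [
        path
        for path in expected_corpus_files(corpus_dir, prefix, src_ext, tgt_ext)
        if not path.is_file()
    ]


if __name__ == "__main__":
    # "mycorpus" is the host directory mounted into the container.
    for prefix in ("corpus", "corp"):
        missing = missing_corpus_files("mycorpus", prefix, "src_short", "tgt_short")
        print(prefix, "->", [p.name for p in missing] or "all files found")
```

Under that assumption, a directory holding corp.src_short and corp.tgt_short would match the prefix "corp" and not "corpus".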

Generating vmap for en->many model

Hi,
vmap is useful for reducing inference time significantly. I was able to generate a vmap for a many-to-one model and it works fine. How does vmap work for one-to-many models?

Not able to build the docker image

Hi,
I tried to execute the following command:
sudo docker build -f Dockerfile . -t build-pt
and got the following output:

 ---> b6f507652425
Step 2/15 : RUN apt-get update && apt-get install -y build-essential cmake git-core pkg-config automake libtool wget zlib1g-dev python-dev libbz2-dev
 ---> Using cache
 ---> 07ec3c39df14
Step 3/15 : WORKDIR /root
 ---> Using cache
 ---> a33a4ef8f071
Step 4/15 : RUN git clone --branch RELEASE-4.0 https://github.com/moses-smt/mosesdecoder.git
 ---> Using cache
 ---> ed4fc633a2d7
Step 5/15 : WORKDIR /root/mosesdecoder
 ---> Using cache
 ---> 2dd13f55c191
Step 6/15 : RUN make -f contrib/Makefiles/install-dependencies.gmake
 ---> Using cache
 ---> c3f4ac2c2792
Step 7/15 : RUN ./compile.sh
 ---> Running in daaf29ccd1a2
~/mosesdecoder/jam-files/engine ~/mosesdecoder
###
### Using 'gcc' toolset.
###
rm -rf bootstrap
mkdir bootstrap
gcc -o bootstrap/jam0 command.c compile.c constants.c debug.c execcmd.c frames.c function.c glob.c hash.c hdrmacro.c headers.c jam.c jambase.c jamgram.c lists.c make.c make1.c object.c option.c output.c parse.c pathsys.c pathunix.c regexp.c rules.c scan.c search.c subst.c timestamp.c variable.c modules.c strings.c filesys.c builtins.c class.c cwd.c native.c md5.c w32_getreg.c modules/set.c modules/path.c modules/regex.c modules/property-set.c modules/sequence.c modules/order.c execunix.c fileunix.c
modules/path.c: In function 'path_exists':
modules/path.c:16:12: warning: implicit declaration of function 'file_query' [-Wimplicit-function-declaration]
     return file_query( list_front( lol_get( frame->args, 0 ) ) ) ?
            ^
./bootstrap/jam0 -f build.jam --toolset=gcc --toolset-root= clean
...found 1 target...
...updating 1 target...
...updated 1 target...
./bootstrap/jam0 -f build.jam --toolset=gcc --toolset-root=
..found 158 targets...
...updating 3 targets...
[MKDIR] bin.linuxx86_64
[COMPILE] bin.linuxx86_64/b2
modules/path.c: In function 'path_exists':
modules/path.c:16:12: warning: implicit declaration of function 'file_query' [-Wimplicit-function-declaration]
     return file_query( list_front( lol_get( frame->args, 0 ) ) ) ?
            ^
[COPY] bin.linuxx86_64/bjam
...updated 3 targets...
~/mosesdecoder
Invalid value for the '-j' option, valid values are 1 through 64.
The command '/bin/sh -c ./compile.sh' returned a non-zero code: 1

I am not sure what went wrong. Can anyone help me with this?
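
The last lines of the log show the build tool rejecting the value passed to '-j', which it says must be between 1 and 64. One possible cause, not confirmed by the log, is that the job count is derived from the host's CPU count and falls outside that range. The sketch below only illustrates clamping a job count into the accepted range; the function name is hypothetical, and how the value reaches compile.sh is not shown because that script's contents do not appear in the output above.

```python
import os

MAX_JOBS = 64  # upper bound quoted in the error message


def safe_job_count(requested=None, max_jobs=MAX_JOBS):
    """Clamp a parallel job count into the range 1..max_jobs."""
    if requested is None:
        # os.cpu_count() may return None when the count is unknown.
        requested = os.cpu_count() or 1
    return max(1, min(int(requested), max_jobs))


if __name__ == "__main__":
    print("host CPUs:", os.cpu_count())
    print("job count to use:", safe_job_count())
```

Printing the host CPU count is a quick way to check whether the machine running the build has more than 64 cores.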

unable to generate the vmap

I have successfully generated phrase-table.gz using the command

sudo docker run -v $(pwd):/root/corpus build-pt train en pt 3 > phrase-table.gz

Now, with the generated phrase-table.gz, I am trying to generate the vmap with the following command

python build-vmap.py -pt phrase-table.gz -ms 3 -mf 2 -km 20 -tv target_vocabulary -zg zg_list > vmap

I am getting the following error

Traceback (most recent call last):
  File "build-vmap.py", line 46, in <module>
    entries = line.split(" ||| ")
TypeError: a bytes-like object is required, not 'str'

And these are the first few lines of phrase-table.gz when I print them:

! ! ! ||| ! ! ! ||| 0.493828 0.817676 0.831885 0.738603 ||| 0-0 1-1 2-2 ||| 49420 29337 24405 ||| |||
! ! ! ||| ! ! ... ||| 0.00303306 0.409664 0.000340866 0.00243863 ||| 0-0 1-1 2-1 2-2 ||| 3297 29337 10 ||| |||
! ! ! ||| ! ! A ||| 0.0269231 0.817676 0.000238607 0.00054239 ||| 0-0 1-1 2-1 ||| 260 29337 7 ||| |||
! ! ! ||| ! ! Est ||| 0.333333 0.817676 3.40866e-05 2.98241e-05 ||| 0-0 1-1 2-1 ||| 3 29337 1 ||| |||
! ! ! ||| ! ! V ||| 0.00662252 0.408853 3.40866e-05 1.56883e-05 ||| 0-0 1-1 2-1 2-2 ||| 151 29337 1 ||| |||
! ! ! ||| ! ! ||| 0.0108069 0.817676 0.0318369 0.817098 ||| 0-0 1-0 2-1 ||| 86426 29337 934 ||| |||
! ! ! ||| ! ! _ ||| 0.000645578 0.409295 3.40866e-05 0.000720027 ||| 0-0 1-1 2-1 2-2 ||| 1549 29337 1 ||| |||
! ! ! ||| ! ! _Mais ||| 0.219512 0.410053 0.0015339 0.000926589 ||| 0-0 1-1 2-1 2-2 ||| 205 29337 45 ||| |||
! ! ! ||| ! ... ||| 0.000822614 0.409664 0.000374953 0.00269779 ||| 0-0 1-0 2-0 2-1 ||| 13372 29337 11 ||| |||
! ! ! ||| ! A do ||| 0.0107527 0.817676 0.00010226 1.71069e-07 ||| 0-0 1-0 2-0 ||| 279 29337 3 ||| |||

So can I know what I am missing or what mistake I have made?
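
This TypeError is what Python 3 raises when a bytes object is split with a str separator. That happens when a gzip file is opened in binary mode, which is the default for gzip.open. The source of build-vmap.py is not shown here, so the following is only a sketch of the reading step under the assumption that the script opens the phrase table in binary mode. The function name `read_phrase_table` is hypothetical.

```python
import gzip


def read_phrase_table(path):
    """Yield the ' ||| '-separated fields of each phrase-table line."""
    # Mode "rt" decodes the compressed bytes to str, so splitting on a
    # str separator works under Python 3.
    with gzip.open(path, "rt", encoding="utf-8") as table:
        for line in table:
            yield line.rstrip("\n").split(" ||| ")


if __name__ == "__main__":
    for entries in read_phrase_table("phrase-table.gz"):
        print(entries[0], "->", entries[1])
        break  # show only the first entry
```

An alternative under the same assumption is to keep binary mode and split on a bytes separator, line.split(b" ||| "), but the fields then remain bytes and need decoding before they are compared with vocabulary strings.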
