
opennmt / ctranslate


Lightweight C++ translator for OpenNMT Torch models (deprecated)

Home Page: https://opennmt.net/

License: MIT License

Languages: CMake 5.47%, C++ 83.81%, C 10.21%, Cuda 0.51%
Topics: eigen, opennmt, neural-machine-translation, cpp

ctranslate's Introduction

This project is considered obsolete as the Torch framework is no longer maintained. If you are starting a new project, please use an alternative in the OpenNMT family: OpenNMT-tf (TensorFlow) or OpenNMT-py (PyTorch) depending on your requirements.


OpenNMT: Open-Source Neural Machine Translation

OpenNMT is a full-featured, open-source (MIT) neural machine translation system utilizing the Torch mathematical toolkit.

The system is designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art translation accuracy. Features include:

  • Speed and memory optimizations for high-performance GPU training.
  • Simple general-purpose interface, requiring only source and target data files.
  • C++ implementation of the translator for easy deployment (see the sketch after this list).
  • Extensions to allow other sequence generation tasks such as summarization and image captioning.
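
As a minimal sketch of that deployment path, the following program loads a released model and translates one line through the library. It is illustrative only: the model path is a placeholder, and the TranslatorFactory::build and Tokenizer calls mirror the ones that appear in the issues further down this page.

#include <iostream>
#include <memory>
#include <onmt/onmt.h>

int main()
{
    // Load a released Torch model (the path is a placeholder).
    std::unique_ptr<onmt::ITranslator> translator =
        onmt::TranslatorFactory::build("model_final_release.t7");

    // Conservative tokenization, no BPE model, no case feature.
    onmt::Tokenizer tokenizer(onmt::Tokenizer::Mode::Conservative, "", false, false, false, "");

    std::cout << translator->translate("Hello world!", tokenizer) << std::endl;
    return 0;
}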

Installation

OpenNMT only requires a Torch installation and a few additional dependencies.

  1. Install Torch
  2. Install additional packages:
luarocks install tds
luarocks install bit32 # if using LuaJIT

For other installation methods including Docker, visit the documentation.

Quickstart

OpenNMT consists of three commands:

  1. Preprocess the data.
th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
  2. Train the model.
th train.lua -data data/demo-train.t7 -save_model model
  3. Translate sentences.
th translate.lua -model model_final.t7 -src data/src-test.txt -output pred.txt

For more details, visit the documentation.

Citation

A technical report on OpenNMT is available. If you use the system for academic work, please cite:

@ARTICLE{2017opennmt,
  author = {{Klein}, G. and {Kim}, Y. and {Deng}, Y. and {Senellart}, J. and {Rush}, A.~M.},
  title = "{OpenNMT: Open-Source Toolkit for Neural Machine Translation}",
  journal = {ArXiv e-prints},
  eprint = {1701.02810}
}

Acknowledgments

Our implementation utilizes code from the following:

Additional resources

ctranslate's People

Contributors

fdalvi, guillaumekln, jhnwnd, jlorieux-systran, jsenellart, jungikim, srush


ctranslate's Issues

Does CTranslate support the distill-tiny model defined in the paper?

Hello, we have trained a bidirectional RNN encoder-decoder (default OpenNMT-lua settings), successfully released the model, and tested it using this repository. However, while working through the paper (http://aclweb.org/anthology/W18-2715) and trying to replicate the distill-tiny model with a GRU encoder (2 layers on the encoder but only 1 layer on the decoder), we find that the released model doesn't translate anything on the GPU (--cuda). When I run on the CPU, I get the following error:

Intel MKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

The model translates accurately with the Lua code, so we know it isn't an issue with the model itself; something must be incompatible when we release it for CTranslate. Here is the full configuration used for training:

th train.lua -data data/demo-train.t7 \
	-save_model distill_tiny_model_unlemmatized_50k_gru \
	-gpuid 1    \
	-max_batch_size 512\
	-save_every 5000 \
	-src_vocab_size 50000 \
	-tgt_vocab_size 50000  \
	-src_words_min_frequency 5 \
	-tgt_words_min_frequency 5 \
	-rnn_type GRU \
	-rnn_size 512 \
	-optim adam \
	-learning_rate 0.0002  \
	-enc_layers 2 \
	-dec_layers 1 \
	-bridge dense \
	-continue true \
	-log_file log.txt

Does CTranslate support GRU as an rnn_type, and does it support dense as an option for -bridge?

How to enable CUDA support

Hi, I created a demo project but found that it was unable to load a GPU-trained model. Could you provide a demo that supports CUDA?

Thanks.

Compilation error

I installed Boost, but an error still occurred when compiling, as follows:

cmake -DEIGEN_ROOT=../../eigen-3.3.1/ -DBOOST_ROOT=/usr/include/boost/ ..
-- Boost version: 1.54.0
-- Could NOT find Boost
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ww110750/CTranslate/build

How to use with GigaWord Pretrained Text Summarization

I'm trying to use the text summarization model with CTranslate. I'm not sure how the GigaWord corpus was trained (does it use BPE?), so I have tried this:

echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /Users/loretoparisi/webservice/textsum_epoch7_14.69_release.t7 --beam_size 5

I do not get any output here.

CTranslate + Tokenizer with case_feature models.

Hello. I have a problem with case_feature models.

  1. I built a model:
    a) th tokenize with case_feature on the src and tgt data.
    b) th preprocess with a vocabulary of 1000 and a max sentence length of 10.
    c) th train with only rnn_size 100 and end_epoch 25 (default encoder type and other options).
  2. I wrote a C++ file that uses the CTranslate library functions:
#include <cstring>
#include <iostream>
#include <memory>
#include <string>

#include <onmt/onmt.h>

namespace name {
    // Thin wrapper around the CTranslate translator.
    class CTranslator {
    public:
        CTranslator(const char* modelPath, const char* ptTablePath, int beamSearch)
            : translator_(onmt::TranslatorFactory::build(modelPath, ptTablePath,
                                                         true, 1024, beamSearch,
                                                         false, false)) {}

        int translate(const char* sourceText, char* resultText) {
            // Stack-allocated tokenizer with case_feature enabled.
            onmt::Tokenizer tokenizer(onmt::Tokenizer::Mode::Conservative,
                                      "", true, false, false, "");
            std::string result = translator_->translate(sourceText, tokenizer);
            std::cout << sourceText << "\n" << result << std::endl;
            // Copy the result into the caller's buffer, including the
            // terminating NUL (the buffer must be large enough).
            std::memcpy(resultText, result.c_str(), result.size() + 1);
            return 0;
        }

    private:
        std::unique_ptr<onmt::ITranslator> translator_;
    };
}

std::shared_ptr<name::CTranslator> translator;

int init(const char* model, const char* ptTablePath, int beamSearch) {
    translator = std::make_shared<name::CTranslator>(model, ptTablePath, beamSearch);
    return 0;
}

int translate(const char* sourceText, char* resultText) {
    return translator->translate(sourceText, resultText);
}
  3. If I use my model with this code, I get an error:
    Assertion failed: (lhs.cols() == rhs.rows() && "invalid matrix product" && "if you wanted a coeff-wise or a dot product use the respective explicit functions"), function Product, file Eigen/src/Core/Product.h, line 97.
  4. The error occurs when the program runs this statement: rnn_state_enc = _encoder->forward(input); (Translator.hxx, line 321)

Has anybody tested CTranslate with case_feature? I tried the CLI version, but it doesn't have a case_feature or tokenization option.
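
For completeness, a hypothetical driver for the wrapper above (the model path is a placeholder, and the result buffer is sized arbitrarily; the wrapper assumes the caller's buffer is large enough):

int main()
{
    // Placeholder model path and an arbitrarily sized result buffer.
    init("my_case_feature_model_release.t7", "", 5);
    char result[4096] = {0};
    translate("Hello world!", result);
    return 0;
}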

Boost and Eigen 3.3 on Ubuntu 16.04 LTS

I have successfully installed both Boost and Eigen on Ubuntu 16.04 LTS:

Selecting previously unselected package libboost1.58-dev:amd64.
(Reading database ... 48598 files and directories currently installed.)
Preparing to unpack .../libboost1.58-dev_1.58.0+dfsg-5ubuntu3.1_amd64.deb ...
Unpacking libboost1.58-dev:amd64 (1.58.0+dfsg-5ubuntu3.1) ...
Selecting previously unselected package libboost-dev:amd64.
Preparing to unpack .../libboost-dev_1.58.0.1ubuntu1_amd64.deb ...
Unpacking libboost-dev:amd64 (1.58.0.1ubuntu1) ...
Selecting previously unselected package pkg-config.
Preparing to unpack .../pkg-config_0.29.1-0ubuntu1_amd64.deb ...
Unpacking pkg-config (0.29.1-0ubuntu1) ...
Selecting previously unselected package libeigen3-dev.
Preparing to unpack .../libeigen3-dev_3.3~beta1-2_all.deb ...
Unpacking libeigen3-dev (3.3~beta1-2) ...
Setting up libboost1.58-dev:amd64 (1.58.0+dfsg-5ubuntu3.1) ...
Setting up libboost-dev:amd64 (1.58.0.1ubuntu1) ...
Setting up pkg-config (0.29.1-0ubuntu1) ...
Setting up libeigen3-dev (3.3~beta1-2) ...

So libeigen3-dev (3.3~beta1-2) and libboost1.58-dev:amd64 (1.58.0+dfsg-5ubuntu3.1) are installed.

But when I compile:

Step 24/29 : RUN git clone https://github.com/OpenNMT/CTranslate.git &&   cd CTranslate &&   git submodule update --init &&   mkdir build &&   cd build &&   cmake -DCMAKE_CXX_FLAGS="-march=native" .. &&   make
 ---> Running in bd9def302cf0
Cloning into 'CTranslate'...
Submodule 'lib/tokenizer' (https://github.com/OpenNMT/Tokenizer.git) registered for path 'lib/tokenizer'
Cloning into 'lib/tokenizer'...
Submodule path 'lib/tokenizer': checked out '0d90868e52fce98b1eda968ef23db3668b6f0db9'
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type: Release
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
-- Could NOT find Boost
-- Eigen3 version 3.2.92 found in /usr/include/eigen3, but at least version 3.3 is required

cmake does not detect Boost or Eigen3 in the system default folders.

Since I'm building in Docker there are no previously installed packages, so it should find the ones I have installed via apt-get:

# OpenNMT dependencies: boost, eigen, intel mkl
RUN apt-get update && apt-get install -y \
  libboost-dev \
  libeigen3-dev -t xenial

where -t xenial is necessary to install the latest 3.3-beta.

The version ('80100') of the host compiler ('Apple clang') is not supported

The cmake .. step was successful:

[loretoparisi@:mbploreto build]$ cmake ..
-- Build type: Release
-- Try OpenMP C flag = [ ]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-fopenmp=libomp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [/openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-Qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-xopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [+Oopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-qsmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-mp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [ ]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-fopenmp=libomp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [/openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-Qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-xopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [+Oopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-qsmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-mp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Could NOT find OpenMP (missing:  OpenMP_C_FLAGS OpenMP_CXX_FLAGS) 
CMake Warning at CMakeLists.txt:22 (message):
  OpenMP not found: Compilation will not use OpenMP


-- Boost version: 1.64.0
-- Found the following Boost libraries:
--   program_options
-- Boost version: 1.64.0
-- Found the following Boost libraries:
--   program_options
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/loretoparisi/Documents/Projects/AI/CTranslate/build

but when running make:

[loretoparisi@:mbploreto build]$ make
Scanning dependencies of target OpenNMTTokenizer
[  3%] Building CXX object lib/tokenizer/CMakeFiles/OpenNMTTokenizer.dir/src/BPE.cc.o
[  6%] Building CXX object lib/tokenizer/CMakeFiles/OpenNMTTokenizer.dir/src/CaseModifier.cc.o
[  9%] Building CXX object lib/tokenizer/CMakeFiles/OpenNMTTokenizer.dir/src/ITokenizer.cc.o
[ 12%] Building CXX object lib/tokenizer/CMakeFiles/OpenNMTTokenizer.dir/src/SpaceTokenizer.cc.o
[ 15%] Building CXX object lib/tokenizer/CMakeFiles/OpenNMTTokenizer.dir/src/Tokenizer.cc.o
[ 18%] Building CXX object lib/tokenizer/CMakeFiles/OpenNMTTokenizer.dir/src/unicode/Data.cc.o
[ 21%] Building CXX object lib/tokenizer/CMakeFiles/OpenNMTTokenizer.dir/src/unicode/Unicode.cc.o
[ 24%] Linking CXX shared library libOpenNMTTokenizer.dylib
[ 24%] Built target OpenNMTTokenizer
Scanning dependencies of target TH
[ 27%] Building C object lib/TH/CMakeFiles/TH.dir/THGeneral.c.o
[ 30%] Building C object lib/TH/CMakeFiles/TH.dir/THFile.c.o
[ 33%] Building C object lib/TH/CMakeFiles/TH.dir/THDiskFile.c.o
[ 36%] Linking C shared library libTH.dylib
[ 36%] Built target TH
[ 39%] Building NVCC (Device) object CMakeFiles/onmt.dir/src/cuda/onmt_generated_Kernels.cu.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc fatal   : The version ('80100') of the host compiler ('Apple clang') is not supported
CMake Error at onmt_generated_Kernels.cu.o.Release.cmake:222 (message):
  Error generating
  /Users/loretoparisi/Documents/Projects/AI/CTranslate/build/CMakeFiles/onmt.dir/src/cuda/./onmt_generated_Kernels.cu.o


make[2]: *** [CMakeFiles/onmt.dir/src/cuda/onmt_generated_Kernels.cu.o] Error 1
make[1]: *** [CMakeFiles/onmt.dir/all] Error 2
make: *** [all] Error 2

Unable to pipe to translate process in node

I'm using node.js with the translate executable, which normally runs in a pipe like this from the console:

echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --beam_size 5 -
Der <unk> Fuchs springt über den faulen Hund

while in node I'm doing this:

var args = [
    '--model',
    self._options.loadModel,
    '--beam_size',
    self._options.translate.beamSize,
    '-'
];
var child = exec('translate', self._options.bin, args, self._options.child);
child.stdin.setEncoding('utf-8');
child.stdin.write(data + '\r\n');

where my exec method creates a node child process and listens for data, errors, etc. (example):

    var exec = function(name,command,params,options) {
        var self=this;
        var _options = { detached: false };
        for (var attrname in options) { _options[attrname] = options[attrname]; }

        var created_time= ( new Date() ).toISOString();
        var task = require('child_process').spawn(command, params, _options);
        task.stdout.on('data', function(_data) { 
            //...
        });
        task.on('error', function(error) {
           //...
        });
        task.on('uncaughtException', function (error) {
            //...
        });
        task.stdout.on('end', function(data) {
            //...
        });
        task.stderr.on('data', function(data) {
           //...
        });
        task.on('close', (code, signal) => {
            //...
        });
        task.on('exit', function(code) {
            //...
        });
        return task;
    }//exec

This normally works for most commands; I'm using the same approach for the tokenize/detokenize executable as well:

var args = [
    '--mode',
    self._options.cmd.mode,
    '-'
];
var child = self.exec('tokenize', self._options.bin.tokenize, args, self._options.child);
child.stdin.setEncoding('utf-8');
child.stdin.write(data + '\r\n');

But for translate, for some reason, the pipe does not work programmatically. Is the source program reading from stdin and writing to stdout normally?
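
For reference, here is a minimal sketch of the stdin-to-stdout loop one would expect such a CLI to run (hypothetical; the actual cli/translate source may differ). The usual culprit when a pipe works from a shell but not from another process is block buffering: when stdout is not a terminal, output sits in a large buffer until it is flushed, so the child appears to produce nothing.

#include <iostream>
#include <string>

int main()
{
    std::string line;
    while (std::getline(std::cin, line))
    {
        // A real translator would transform the line here; std::endl
        // flushes the stream, which matters when stdout is a pipe.
        std::cout << line << std::endl;
    }
    return 0;
}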

n_best support

Currently translate.lua supports the n_best option, which is useful.
It would be helpful to have this option in CTranslate as well.

Assertion `thtensor' failed.

I might be doing something very wrong, but after a seemingly successful compilation I get this error when I start translation simply with
./cli/translate --model model.t7

$ Error: Assertion `thtensor' failed. at .../CTranslate/src/th/Obj.cc:378

Any quick help would be much appreciated.

CTranslate does not work with Deep bidirectional encoders

CTranslate silently quits if the model being loaded was trained with the -encoder_type dbrnn option. I tried to run cli/translate under gdb, but it gave no additional information beyond loading the model and silently exiting. A model trained with similar data and options but with -encoder_type brnn works fine.

Is this expected, given that deep bidirectional encoders were introduced later in the lifetime of OpenNMT, and can one expect this to be implemented in CTranslate any time soon?

cmake -DEIGEN3_ROOT hint doesn't work if eigen3 exists in /usr/include

I tried to install CTranslate with a different path to Eigen3. Setting EIGEN3_ROOT as a hint doesn't work if /usr/include/eigen3 exists.

I already had Eigen 3.2.5 installed in /usr/include/eigen3.

/n/w10-nruiz/nmt/CTranslate/build$ cmake3 -DEIGEN3_ROOT=/usr/local/include/eigen3 ..
-- Build type: Release
-- Boost version: 1.59.0
-- Found the following Boost libraries:
-- program_options
-- Eigen3 version 3.2.5 found in /usr/include/eigen3, but at least version 3.3 is required
-- Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_INTEL_LP64_LIBRARY MKL_GF_LP64_LIBRARY MKL_GNU_THREAD_LIBRARY MKL_CORE_LIBRARY)
-- Boost version: 1.59.0
-- Found the following Boost libraries:
-- program_options
-- Configuring done
-- Generating done
-- Build files have been written to: /n/w10-nruiz/nmt/CTranslate/build

Explicitly setting EIGEN3_INCLUDE_DIR worked instead:

$ cmake3 -DEIGEN3_INCLUDE_DIR=/usr/local/include/eigen3 ..

It seems like the other hints were overwritten because a version of eigen3 was found in /usr/include.

Please fix this and update README.md to correct this information.

How to get the word embedding vectors?

I am using the attention vectors to create an alignment table. I also want to use the embedding vectors to check the alignment accuracy. How can I retrieve the embedding vectors from the model?
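
Conceptually, the lookup itself is trivial once you have the weight matrix: the embedding of a word is the row of the (vocab_size x embedding_dim) lookup table indexed by the word's dictionary id. CTranslate does not expose that matrix through the demo API, so the sketch below uses a random stand-in matrix purely to illustrate the operation.

#include <iostream>
#include <Eigen/Dense>

int main()
{
    // Stand-in for the model's word-lookup weights (vocab_size x dim);
    // in a real setting these would come from the loaded .t7 model.
    Eigen::MatrixXf embeddings = Eigen::MatrixXf::Random(100, 8);

    int word_id = 42;  // dictionary index of the word of interest
    Eigen::RowVectorXf vec = embeddings.row(word_id);
    std::cout << vec << std::endl;
    return 0;
}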

Windows 32-bit build fails (TH)

Specifically, compilation of THDiskFile.c fails because the int32_t type is undefined. The resolution would be to remove the #ifdef _WIN64 guard from the include of <stdint.h> at the top of the file; this header defines int32_t in standard C, so no ifdef is needed.
This was fixed in the main Torch7 repo.
Also, what do you think of updating TH to the latest version from the Torch7 repo, or better yet to the actively developed C++ version in ATen? Thanks!

Can't find ITokenizer.h in onmt

/home/gaox/include/onmt/ITranslator.h:6:29: fatal error: onmt/ITokenizer.h: No such file or directory

#include "onmt/ITokenizer.h"

Clang compilation fails

Using the clang compiler on Linux, I get these errors:
/usr/bin/ld: CMakeFiles/translate.dir/translate.cc.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
clang: error: linker command failed with exit code 1 (use -v to see invocation)
cli/CMakeFiles/translate.dir/build.make:150: recipe for target 'cli/translate' failed
make[2]: *** [cli/translate] Error 1
CMakeFiles/Makefile2:391: recipe for target 'cli/CMakeFiles/translate.dir/all' failed
make[1]: *** [cli/CMakeFiles/translate.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

This can be resolved by adding the following line to the root CMakeLists.txt:
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -pthread")

Using CTranslate with image data (im2text)

I apologize if this is not the right place to ask a question.

I'm hoping you can provide guidance for adding support for Im2Text (https://github.com/OpenNMT/Im2Text) to this utility. I have a model trained with Im2Text, and I'd like to run just its forward pass through this utility in C++.

I imagine I'll need to write some code to load the image data and pass it to the utility? Just looking for advice/pointers -- Thanks!

Non-zero code:2 on build

I'm hitting the following error while building inside a Docker container running Ubuntu 18.04 LTS.

This was running fine a couple of days ago; has anything been updated recently?

Scanning dependencies of target onmt
[ 45%] Building CXX object CMakeFiles/onmt.dir/src/th/Env.cc.o
[ 47%] Building CXX object CMakeFiles/onmt.dir/src/th/Obj.cc.o
[ 50%] Building CXX object CMakeFiles/onmt.dir/src/th/Utils.cc.o
[ 52%] Building CXX object CMakeFiles/onmt.dir/src/Dictionary.cc.o
[ 55%] Building CXX object CMakeFiles/onmt.dir/src/SubDict.cc.o
[ 57%] Building CXX object CMakeFiles/onmt.dir/src/PhraseTable.cc.o
[ 60%] Building CXX object CMakeFiles/onmt.dir/src/Profiler.cc.o
[ 62%] Building CXX object CMakeFiles/onmt.dir/src/ITranslator.cc.o
[ 65%] Building CXX object CMakeFiles/onmt.dir/src/TranslatorFactory.cc.o
CMakeFiles/onmt.dir/build.make:254: recipe for target 'CMakeFiles/onmt.dir/src/TranslatorFactory.cc.o' failed
CMakeFiles/Makefile2:68: recipe for target 'CMakeFiles/onmt.dir/all' failed
Makefile:127: recipe for target 'all' failed
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report, with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
make[2]: *** [CMakeFiles/onmt.dir/src/TranslatorFactory.cc.o] Error 4
make[1]: *** [CMakeFiles/onmt.dir/all] Error 2
make: *** [all] Error 2
The command '/bin/sh -c make' returned a non-zero code: 2

segfault from destructor when instantiating multiple instances of translator

Instantiating and translating with multiple instances of the translator works fine,
but destroying these instances results in a segmentation fault.

Minimal example:

#include <memory>
#include "onmt/onmt.h"

int main(int argc, char **argv){
 std::unique_ptr<onmt::ITranslator> translator1 = onmt::TranslatorFactory::build("model1.t7");
 std::unique_ptr<onmt::ITranslator> translator2 = onmt::TranslatorFactory::build("model2.t7");
 return 0;
}
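
A variation that may help narrow the fault down is to destroy the instances explicitly, one at a time; if only the second reset() crashes, the two translators presumably share some global state:

#include <memory>
#include "onmt/onmt.h"

int main(int argc, char **argv){
 std::unique_ptr<onmt::ITranslator> translator1 = onmt::TranslatorFactory::build("model1.t7");
 std::unique_ptr<onmt::ITranslator> translator2 = onmt::TranslatorFactory::build("model2.t7");

 translator1.reset();  // destroy the first instance...
 translator2.reset();  // ...then the second, to see which one faults
 return 0;
}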

Worse performance than translate.lua

The performance of CTranslate is not as good as translate.lua. Are there any tips I missed when converting the GPU model to a CPU model?

Another request: I want to get the n_best hypotheses and their corresponding scores, but the demo currently lacks these APIs. Do you have any plans to add them?

Implement the gold data score feature

Hi, thanks for your good work!
This project does not implement the gold data score feature, and I need it.
Can you tell me how I can implement it, and is it difficult to do?
Thank you.
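
For context, the gold score reported by translate.lua is the model's log-likelihood of the reference translation: at each decoder step, feed the gold prefix and accumulate the log-probability of the next gold token. A conceptual sketch follows; decoder_step is a hypothetical stand-in, not a CTranslate API, and returns random values here only so the snippet runs.

#include <iostream>
#include <vector>
#include <Eigen/Dense>

// Hypothetical stand-in for one decoder step: the log-probability
// distribution over the target vocabulary at step t, given the gold
// prefix gold_ids[0..t-1]. Random values keep the sketch runnable.
static Eigen::RowVectorXf decoder_step(const std::vector<int>& /*gold_ids*/, size_t /*t*/)
{
    return Eigen::RowVectorXf::Random(1000);
}

int main()
{
    std::vector<int> gold_ids = {12, 7, 345, 2};  // dictionary ids of the reference

    float gold_score = 0.f;
    for (size_t t = 0; t < gold_ids.size(); ++t)
        gold_score += decoder_step(gold_ids, t)(gold_ids[t]);  // log p(gold_t | prefix)

    std::cout << "gold score: " << gold_score << std::endl;
    return 0;
}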

Prediction error on some specific sentences

An error occurs when some specific sentences are input.

terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_default_append
Aborted (core dumped)

gdb output:

[New LWP 5116]
[New LWP 5301]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./cli/translate --model ../../ww110750_model/diffBig-21m_epoch3_5.40_release.t7'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f684bd46c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f684bd46c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f684bd4a028 in __GI_abort () at abort.c:89
#2 0x00007f684c34b535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f684c3496d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f684c349703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f684c349922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007f684c39b3a7 in std::__throw_length_error(char const*) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f684c8a2ba5 in std::vector<unsigned long, std::allocator<unsigned long> >::_M_default_append(unsigned long) ()
from /home/ww110750/CTranslate/build/libonmt.so
#8 0x00007f684c8acc17 in onmt::Translator<onmt::Eigen::MatrixBatch, Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1> const, 0, Eigen::Stride<0, 0> >, float>::decode(std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > const&, unsigned long, std::vector<onmt::Eigen::MatrixBatch, std::allocator<onmt::Eigen::MatrixBatch> >&, onmt::Eigen::MatrixBatch&) ()
from /home/ww110750/CTranslate/build/libonmt.so
#9 0x00007f684c8ad32e in onmt::Translator<onmt::Eigen::MatrixBatch, Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1> const, 0, Eigen::Stride<0, 0> >, float>::translate_batch(std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > const&, std::vector<std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >, std::allocator<std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > > const&) () from /home/ww110750/CTranslate/build/libonmt.so
#10 0x000000000040c972 in main ()

Does this code work for a model trained with a recent OpenNMT PyTorch version?

I trained a model with OpenNMT on PyTorch 0.4. I would like to load the trained model in C++ and translate an input string with it. Can I do that with this version of CTranslate? When I use torch.save in OpenNMT to save the model, I get a .pt file, and when I try to load it with a small test script that I was able to compile, I just get this error:

undefined object=176816768
$ Error: Assertion `0' failed. at /data/esther/Projects/G2P/lts_experiments/scripts/OpenNMT-CTranslate/src/th/Obj.cc:400

I did notice the example in the README shows a model with a .t7 extension, so maybe it is not compatible? Any ideas on how to make them compatible?

Command used to compile:
g++ -O3 -std=c++11 -o test_translate test_translate.cc -IOpenNMT-CTranslate/include -IOpenNMT-CTranslate/lib/TH -IOpenNMT/lib/tokenizer/include -I/usr/include/eigen3 -LOpenNMT-CTranslate/build -lonmt

g++ version 4.8.5
CentOS 7

Compile errors in Visual Studio

I successfully compiled this repo on Ubuntu and want to port the library to Windows.
I tried building with CMake and Visual Studio but got errors that prevent compilation.
It has taken me three days of work and it is still not building.
The error is on this line (matrixbatch.h, line 108):
this->row(b).noalias() = Map<MatrixBatchBase>(mat.data(), 1, mat.cols() * mat.rows());
"
Error C2672 'Eigen::PlainObjectBase<Eigen::Matrix<float,-1,-1,1,-1,-1>>::Map': no matching overloaded function found onmt c:\code\ctranslate\include\onmt\eigen\matrixbatch.h 108
Error C2975 'Eigen::PlainObjectBase<Eigen::Matrix<float,-1,-1,1,-1,-1>>::Map': invalid template argument for 'Outer', expected compile-time constant expression onmt c:\code\ctranslate\include\onmt\eigen\matrixbatch.h 108
"
