Is the source program reading from stdin and writing to stdout normally?
Yes. However, it will try to read --batch_size lines unless the special control character EOF is received.
So for your application, you certainly want to set --batch_size 1.
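To make the batching behaviour concrete, here is a simplified model of what the answer describes. It is an assumption for illustration, not CTranslate's actual reader: lines are collected until batch_size of them have arrived, and a partial batch is only processed once EOF is seen. A Node process that keeps stdin open never sends EOF, so with a batch size above 1 a single line can wait indefinitely. The generator name batches is hypothetical.

```javascript
// Simplified model (an assumption, not CTranslate's code) of the behaviour
// described above: input lines are buffered until `batchSize` lines have
// arrived or EOF is reached, and only then processed as one batch.
function* batches(lines, batchSize) {
  let batch = [];
  for (const line of lines) {
    batch.push(line);
    if (batch.length === batchSize) {
      yield batch; // a full batch can be translated immediately
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // EOF flushes a partial batch
}

// With batchSize 4 and only 3 lines, nothing is emitted until the input ends;
// with batchSize 1, every line forms its own batch right away.
console.log([...batches(['a', 'b', 'c'], 4)]);
console.log([...batches(['a', 'b', 'c'], 1)]);
```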
The + "\r\n" appended to the input seems to be the issue, by the way.
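A minimal sketch of the fix, assuming (as the outputs in this thread suggest but do not prove) that cli/translate splits its input on '\n' only, so with '\r\n' a stray '\r' stays attached to the last word ("dog\r"), which then comes out as <unk>. The helper name toInputLine is hypothetical.

```javascript
// Hypothetical helper: turn arbitrary text into exactly one input line for
// cli/translate. Any CR/LF inside or at the end of the text is collapsed to
// a space, and the line is terminated with a bare '\n' rather than '\r\n'.
function toInputLine(text) {
  return text.replace(/[\r\n]+/g, ' ').trim() + '\n';
}

console.log(JSON.stringify(toInputLine('The quick brown fox jumps over the lazy dog\r\n')));
```

Collapsing embedded line breaks as well keeps the one-input-line, one-output-line correspondence that the line-by-line approach relies on.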
@guillaumekln thank you so much! It works perfectly now!
[loretoparisi@:mbploreto opennmt]$ node translate.js
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
'1',
'-' ]
----data <unk> der <unk> Fuchs über die faulen <unk>
<unk> der <unk> Fuchs über die faulen <unk>
SOURCE (en) "The quick brown fox jumps over the lazy dog"
DEST (de) "<unk> der <unk> Fuchs über die faulen <unk>\n"
exec:translate end.
exec:translate exit.
task:translate pid:15115 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$
@guillaumekln sorry, I just noticed this. When using --batch_size=1 I get a slightly different translation:
source (en): "The quick brown fox jumps over the lazy dog"
dest (de) (from bash, params: --beam_size 5): Der <unk> Fuchs springt über den faulen Hund
dest (de) (from node script, params: --beam_size 5 --batch_size 1): <unk> der <unk> Fuchs über die faulen <unk>
I think there is something else. Can you reproduce it when directly invoking cli/translate on the command line?
Nope. On the command line, trying different parameters:
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --beam_size 5 -
Der <unk> Fuchs springt über den faulen Hund
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7
Der <unk> Fuchs springt über den faulen Hund
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --batch_size 1 --beam_size 5
Der <unk> Fuchs springt über den faulen Hund
I always get the same output: Der <unk> Fuchs springt über den faulen Hund.
Programmatically, in Node, I'm passing:
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
1,
'-' ]
and the input text "The quick brown fox jumps over the lazy dog" + "\r\n".
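For reference, the argument list above can be built in one place before handing it to child_process.spawn. Node documents spawn's args as an array of strings, so this sketch converts the numeric options with String() instead of relying on implicit conversion (the logs above mix 5 and '1'). The function name buildTranslateArgs is hypothetical; the flags are the ones used in this thread.

```javascript
// Hypothetical helper: build the cli/translate arguments used in this thread.
// Numeric options are converted to strings explicitly for child_process.spawn.
function buildTranslateArgs(modelPath, beamSize, batchSize) {
  return [
    '--model', modelPath,
    '--beam_size', String(beamSize),
    '--batch_size', String(batchSize),
    '-', // final argument, kept as in the logs above
  ];
}

console.log(buildTranslateArgs('/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7', 5, 1));
```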
The command line is the reference, so if you are getting a different output, something is going on in your application.
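Since the command line is the reference, one way to compare like with like from Node is to run the CLI once with exactly the bytes the shell pipeline sends (echo appends a bare '\n'). This is a sketch using Node's child_process.spawnSync; runOnce is a hypothetical helper name.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command once with `input` on stdin (like
// `echo "..." | cmd`), wait for it to exit, and return its stdout.
function runOnce(command, args, input) {
  const result = spawnSync(command, args, { input, encoding: 'utf8' });
  if (result.error) throw result.error; // e.g. the binary was not found
  if (result.status !== 0) {
    throw new Error(`${command} exited with status ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

For example, runOnce('./cli/translate', ['--model', '/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7', '--beam_size', '5', '-'], 'The quick brown fox jumps over the lazy dog\n') would be expected to match the bash output if Node sends the same input. Because each call starts a new process (and loads the model again), this is only meant for such a comparison, not for translating line by line.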
@guillaumekln Yes, confirmed!
[loretoparisi@:mbploreto opennmt]$ node translate.js
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
1,
'-' ]
Der <unk> Fuchs springt über den faulen Hund
SOURCE (en) "The quick brown fox jumps over the lazy dog"
DEST (de) "Der <unk> Fuchs springt über den faulen Hund\n"
exec:translate end.
exec:translate exit.
task:translate pid:54209 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$
My write function now looks like this:
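/**
 * Send data to child process
 */
this.send = function(data) {
  // setDefaultEncoding applies to writes; setEncoding only affects the readable side
  this.child.stdin.setDefaultEncoding('utf8');
  // Terminate each line with a bare '\n' (the earlier '\r\n' ending was the problem)
  this.child.stdin.write(data + '\n');
}; //send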
I also realized that the same thing happened with text summarization, and now it works there too:
[loretoparisi@:mbploreto opennmt]$ node textsum.js
[ '--model',
'/root/textsum_epoch7_14.69_release.t7',
'--beam_size',
10,
'--batch_size',
1,
'-' ]
night never just my bed smell
SOURCE (en) "Last night you were in my room And now my bed sheets smell like you Every day discovering something brand new"
DEST (-) "night never just my bed smell\n"
exec:translate end.
exec:translate exit.
task:translate pid:54229 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$
Thank you.
@guillaumekln Sorry for all these questions here! I prefer to write here, since it's related to the command line and is more of a performance question than an issue. I have noticed that when iterating over several lines to translate, performance decreases as the number of lines grows.
Of course I'm still using --batch_size=1, so my question is: is the model loaded at every call in this iteration?
I suspect this because it ends up with a memory leak warning: (node:61283) Warning: Possible EventEmitter memory leak detected. 11 unpipe listeners added. Use emitter.setMaxListeners() to increase limit, which I think is due to an OOM issue.
Considering that the number of lines to translate changes every time and I need to keep the translation per line (executing within a node process), how should I handle that?
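The warning itself comes from Node's events module rather than from CTranslate: it is printed when a single emitter gets more listeners for one event than EventEmitter.defaultMaxListeners (10 by default), which is why it reports 11. Since the Node code is not shown, the cause here is an assumption: a common source is attaching a listener (or a pipe) on every call without removing it. The sketch contrasts that pattern with registering one listener up front; sendLeaky and createSender are hypothetical names, and a plain EventEmitter stands in for the child's output stream.

```javascript
const { EventEmitter } = require('events');

// Hypothetical leaky pattern: a listener is added for every line sent and
// never removed, so the listener count grows with the number of lines.
function sendLeaky(stream, line, onResult) {
  stream.on('line', onResult);
  // ... write `line` to the child process here ...
}

// Alternative: register a single listener once and queue the callbacks,
// handing each output line to the oldest waiting callback.
function createSender(stream) {
  const waiting = [];
  stream.on('line', (out) => {
    const cb = waiting.shift();
    if (cb) cb(out);
  });
  return function send(line, onResult) {
    waiting.push(onResult);
    // ... write `line` to the child process here ...
  };
}
```

With sendLeaky, the eleventh call on the same emitter triggers the MaxListenersExceededWarning; with createSender the count stays at one no matter how many lines are sent.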
An example: I'm doing a similar translation task using Facebook Fairseq. There, the command line tool loads the model once, then I just send data to the child process stdin and the model executes the beam search, so there is no OOM in that case.
Thank you.
Is the model loaded at every call in this iteration?
No. It will only be loaded when cli/translate is started, and unloaded when the process dies.
You should be able to use the same approach you described for fairseq: keep stdin open and write line by line.
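A minimal sketch of that approach in Node, under stated assumptions: cli/translate is started once with --batch_size 1, prints exactly one output line per input line on stdout, and flushes each line as it goes (the logs in this thread suggest this, but it is not verified here). LineTranslator is a hypothetical wrapper, not part of CTranslate; it pairs output lines with requests in FIFO order and uses a single stdout listener for the life of the process.

```javascript
const { spawn } = require('child_process');
const readline = require('readline');

// Hypothetical wrapper: keeps one translator process alive, writes one
// sentence per line to its stdin and resolves requests in FIFO order.
class LineTranslator {
  constructor(command, args) {
    this.pending = []; // { resolve, reject } for requests awaiting output
    this.child = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] });

    // One 'line' listener for the whole lifetime of the process.
    readline.createInterface({ input: this.child.stdout }).on('line', (line) => {
      const next = this.pending.shift();
      if (next) next.resolve(line);
    });

    const failAll = (err) => {
      for (const p of this.pending.splice(0)) p.reject(err);
    };
    this.child.on('error', failAll); // e.g. command not found
    this.child.stdin.on('error', failAll); // e.g. EPIPE if the child died
    // 'close' fires after stdout is drained, so answered requests are not rejected.
    this.child.on('close', (code, signal) => {
      failAll(new Error(`translator closed (code ${code}, signal ${signal})`));
    });
  }

  translate(text) {
    // Collapse embedded CR/LF and terminate with a bare '\n'.
    const line = text.replace(/[\r\n]+/g, ' ').trim() + '\n';
    return new Promise((resolve, reject) => {
      this.pending.push({ resolve, reject });
      this.child.stdin.write(line, 'utf8');
    });
  }

  close() {
    // Ending stdin sends EOF, so the process can finish and exit on its own.
    this.child.stdin.end();
  }
}
```

Usage would look like new LineTranslator('./cli/translate', ['--model', '/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7', '--beam_size', '5', '--batch_size', '1', '-']), then await translator.translate(sentence) for each line and translator.close() at the end. The model is loaded once, when the process starts, as described above.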
@guillaumekln thanks, I will try that!
Thank you, it works as expected!
[loretoparisi@:mbploreto opennmt]$ node translate.js
Module:OpenNMT.en-de of OpenNMT loaded.
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
1,
'-' ]
<unk>
OpenNMT.load
OpenNMT.translate: translating [0] Ayy, I remember syrup sandwiches and crime allowances
OpenNMT.translate: translating [1] Finesse a nigga with some counterfeits
OpenNMT.translate: translating [2] Parmesan where my accountant lives
<unk> Ich erinnere mich an <unk> und <unk>
OpenNMT.translate: translated [0]
<unk> Ich erinnere mich an <unk> und <unk>
<unk> mit einigen Fälschungen
OpenNMT.translate: translated [1]
<unk> mit einigen Fälschungen
<unk> , wo mein Buchhalter lebt .
OpenNMT.translate: translated [2]
<unk> , wo mein Buchhalter lebt .
OpenNMT.translate: translated:3
[ { line: 0,
source: 'Ayy, I remember syrup sandwiches and crime allowances',
target: '<unk> Ich erinnere mich an <unk> und <unk>\n' },
{ line: 1,
source: 'Finesse a nigga with some counterfeits',
target: '<unk> mit einigen Fälschungen\n' },
{ line: 2,
source: 'Parmesan where my accountant lives',
target: '<unk> , wo mein Buchhalter lebt .\n' } ]
OpenNMT.unload
exec:translate end.
exec:translate exit.
task:translate pid:71271 terminated due to receipt of signal:SIGINT