This is interesting.
Seems like there are some permission problems when I fetched the training tools with wget:
alvas@ubi:~/test-out-of-box/training-tools$ ls -lah *
-rw-rw-r-- 1 alvas alvas 914K Jan 29 16:16 d4norm
-rw-rw-r-- 1 alvas alvas 919K Jan 29 16:16 hmmnorm
-rw-rw-r-- 1 alvas alvas 2.1K Jan 29 16:16 merge_alignment.py
-rw-rw-r-- 1 alvas alvas 1.1M Jan 29 16:16 mgiza
-rw-rw-r-- 1 alvas alvas 336K Jan 29 16:16 mkcls
-rw-rw-r-- 1 alvas alvas 43K Jan 29 16:16 plain2snt
-rw-rw-r-- 1 alvas alvas 38K Jan 29 16:16 snt2cooc
-rw-rw-r-- 1 alvas alvas 29K Jan 29 16:16 snt2coocrmp
-rw-rw-r-- 1 alvas alvas 33K Jan 29 16:16 snt2plain
-rw-rw-r-- 1 alvas alvas 48K Jan 29 16:16 symal
After I did a chmod 777, it works:
alvas@ubi:~/test-out-of-box/training-tools$ ls
d4norm hmmnorm merge_alignment.py mgiza mkcls plain2snt snt2cooc snt2coocrmp snt2plain symal
alvas@ubi:~/test-out-of-box/training-tools$ chmod 777 *
alvas@ubi:~/test-out-of-box/training-tools$ cd ..
alvas@ubi:~/test-out-of-box$ ls
Europarl.de-en.de Europarl.de-en.en LexicalTranslationModel.pm training-tools train-model.perl
alvas@ubi:~/test-out-of-box$ perl train-model.perl --external-bin-dir training-tools/ --mgiza
Using SCRIPTS_ROOTDIR: /home/alvas/test-out-of-box
Using multi-thread GIZA
using gzip
ERROR: use --corpus to specify corpus at train-model.perl line 379.
But is there a safer way to change the permissions? What permissions does train-model.perl need? Doing chmod 777 works, but it's a little unsafe.
It's sort of digging into the closet, but it seems like train-model.perl is behaving weirdly.
When I ran:
perl train-model.perl --root-dir . --model-dir model --corpus Europarl.de-en --f en --e de --external-bin-dir "training-tools" --mgiza --parallel --first-step 1 --last-step 3
mkcls and mgiza complete, but when the script tries to stitch the results together, train-model.perl starts to behave weirdly and looks for moses/bin/symal instead of $_EXTERNAL_BINDIR/symal.
Using SCRIPTS_ROOTDIR: /home/alvas/test-out-of-box
Using multi-thread GIZA
using gzip
(1) preparing corpus @ Tue May 19 02:05:17 CEST 2015
Executing: mkdir -p /home/alvas/test-out-of-box/corpus
(1.0) selecting factors @ Tue May 19 02:05:17 CEST 2015
Forking...
(1.1) running mkcls @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mkcls -c50 -n2 -p/home/alvas/test-out-of-box/Europarl.de-en.en -V/home/alvas/test-out-of-box/corpus/en.vcb.classes opt
/home/alvas/test-out-of-box/corpus/en.vcb.classes already in place, reusing
(1.2) creating vcb file /home/alvas/test-out-of-box/corpus/en.vcb @ Tue May 19 02:05:17 CEST 2015
(1.1) running mkcls @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mkcls -c50 -n2 -p/home/alvas/test-out-of-box/Europarl.de-en.de -V/home/alvas/test-out-of-box/corpus/de.vcb.classes opt
/home/alvas/test-out-of-box/corpus/de.vcb.classes already in place, reusing
(1.2) creating vcb file /home/alvas/test-out-of-box/corpus/de.vcb @ Tue May 19 02:05:17 CEST 2015
(1.3) numberizing corpus /home/alvas/test-out-of-box/corpus/en-de-int-train.snt @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/corpus/en-de-int-train.snt already in place, reusing
(1.3) numberizing corpus /home/alvas/test-out-of-box/corpus/de-en-int-train.snt @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/corpus/de-en-int-train.snt already in place, reusing
Waiting for mkcls processes to finish...
(2) running giza @ Tue May 19 02:05:17 CEST 2015
(2.1a) running snt2cooc de-en @ Tue May 19 02:05:17 CEST 2015
Executing: mkdir -p /home/alvas/test-out-of-box/giza.de-en
/home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.de-en/de-en.cooc /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/de-en-int-train.snt
Executing: /home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.de-en/de-en.cooc /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/de-en-int-train.snt
(2.1a) running snt2cooc en-de @ Tue May 19 02:05:17 CEST 2015
Executing: mkdir -p /home/alvas/test-out-of-box/giza.en-de
/home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.en-de/en-de.cooc /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/en-de-int-train.snt
Executing: /home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.en-de/en-de.cooc /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/en-de-int-train.snt
END.
END.
(2.1b) running giza de-en @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mgiza -CoocurrenceFile /home/alvas/test-out-of-box/giza.de-en/de-en.cooc -c /home/alvas/test-out-of-box/corpus/de-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /home/alvas/test-out-of-box/giza.de-en/de-en -onlyaldumps 1 -p0 0.999 -s /home/alvas/test-out-of-box/corpus/en.vcb -t /home/alvas/test-out-of-box/corpus/de.vcb
/home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.gz seems finished, reusing.
Waiting for second GIZA process...
(2.1b) running giza en-de @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mgiza -CoocurrenceFile /home/alvas/test-out-of-box/giza.en-de/en-de.cooc -c /home/alvas/test-out-of-box/corpus/en-de-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /home/alvas/test-out-of-box/giza.en-de/en-de -onlyaldumps 1 -p0 0.999 -s /home/alvas/test-out-of-box/corpus/de.vcb -t /home/alvas/test-out-of-box/corpus/en.vcb
/home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.gz seems finished, reusing.
(3) generate word alignment @ Tue May 19 02:05:17 CEST 2015
Combining forward and inverted alignment from files:
/home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.{bz2,gz}
/home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.{bz2,gz}
Executing: mkdir -p /home/alvas/test-out-of-box/model
Executing: /home/alvas/test-out-of-box/training/giza2bal.pl -d "gzip -cd /home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.gz" -i "gzip -cd /home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.gz" |/home/alvas/test-out-of-box/../bin/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" > /home/alvas/test-out-of-box/model/aligned.grow-diag-final
sh: 1: /home/alvas/test-out-of-box/training/giza2bal.pl: not found
sh: 1: /home/alvas/test-out-of-box/../bin/symal: not found
Exit code: 127
ERROR: Can't generate symmetrized alignment file
Also, $SCRIPTS_ROOTDIR seems to control where train-model.perl finds the complementary scripts (e.g. training/giza2bal.pl in the log above). This is unavoidable unless we make $SCRIPTS_ROOTDIR customizable, but that would lead to a whole lot of other problems.
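A quick way to catch this before a long run is to check the two helper paths that the failing step (3) command above builds from $SCRIPTS_ROOTDIR. The sketch below is a hypothetical preflight check, not part of Moses; check_moses_layout is an invented name, and it assumes train-model.perl looks for training/giza2bal.pl under $SCRIPTS_ROOTDIR and for symal in $SCRIPTS_ROOTDIR/../bin, exactly as the log shows.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of Moses).
# Assumption: train-model.perl resolves its helpers relative to SCRIPTS_ROOTDIR
# as in the failing command above:
#   $SCRIPTS_ROOTDIR/training/giza2bal.pl
#   $SCRIPTS_ROOTDIR/../bin/symal

# check_moses_layout ROOTDIR
# Prints one line per missing or non-executable helper.
# Returns 1 if any problem was found, 0 otherwise.
check_moses_layout() {
  local root="$1"
  local status=0
  local tool
  for tool in "$root/training/giza2bal.pl" "$root/../bin/symal"; do
    if [ ! -e "$tool" ]; then
      echo "missing: $tool"
      status=1
    elif [ ! -x "$tool" ]; then
      # Present but lacking the execute bit (the chmod issue above).
      echo "not executable: $tool"
      status=1
    fi
  done
  return "$status"
}
```

For the run above, check_moses_layout /home/alvas/test-out-of-box should report both helpers as missing, which matches the two "not found" errors in the log.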
Solution: use the Moses scripts as they are normally compiled and installed.
Enlightenment: the training scripts don't work out of the box.
For more info: https://github.com/alvations/usaarhat-repo/blob/master/Align-A-Line.md
Use the ‘x’ permission bit on anything that you want to be able to execute. Strictly speaking if it's in your home directory you probably only need that permission for the file's owner (you), but the usual and simple thing is to allow it for all users.
So, to permit execution of a file, do:
chmod a+x $MYFILE
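Applied to the training-tools directory from the first comment, this could look like the sketch below. make_tools_executable is a hypothetical helper name, and it assumes GNU or BSD find (for -maxdepth). It only adds the execute bit, so files that were -rw-rw-r-- become -rwxrwxr-x rather than the world-writable -rwxrwxrwx that chmod 777 produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give downloaded tools the execute bit without the
# world-write bit that "chmod 777" also grants.
# Assumption: the tools sit directly inside one directory (e.g. training-tools/).

# make_tools_executable DIR [MODE]
#   MODE defaults to a+x (everyone may execute); pass u+x for owner only.
make_tools_executable() {
  local dir="$1"
  local mode="${2:-a+x}"
  # Only touch regular files directly inside DIR, not subdirectories.
  find "$dir" -maxdepth 1 -type f -exec chmod "$mode" {} +
}
```

Usage: make_tools_executable training-tools lets everyone run the tools, while make_tools_executable training-tools u+x restricts execution to the owner, which, as noted above, is usually enough for files in your home directory.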
@jtv, thanks for the chmod permission solution!!! But the problems that come after the permission fix are a little harder to resolve, because they're closely tied to the pseudo-static paths that train-model.perl tries to use.
Sorry to jump into a closed thread, but I'm having a similar issue and I'm not sure why this was closed. train-model.perl is failing to find symal because it's looking for "$SCRIPTS_ROOTDIR/../bin/symal" and not "$_EXTERNAL_BINDIR/symal", or even the symal in the Moses bin dir (which for me is not a sibling of $SCRIPTS_ROOTDIR). Here's the offending line: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/train-model.perl#L466
I'm using EMS, and here are the relevant paths from my config file:
moses-src-dir = /NLP_TOOLS/mt_tools/moses/v3.0-release
moses-bin-dir = $moses-src-dir/bin
moses-script-dir = $moses-src-dir/src/scripts
external-bin-dir = /NLP_TOOLS/mt_tools/mgizapp/latest/bin
Note that while the bin-dir is $moses-src-dir/bin, the script-dir is one level deeper ($moses-src-dir/src/scripts). This install is on my university's cluster and I don't have permissions to move things around.
Why does train-model.perl assume the bin-dir is a sibling to the script-dir when it has both the $moses-bin-dir and $external-bin-dir variables available?
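One way to check whether an install has the layout that the hard-coded path expects is to resolve $moses-script-dir/../bin physically and compare it with $moses-bin-dir. The helper names below are hypothetical, and the sketch assumes that the script dir from the EMS config is what train-model.perl ends up using as $SCRIPTS_ROOTDIR.

```shell
#!/usr/bin/env bash
# Hypothetical layout check (not part of Moses or EMS).

# same_physical_dir A B: succeed if A and B resolve to the same physical
# directory (symlinks and ".." resolved the way the kernel resolves them).
same_physical_dir() {
  local a b
  a=$(cd -P "$1" 2>/dev/null && pwd -P) || return 1
  b=$(cd -P "$2" 2>/dev/null && pwd -P) || return 1
  [ "$a" = "$b" ]
}

# bin_is_sibling SCRIPT_DIR BIN_DIR
# The symal path is built as $SCRIPTS_ROOTDIR/../bin/symal, so it can only
# work when SCRIPT_DIR/../bin is the same directory as BIN_DIR.
bin_is_sibling() {
  same_physical_dir "$1/../bin" "$2"
}
```

With the config above, bin_is_sibling /NLP_TOOLS/mt_tools/moses/v3.0-release/src/scripts /NLP_TOOLS/mt_tools/moses/v3.0-release/bin should fail, because the script dir's ../bin is /NLP_TOOLS/mt_tools/moses/v3.0-release/src/bin rather than the configured bin dir.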
@goodmami Last year, I ended up modifying the paths in train-model.perl to suit my machine: I replaced all the paths to the binaries and to the other specific Perl scripts with static paths.
The assumption behind my $SYMAL = "$SCRIPTS_ROOTDIR/../bin/symal"; is that Moses is installed as per the instructions at http://www.statmt.org/moses/?n=Development.GetStarted, so that the install has a layout like this:
alvas@ubi:~$ cd mosesdecoder/
alvas@ubi:~/mosesdecoder$ ls
biconcor defer mert OnDiskPt scripts
bin doc mingw phrase-extract search
bjam jam-files mira previous.sh symal
BUILD-INSTRUCTIONS.txt Jamroot misc regression-testing util
contrib lib moses sample-models vw
cruise-control lm moses-cmd sample-models.tgz
alvas@ubi:~/mosesdecoder$ cd scripts/
alvas@ubi:~/mosesdecoder/scripts$ ls
analysis generic other regression-testing tests Transliteration
ems Jamfile README server tokenizer
fuzzy-match OSM recaser share training
alvas@ubi:~/mosesdecoder/scripts$ cd ../bin
alvas@ubi:~/mosesdecoder/bin$ ls
1-1-Extraction filter processLexicalTable
biconcor fragment processPhraseTable
build_binary generateSequences project-cache.jam
config.log kbmira prunePhraseTable
consolidate lexical-reordering-score query
consolidate-direct lmbrgrid queryLexicalTable
consolidate-reverse lmplz queryOnDiskPt
CreateOnDiskPt merge-sorted queryPhraseTable
dump_counts mert relax-parse
evaluator mira score
extract moses sentence-bleu
extract-ghkm moses_chart statistics
extract-lex pcfg-extract symal
extract-mixed-syntax pcfg-score TMining
extractor phrase-lookup
extract-rules pro
The TL;DR way would be something like:
cd /path/to/
wget http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/linux-64bit.tgz
tar zxvf linux-64bit.tgz
mv linux-64bit mosesdecoder
chmod -R a+x mosesdecoder
Since the scripts and EMS should not use the source directly, you can do this in the config file:
moses-src-dir = /path/to/mosesdecoder
moses-bin-dir = $moses-src-dir/bin
moses-script-dir = $moses-src-dir/scripts
external-bin-dir = $moses-src-dir/training-tools
Thanks @alvations. I didn't install it myself, and our sysadmin claims to have followed the normal install. According to the link you provided (http://www.statmt.org/moses/?n=Development.GetStarted) (emphasis added):
--install-scripts=/path/to/scripts
copies scripts into a directory. Does not install if missing. No argument defaults to PREFIX/scripts.
Since the directory didn't exist as a sibling to the bindir, I'm guessing he didn't provide the --install-scripts option, which in the installation instructions is under "Popular additional bjam options" and not the "easy setup" heading. Even if the option is used, it's possible to provide a path that isn't the default, in which case the train-model.perl script would still fail because of the directory location assumption.
Anyway, we fixed that problem by symlinking the scripts directory at the expected location, but my original question still stands (emphasis added):
Why does train-model.perl assume the bin-dir is a sibling to the script-dir when it has both the $moses-bin-dir and $external-bin-dir variables available?
I think this hard-coding of the path assumption is a bug. I'd be happy to submit a PR, but I'm not sure what to fix. Maybe I'd just need to change whatever calls train-model.perl to provide the appropriate command-line options, but maybe I'd also need to change train-model.perl to actually use them?
Thanks!
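For what such a change might look like, here is a shell sketch of a lookup order that uses the directories named in the EMS config: the external bin dir first, then the Moses bin dir, and only then the hard-coded $SCRIPTS_ROOTDIR/../bin. This is a hypothetical illustration, not current train-model.perl behavior; find_tool is an invented helper name, and an actual patch would be written in Perl inside train-model.perl.

```shell
#!/usr/bin/env bash
# Hypothetical lookup order (NOT what train-model.perl currently does).

# find_tool NAME DIR...
# Prints the first DIR/NAME that is a regular executable file and returns 0;
# returns 1 if no candidate directory contains it.
find_tool() {
  local name="$1"
  shift
  local dir
  for dir in "$@"; do
    # Skip empty arguments, e.g. an unset external bin dir.
    [ -n "$dir" ] || continue
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}
```

Usage, with hypothetical variable names: SYMAL=$(find_tool symal "$EXTERNAL_BIN_DIR" "$MOSES_BIN_DIR" "$SCRIPTS_ROOTDIR/../bin"). Keeping the old location as the last candidate means installs that already follow the standard layout would behave as before.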