
alvations commented on August 22, 2024

This is interesting.

It seems there are some permission problems when I download the training tools with wget:

alvas@ubi:~/test-out-of-box/training-tools$ ls -lah *
-rw-rw-r-- 1 alvas alvas 914K Jan 29 16:16 d4norm
-rw-rw-r-- 1 alvas alvas 919K Jan 29 16:16 hmmnorm
-rw-rw-r-- 1 alvas alvas 2.1K Jan 29 16:16 merge_alignment.py
-rw-rw-r-- 1 alvas alvas 1.1M Jan 29 16:16 mgiza
-rw-rw-r-- 1 alvas alvas 336K Jan 29 16:16 mkcls
-rw-rw-r-- 1 alvas alvas  43K Jan 29 16:16 plain2snt
-rw-rw-r-- 1 alvas alvas  38K Jan 29 16:16 snt2cooc
-rw-rw-r-- 1 alvas alvas  29K Jan 29 16:16 snt2coocrmp
-rw-rw-r-- 1 alvas alvas  33K Jan 29 16:16 snt2plain
-rw-rw-r-- 1 alvas alvas  48K Jan 29 16:16 symal

After running chmod 777, it works:

alvas@ubi:~/test-out-of-box/training-tools$ ls
d4norm  hmmnorm  merge_alignment.py  mgiza  mkcls  plain2snt  snt2cooc  snt2coocrmp  snt2plain  symal
alvas@ubi:~/test-out-of-box/training-tools$ chmod 777 *
alvas@ubi:~/test-out-of-box/training-tools$ cd ..
alvas@ubi:~/test-out-of-box$ ls
Europarl.de-en.de  Europarl.de-en.en  LexicalTranslationModel.pm  training-tools  train-model.perl
alvas@ubi:~/test-out-of-box$ perl train-model.perl --external-bin-dir training-tools/ --mgiza
Using SCRIPTS_ROOTDIR: /home/alvas/test-out-of-box
Using multi-thread GIZA
using gzip 
ERROR: use --corpus to specify corpus at train-model.perl line 379.

But is there a safer way to set the permissions? What permissions does train-model.perl actually need? chmod 777 works, but it's a little unsafe.

from mosesdecoder.

alvations commented on August 22, 2024

Digging a little deeper, it seems train-model.perl is behaving oddly.

When I ran:

perl train-model.perl --root-dir .  --model-dir model --corpus Europarl.de-en --f en --e de  --external-bin-dir "training-tools" --mgiza --parallel --first-step 1 --last-step 3

mkcls and mgiza complete, but when the script tries to combine the two alignments, train-model.perl looks for moses/bin/symal (resolved as $SCRIPTS_ROOTDIR/../bin/symal) instead of $_EXTERNAL_BINDIR/symal:

Using SCRIPTS_ROOTDIR: /home/alvas/test-out-of-box
Using multi-thread GIZA
using gzip 
(1) preparing corpus @ Tue May 19 02:05:17 CEST 2015
Executing: mkdir -p /home/alvas/test-out-of-box/corpus
(1.0) selecting factors @ Tue May 19 02:05:17 CEST 2015
Forking...
(1.1) running mkcls  @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mkcls -c50 -n2 -p/home/alvas/test-out-of-box/Europarl.de-en.en -V/home/alvas/test-out-of-box/corpus/en.vcb.classes opt
  /home/alvas/test-out-of-box/corpus/en.vcb.classes already in place, reusing
(1.2) creating vcb file /home/alvas/test-out-of-box/corpus/en.vcb @ Tue May 19 02:05:17 CEST 2015
(1.1) running mkcls  @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mkcls -c50 -n2 -p/home/alvas/test-out-of-box/Europarl.de-en.de -V/home/alvas/test-out-of-box/corpus/de.vcb.classes opt
  /home/alvas/test-out-of-box/corpus/de.vcb.classes already in place, reusing
(1.2) creating vcb file /home/alvas/test-out-of-box/corpus/de.vcb @ Tue May 19 02:05:17 CEST 2015
(1.3) numberizing corpus /home/alvas/test-out-of-box/corpus/en-de-int-train.snt @ Tue May 19 02:05:17 CEST 2015
  /home/alvas/test-out-of-box/corpus/en-de-int-train.snt already in place, reusing
(1.3) numberizing corpus /home/alvas/test-out-of-box/corpus/de-en-int-train.snt @ Tue May 19 02:05:17 CEST 2015
  /home/alvas/test-out-of-box/corpus/de-en-int-train.snt already in place, reusing
Waiting for mkcls processes to finish...
(2) running giza @ Tue May 19 02:05:17 CEST 2015
(2.1a) running snt2cooc de-en @ Tue May 19 02:05:17 CEST 2015

Executing: mkdir -p /home/alvas/test-out-of-box/giza.de-en
/home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.de-en/de-en.cooc /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/de-en-int-train.snt
Executing: /home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.de-en/de-en.cooc /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/de-en-int-train.snt
(2.1a) running snt2cooc en-de @ Tue May 19 02:05:17 CEST 2015

Executing: mkdir -p /home/alvas/test-out-of-box/giza.en-de
/home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.en-de/en-de.cooc /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/en-de-int-train.snt
Executing: /home/alvas/test-out-of-box/training-tools/snt2cooc /home/alvas/test-out-of-box/giza.en-de/en-de.cooc /home/alvas/test-out-of-box/corpus/de.vcb /home/alvas/test-out-of-box/corpus/en.vcb /home/alvas/test-out-of-box/corpus/en-de-int-train.snt
END.
END.
(2.1b) running giza de-en @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mgiza  -CoocurrenceFile /home/alvas/test-out-of-box/giza.de-en/de-en.cooc -c /home/alvas/test-out-of-box/corpus/de-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /home/alvas/test-out-of-box/giza.de-en/de-en -onlyaldumps 1 -p0 0.999 -s /home/alvas/test-out-of-box/corpus/en.vcb -t /home/alvas/test-out-of-box/corpus/de.vcb
  /home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.gz seems finished, reusing.
Waiting for second GIZA process...
(2.1b) running giza en-de @ Tue May 19 02:05:17 CEST 2015
/home/alvas/test-out-of-box/training-tools/mgiza  -CoocurrenceFile /home/alvas/test-out-of-box/giza.en-de/en-de.cooc -c /home/alvas/test-out-of-box/corpus/en-de-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /home/alvas/test-out-of-box/giza.en-de/en-de -onlyaldumps 1 -p0 0.999 -s /home/alvas/test-out-of-box/corpus/de.vcb -t /home/alvas/test-out-of-box/corpus/en.vcb
  /home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.gz seems finished, reusing.
(3) generate word alignment @ Tue May 19 02:05:17 CEST 2015
Combining forward and inverted alignment from files:
  /home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.{bz2,gz}
  /home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.{bz2,gz}
Executing: mkdir -p /home/alvas/test-out-of-box/model
Executing: /home/alvas/test-out-of-box/training/giza2bal.pl -d "gzip -cd /home/alvas/test-out-of-box/giza.de-en/de-en.A3.final.gz" -i "gzip -cd /home/alvas/test-out-of-box/giza.en-de/en-de.A3.final.gz" |/home/alvas/test-out-of-box/../bin/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" > /home/alvas/test-out-of-box/model/aligned.grow-diag-final
sh: 1: /home/alvas/test-out-of-box/training/giza2bal.pl: not found
sh: 1: /home/alvas/test-out-of-box/../bin/symal: not found
Exit code: 127
ERROR: Can't generate symmetrized alignment file

Also, $SCRIPTS_ROOTDIR seems to control where train-model.perl looks for its complementary scripts (e.g. training/giza2bal.pl in the log above). This is unavoidable unless $SCRIPTS_ROOTDIR is made customizable, but that would lead to a whole lot of other problems.
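To see whether the paths train-model.perl derives from $SCRIPTS_ROOTDIR exist before launching a long training run, a small pre-flight check can help. This is only a sketch: it tests just the two paths visible in the log above ($SCRIPTS_ROOTDIR/training/giza2bal.pl and $SCRIPTS_ROOTDIR/../bin/symal), the function name check_train_model_paths is hypothetical, and train-model.perl may derive other paths that it does not cover.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Moses): report whether the helper
# paths that train-model.perl derived from SCRIPTS_ROOTDIR in the log above
# exist and are executable. Returns the number of missing paths.
check_train_model_paths() {
    root=$1
    missing=0
    for p in "$root/training/giza2bal.pl" "$root/../bin/symal"; do
        if [ -x "$p" ]; then
            echo "ok       $p"
        else
            echo "MISSING  $p (or not executable)"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Example: the SCRIPTS_ROOTDIR reported in the log above.
check_train_model_paths /home/alvas/test-out-of-box || echo "fix the layout before training"
```

Both files need the execute bit because the log shows them being run directly through sh, which is why a missing x bit and a missing file produce the same "not found"-style failure.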


alvations commented on August 22, 2024

Solution: use the Moses scripts from a normally compiled and installed Moses.

Enlightenment: Training scripts don't work out of the box.

For more info: https://github.com/alvations/usaarhat-repo/blob/master/Align-A-Line.md


jtv commented on August 22, 2024

Use the ‘x’ permission bit on anything that you want to be able to execute. Strictly speaking, if it's in your home directory you probably only need that permission for the file's owner (you), but the usual and simple thing is to allow it for all users.

So, to permit execution of a file, do:

chmod a+x $MYFILE
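Applied to the training-tools directory from the first comment, that advice might look like the sketch below. It assumes the directory is named training-tools as in the listing above; the function make_tools_executable is hypothetical. Unlike chmod 777, it only adds the execute bit and leaves the files non-world-writable.

```shell
#!/bin/sh
# Hypothetical helper: add the execute bit to every regular file in a
# directory, without making anything world-writable (chmod 777 would).
make_tools_executable() {
    dir=$1
    scope=${2:-a}   # "a" = all users (the usual choice), "u" = owner only
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        chmod "$scope+x" "$f"
    done
}

if [ -d training-tools ]; then
    make_tools_executable training-tools
    # Files that were -rw-rw-r-- become -rwxrwxr-x, not -rwxrwxrwx.
    ls -l training-tools
fi
```

Passing u as the second argument gives the owner-only variant jtv mentions, which is enough when the tools live in your own home directory.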


alvations commented on August 22, 2024

@jtv, thanks for the chmod permission solution!!! But the problems that come after the permission fix are a little harder to resolve, because they're closely tied to the pseudo-static paths that train-model.perl tries to use.


goodmami commented on August 22, 2024

Sorry to jump into a closed thread, but I'm having a similar issue and I'm not sure why this was closed. train-model.perl is failing to find symal because it's looking for "$SCRIPTS_ROOTDIR/../bin/symal" and not "$_EXTERNAL_BINDIR/symal" or even the symal in the Moses bin dir (which for me is not a sibling of $SCRIPTS_ROOTDIR). Here's the offending line: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/train-model.perl#L466

I'm using EMS, and here are the relevant paths from my config file:

moses-src-dir = /NLP_TOOLS/mt_tools/moses/v3.0-release
moses-bin-dir = $moses-src-dir/bin
moses-script-dir = $moses-src-dir/src/scripts
external-bin-dir = /NLP_TOOLS/mt_tools/mgizapp/latest/bin

Note that while the bin-dir is $moses-src-dir/bin, the script-dir is one level lower ($moses-src-dir/src/scripts). This install is on my university's cluster and I don't have permissions to move things around.

Why does train-model.perl assume the bin-dir is a sibling to the script-dir when it has both the $moses-bin-dir and $external-bin-dir variables available?
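Working through the configuration above shows the mismatch concretely. The sketch below assumes that $SCRIPTS_ROOTDIR ends up equal to moses-script-dir and uses GNU realpath -m, which normalizes paths without requiring them to exist. The shell variable names mirror the EMS keys (hyphens replaced by underscores); the comparison is an illustration of the sibling assumption, not something EMS does.

```shell
#!/bin/sh
# Paths from the EMS config above (shell variable names can't contain
# hyphens, so the keys are renamed with underscores).
moses_src_dir=/NLP_TOOLS/mt_tools/moses/v3.0-release
moses_bin_dir=$moses_src_dir/bin
moses_script_dir=$moses_src_dir/src/scripts

# Where train-model.perl looks: $SCRIPTS_ROOTDIR/../bin/symal
expected=$(realpath -m "$moses_script_dir/../bin/symal")
# Where symal actually lives: the Moses bin dir
actual=$(realpath -m "$moses_bin_dir/symal")

echo "train-model.perl looks for: $expected"
echo "symal is installed at:      $actual"
[ "$expected" = "$actual" ] || echo "mismatch: bin dir is not a sibling of the scripts dir"
```

Here $expected comes out as .../v3.0-release/src/bin/symal while $actual is .../v3.0-release/bin/symal, so the lookup fails even though moses-bin-dir points at the right place.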


alvations commented on August 22, 2024

@goodmami Last year, I ended up modifying the paths in train-model.perl to suit my machine: I replaced all the paths to the binaries and to the other Perl scripts with static paths.

The line my $SYMAL = "$SCRIPTS_ROOTDIR/../bin/symal"; assumes that Moses was installed following the instructions at http://www.statmt.org/moses/?n=Development.GetStarted, which produces a layout where bin/ and scripts/ are siblings, like this:

alvas@ubi:~$ cd mosesdecoder/
alvas@ubi:~/mosesdecoder$ ls
biconcor                defer      mert       OnDiskPt            scripts
bin                     doc        mingw      phrase-extract      search
bjam                    jam-files  mira       previous.sh         symal
BUILD-INSTRUCTIONS.txt  Jamroot    misc       regression-testing  util
contrib                 lib        moses      sample-models       vw
cruise-control          lm         moses-cmd  sample-models.tgz
alvas@ubi:~/mosesdecoder$ cd scripts/
alvas@ubi:~/mosesdecoder/scripts$ ls
analysis     generic  other    regression-testing  tests      Transliteration
ems          Jamfile  README   server              tokenizer
fuzzy-match  OSM      recaser  share               training
alvas@ubi:~/mosesdecoder/scripts$ cd ../bin
alvas@ubi:~/mosesdecoder/bin$ ls
1-1-Extraction        filter                    processLexicalTable
biconcor              fragment                  processPhraseTable
build_binary          generateSequences         project-cache.jam
config.log            kbmira                    prunePhraseTable
consolidate           lexical-reordering-score  query
consolidate-direct    lmbrgrid                  queryLexicalTable
consolidate-reverse   lmplz                     queryOnDiskPt
CreateOnDiskPt        merge-sorted              queryPhraseTable
dump_counts           mert                      relax-parse
evaluator             mira                      score
extract               moses                     sentence-bleu
extract-ghkm          moses_chart               statistics
extract-lex           pcfg-extract              symal
extract-mixed-syntax  pcfg-score                TMining
extractor             phrase-lookup
extract-rules         pro


alvations commented on August 22, 2024

The TL;DR way would be something like:

cd /path/to/
wget http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/linux-64bit.tgz
tar zxvf linux-64bit.tgz
mv linux-64bit mosesdecoder
chmod a+x -R mosesdecoder

Since the scripts and EMS should not use the source directly, you can set up the config file like this:

moses-src-dir = /path/to/mosesdecoder
moses-bin-dir = $moses-src-dir/bin
moses-script-dir = $moses-src-dir/scripts
external-bin-dir = $moses-src-dir/training-tools


goodmami commented on August 22, 2024

Thanks @alvations. I didn't install it myself, and our sysadmin claims to have followed the normal install. According to the link you provided (http://www.statmt.org/moses/?n=Development.GetStarted) (emphasis added):

--install-scripts=/path/to/scripts
copies scripts into a directory. Does not install if missing. No argument defaults to PREFIX/scripts.

Since the directory didn't exist as a sibling to the bindir, I'm guessing he didn't provide the --install-scripts option, which in the installation instructions is under "Popular additional bjam options" and not the "easy setup" heading. Even if the option is used, it's possible to provide a path that isn't the default, in which case the train-model.perl script would still fail because of the directory location assumption.

Anyway, we fixed that problem by symlinking the scripts directory at the expected location, but my original question still stands (emphasis added):

Why does train-model.perl assume the bin-dir is a sibling to the script-dir when it has both the $moses-bin-dir and $external-bin-dir variables available?

I think this hardcoding of the path assumption is a bug. I'd be happy to submit a PR, but I'm not sure what to fix. Maybe I'd just need to change whatever calls train-model.perl to provide the appropriate command-line options, but maybe I'd also need to change train-model.perl to actually use them?

Thanks!

