Comments (16)

snomos commented on June 10, 2024

I get the same result. It consumes an increasing amount of memory until it runs out of it.

The first question is whether this is restricted to SMS, or is it a general issue?

from lang-sms.

snomos avatar snomos commented on June 10, 2024

The first test, using SMA, gave no problems at all; it finished in about 3.5 minutes.

snomos commented on June 10, 2024

SMJ is also fine.

snomos commented on June 10, 2024

SMN is fine.

snomos commented on June 10, 2024

And SME is fine. Conclusion: this is a problem specific to SMS, and is most likely related to some details in the FST causing some sort of infinite loop.

trondtynnol commented on June 10, 2024

Do we have tools to find such infinite loops, or do we need much manual investigation?

snomos commented on June 10, 2024

Manual investigation is the first step.

Trondtr commented on June 10, 2024

I am not convinced this is sms only. Also, sme does not compile hyphenator-gt-desc.hfstol. But when it comes to sms (which is now blocking an article that is being written), the message is the following (when compiling, asking for both hyphenators):

  GEN      area-tags.txt
  GEN      derivation-tags.txt
  GEN      usage-tags.txt
  GEN      semantic-tags.txt
  GEN      error-tags.txt
  GEN      dialect-tags.txt
Making all in phonetics
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in tests
make[3]: Nothing to be done for `all'.
Making all in hyphenation
make[2]: Nothing to be done for `all'.
Making all in orthography
make[2]: Nothing to be done for `all'.
Making all in cg3
make[2]: Nothing to be done for `all'.
Making all in transcriptions
make[2]: Nothing to be done for `all'.
Making all in tagsets
make[2]: Nothing to be done for `all'.
Making all in .
make[2]: Nothing to be done for `all-am'.
Making all in tools
Making all in tokenisers
Making all in filters
make[3]: Nothing to be done for `all'.
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in tests
make[3]: Nothing to be done for `all'.
Making all in analysers
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in shellscripts
make[2]: Nothing to be done for `all'.
Making all in spellcheckers
Making all in filters
make[3]: Nothing to be done for `all'.
Making all in weights
make[3]: Nothing to be done for `all'.
Making all in neural
make[3]: Nothing to be done for `all'.
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in hyphenators
Making all in filters
make[3]: Nothing to be done for `all'.
Making all in .
  HXFST    hyphenator-gt-desc-no_fallback.hfst
/bin/sh: line 1: 26206 Done                    /usr/bin/printf "read regex 			@\"hyphenator-gt-desc-input.hfst\" 		.o. @\"hyphenator-gt-desc-output.hfst\" 		; \n	 save stack hyphenator-gt-desc-no_fallback.hfst\n	 quit\n"
     26207 Killed: 9               | /usr/local/bin/hfst-xfst -p -q
make[3]: *** [hyphenator-gt-desc-no_fallback.hfst] Error 137
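
For reference, `Error 137` in the make output follows the usual shell convention for a child killed by a signal: 128 plus the signal number, here 9 (SIGKILL). That matches the `Killed: 9` line and is consistent with the process being killed after exhausting memory. A minimal Python sketch of that decoding (the function name `describe_exit_status` is only illustrative):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status as reported by make ("Error N")."""
    if status == 0:
        return "success"
    if status > 128:
        # Shells report "128 + signal number" for signal-terminated children.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


print(describe_exit_status(137))  # killed by signal 9 (SIGKILL)
```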

Trondtr commented on June 10, 2024

It seems that the make dependencies do not carry through the whole compilation process. I changed the smn file src/hyphenation/hypheniation.xfscript on March 10th. Part of the content in tools/hyphenators is updated accordingly, but not the crucial hyphenator-gt-desc.hfstol:

> uit-mac-443:lang-smn ttr000$ lt tools/hyphenators/
total 187272
-rw-r--r--   1 ttr000  staff        39 13 mar 10:38 hyph_smn.dic
-rw-r--r--   1 ttr000  staff         0 13 mar 10:38 smn.pat
-rw-r--r--   1 ttr000  staff         0 13 mar 10:38 smn_hyph.tex
-rw-r--r--   1 ttr000  staff    270428 13 mar 10:38 hyphenated-fst-wordlist.txt
-rw-r--r--   1 ttr000  staff  15288903 13 mar 10:37 hyphenator-gt-desc-no_fallback.hfst
-rw-r--r--   1 ttr000  staff  13317803 13 mar 10:37 hyphenator-gt-desc-output.hfst
drwxr-xr-x  14 ttr000  staff       448 10 mar 11:22 filters
-rw-r--r--   1 ttr000  staff     43778 10 mar 11:22 Makefile
-rw-r--r--   1 ttr000  staff  11091478 10 mar 08:54 hyphenator-gt-desc-input.hfst
-rw-r--r--   1 ttr000  staff   5897456 10 mar 08:54 hyphenator-raw-gt-desc.hfst
-rw-r--r--   1 ttr000  staff   5897582 10 mar 08:54 hyphenator-raw-gt-desc.tmp.hfst
-rw-r--r--   1 ttr000  staff   6628071 10 mar 08:52 lexicon-gt-desc-tag_weighted_no_analysis.hfst
-rw-r--r--   1 ttr000  staff   6628055 10 mar 08:52 lexicon-gt-desc-tag_weighted.hfst
-rw-r--r--   1 ttr000  staff   6629002 10 mar 08:52 lexicon-gt-desc-clean.hfst
-rw-r--r--   1 ttr000  staff   5490338 10 mar 08:52 lexicon-gt-desc.hfst
-rw-r--r--   1 ttr000  staff     10297 10 mar 08:52 downcase-derived_proper-strings.compose.hfst
-rw-r--r--   1 ttr000  staff       784 10 mar 08:52 all_tags.txt
-rw-r--r--   1 ttr000  staff     43210 26 feb 11:34 Makefile.in
-rw-r--r--   1 ttr000  staff  17773359 19 jan 19:30 hyphenator-gt-desc.hfstol
-rw-r--r--   1 ttr000  staff       693  9 nov 09:18 tags.reweight
-rw-r--r--   1 ttr000  staff       701  9 nov 09:18 smn.tra
-rw-r--r--   1 ttr000  staff       347  9 nov 09:18 Makefile.modification-pattern.am
-rw-r--r--   1 ttr000  staff       540  9 nov 09:18 Makefile.modification-fst.am
-rw-r--r--   1 ttr000  staff       914  9 nov 09:18 Makefile.am
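
A quick way to check this kind of staleness outside make, sketched under assumptions: compare the modification time of a target with those of the files it is supposed to be built from. Which files actually feed `hyphenator-gt-desc.hfstol` is defined by the makefile rules and is only guessed at here; the function name `newer_prerequisites` is made up for this illustration.

```python
import os
from typing import Iterable, List


def newer_prerequisites(target: str, prerequisites: Iterable[str]) -> List[str]:
    """Return the existing prerequisites modified more recently than target.

    A missing target counts as older than every existing prerequisite,
    which mirrors how make decides that a target must be rebuilt.
    """
    existing = [p for p in prerequisites if os.path.exists(p)]
    if not os.path.exists(target):
        return existing
    target_mtime = os.path.getmtime(target)
    return [p for p in existing if os.path.getmtime(p) > target_mtime]


if __name__ == "__main__":
    # Assumed dependency, for illustration only: the .hfstol is taken to be
    # built from the no_fallback .hfst in the same directory.
    stale = newer_prerequisites(
        "tools/hyphenators/hyphenator-gt-desc.hfstol",
        ["tools/hyphenators/hyphenator-gt-desc-no_fallback.hfst"],
    )
    print(stale)
```

A non-empty result means the target is older than at least one of the listed inputs, so make should have rebuilt it if the dependency were declared.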

snomos commented on June 10, 2024

> I get the same result. It consumes an increasing amount of memory until it runs out of it.
>
> The first question is whether this is restricted to SMS, or is it a general issue?

@Trondtr the memory issue is definitely specific to SMS. Nothing in the later comments has proved otherwise; on the contrary.

flammie commented on June 10, 2024

I changed a single XFST-based compose step in the shared giella-core makefile rules, and it now compiles on my laptop. If the results are now OK for most languages, that may suggest that xfst's flag diacritic composition algorithm is at fault with respect to the sms flag diacritics, or some other automatic maintenance function.

snomos commented on June 10, 2024

Builds fine for me as well. If it also builds for @Trondtr, then we can close this as fixed.

Trondtr commented on June 10, 2024

It does not work for sme (see below), but I do not know whether this is a different bug (it looks very different). For sms the jury is out (busy compiling, now running into the second hour on HXFST hyphenated-fst-wordlist.txt).

While waiting for sms this is thus what sme gives us:

touch se_hyph.tex
cp -f se_hyph.tex se.pat
/Users/ttr000/git/giellalt/lang-sme/./../giella-core/scripts/patgen.exp \
			/usr/local/bin/patgen se . \
			"1 2" \
			"2 4" \
			"1 1 1" \
			cleaned-hyphenated-fst-wordlist.txt se_hyph.tex
This is PATGEN, Version 2.4 (TeX Live 2022/Homebrew)
left_hyphen_min = 2, right_hyphen_min = 2, 56 letters
6236 patterns read in
pattern trie has 8958 nodes, trie_max = 15060, 73 outputs
hyph_start, hyph_finish: 1 2
Largest hyphenation value 3 in patterns should be less than hyph_start
pat_start, pat_finish: 2 4
good weight, bad weight, threshold: 1 1 1
processing dictionary with pat_len = 2, pat_dot = 1
BUO-RIT´á-rii-guin-ait-to
Bad character
expect: spawn id exp6 not open
    while executing
"expect "good weight, bad weight, threshold: " { send -- "$gdbadthresh\r" }"
    (file "/Users/ttr000/git/giellalt/lang-sme/./../giella-core/scripts/patgen.exp" line 18)
make[3]: *** [se_hyph.tex] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

snomos commented on June 10, 2024

It is a different error in SME, and totally unrelated. The crucial message is:

Bad character

The error and how to solve it is described here.

This build step relates to the TeX/LO hyphenator, which comes after all FST-based hyphenation. It thus proves that the FST hyphenation build works fine for SME.
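
One way to locate entries like this, sketched under assumptions rather than taken from the linked description: scan the hyphenated word list for characters outside the alphabet that patgen has been told about. The alphabet below is a made-up stand-in (the real one would presumably come from the language's translate file), `find_unexpected_chars` is a hypothetical helper, and the thread does not state which character patgen actually rejected.

```python
import unicodedata
from typing import Dict, Iterable, Set


def find_unexpected_chars(words: Iterable[str], allowed: Set[str]) -> Dict[str, str]:
    """Map each unexpected character to the first word it occurs in."""
    offenders: Dict[str, str] = {}
    for raw in words:
        # Normalise so that precomposed and decomposed letters compare equal.
        word = unicodedata.normalize("NFC", raw.strip())
        for ch in word:
            # Letters are compared case-insensitively.
            if ch.lower() not in allowed and ch not in offenders:
                offenders[ch] = word
    return offenders


# Hypothetical alphabet for illustration only; "-" marks hyphenation points.
ALLOWED = set(unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzáčđŋšŧž-"))

# First entry is the line from the log above; the second is a made-up clean entry.
sample = ["BUO-RIT´á-rii-guin-ait-to", "guo-la-din"]
for ch, word in find_unexpected_chars(sample, ALLOWED).items():
    print(f"U+{ord(ch):04X} {ch!r} in {word}")
```

With this stand-in alphabet, the only character flagged in the sample is the spacing acute accent (U+00B4), which makes it a plausible candidate for the character patgen complained about.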

snomos commented on June 10, 2024

> For sms the jury is out (busy compiling, now running into the second hour on HXFST hyphenated-fst-wordlist.txt).

This also indicates that the build process has gone past the FST build phase, and entered the TeX/LO hyphenation build steps. It thus seems like the FST issue has been solved.

snomos commented on June 10, 2024

No further comments or counterarguments have popped up. Closing.
