Comments (16)
I get the same result. It consumes an increasing amount of memory until it runs out of it.
The first question is whether this is restricted to SMS, or is it a general issue?
from lang-sms.
The first test, using SMA, gave no problems at all; it finished in about 3.5 minutes.
from lang-sms.
SMJ is also fine.
from lang-sms.
SMN is fine.
from lang-sms.
And SME is fine. Conclusion: this is a problem specific to SMS, and is most likely related to some details in the FST causing some sort of infinite loop.
from lang-sms.
Do we have tools to find such infinite loops, or do we need much manual investigation?
from lang-sms.
Manual investigation is the first step.
from lang-sms.
I am not convinced this is sms only. sme also fails to compile hyphenator-gt-desc.hfstol. But when it comes to sms (which now blocks an article that is being written), the message is the following (when compiling, asking for both hyphenators):
GEN area-tags.txt
GEN derivation-tags.txt
GEN usage-tags.txt
GEN semantic-tags.txt
GEN error-tags.txt
GEN dialect-tags.txt
Making all in phonetics
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in tests
make[3]: Nothing to be done for `all'.
Making all in hyphenation
make[2]: Nothing to be done for `all'.
Making all in orthography
make[2]: Nothing to be done for `all'.
Making all in cg3
make[2]: Nothing to be done for `all'.
Making all in transcriptions
make[2]: Nothing to be done for `all'.
Making all in tagsets
make[2]: Nothing to be done for `all'.
Making all in .
make[2]: Nothing to be done for `all-am'.
Making all in tools
Making all in tokenisers
Making all in filters
make[3]: Nothing to be done for `all'.
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in tests
make[3]: Nothing to be done for `all'.
Making all in analysers
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in shellscripts
make[2]: Nothing to be done for `all'.
Making all in spellcheckers
Making all in filters
make[3]: Nothing to be done for `all'.
Making all in weights
make[3]: Nothing to be done for `all'.
Making all in neural
make[3]: Nothing to be done for `all'.
Making all in .
make[3]: Nothing to be done for `all-am'.
Making all in hyphenators
Making all in filters
make[3]: Nothing to be done for `all'.
Making all in .
HXFST hyphenator-gt-desc-no_fallback.hfst
/bin/sh: line 1: 26206 Done /usr/bin/printf "read regex @\"hyphenator-gt-desc-input.hfst\" .o. @\"hyphenator-gt-desc-output.hfst\" ; \n save stack hyphenator-gt-desc-no_fallback.hfst\n quit\n"
26207 Killed: 9 | /usr/local/bin/hfst-xfst -p -q
make[3]: *** [hyphenator-gt-desc-no_fallback.hfst] Error 137
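A note on reading the last lines of this log: by shell convention, an exit status above 128 means the process was terminated by a signal, and the signal number is the status minus 128. Here 137 - 128 = 9, which matches the "Killed: 9" line and is consistent with the process being killed after exhausting memory, as reported earlier in the thread. A minimal Python sketch of that arithmetic (the function name is hypothetical, not part of any build tool):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status such as the one make reports."""
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(137))
```

The same function applied to the `Error 1` lines seen elsewhere in the thread reports a normal exit with code 1, that is, a failure raised by the tool itself rather than a kill from outside.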
from lang-sms.
It seems that the make dependencies do not carry through the whole compilation process. I changed the smn file src/hyphenation/hypheniation.xfscript on March 10th. Part of the content in tools/hyphenators is updated accordingly, but not the crucial hyphenator-gt-desc.hfstol:
> uit-mac-443:lang-smn ttr000$ lt tools/hyphenators/
total 187272
-rw-r--r-- 1 ttr000 staff 39 13 mar 10:38 hyph_smn.dic
-rw-r--r-- 1 ttr000 staff 0 13 mar 10:38 smn.pat
-rw-r--r-- 1 ttr000 staff 0 13 mar 10:38 smn_hyph.tex
-rw-r--r-- 1 ttr000 staff 270428 13 mar 10:38 hyphenated-fst-wordlist.txt
-rw-r--r-- 1 ttr000 staff 15288903 13 mar 10:37 hyphenator-gt-desc-no_fallback.hfst
-rw-r--r-- 1 ttr000 staff 13317803 13 mar 10:37 hyphenator-gt-desc-output.hfst
drwxr-xr-x 14 ttr000 staff 448 10 mar 11:22 filters
-rw-r--r-- 1 ttr000 staff 43778 10 mar 11:22 Makefile
-rw-r--r-- 1 ttr000 staff 11091478 10 mar 08:54 hyphenator-gt-desc-input.hfst
-rw-r--r-- 1 ttr000 staff 5897456 10 mar 08:54 hyphenator-raw-gt-desc.hfst
-rw-r--r-- 1 ttr000 staff 5897582 10 mar 08:54 hyphenator-raw-gt-desc.tmp.hfst
-rw-r--r-- 1 ttr000 staff 6628071 10 mar 08:52 lexicon-gt-desc-tag_weighted_no_analysis.hfst
-rw-r--r-- 1 ttr000 staff 6628055 10 mar 08:52 lexicon-gt-desc-tag_weighted.hfst
-rw-r--r-- 1 ttr000 staff 6629002 10 mar 08:52 lexicon-gt-desc-clean.hfst
-rw-r--r-- 1 ttr000 staff 5490338 10 mar 08:52 lexicon-gt-desc.hfst
-rw-r--r-- 1 ttr000 staff 10297 10 mar 08:52 downcase-derived_proper-strings.compose.hfst
-rw-r--r-- 1 ttr000 staff 784 10 mar 08:52 all_tags.txt
-rw-r--r-- 1 ttr000 staff 43210 26 feb 11:34 Makefile.in
-rw-r--r-- 1 ttr000 staff 17773359 19 jan 19:30 hyphenator-gt-desc.hfstol
-rw-r--r-- 1 ttr000 staff 693 9 nov 09:18 tags.reweight
-rw-r--r-- 1 ttr000 staff 701 9 nov 09:18 smn.tra
-rw-r--r-- 1 ttr000 staff 347 9 nov 09:18 Makefile.modification-pattern.am
-rw-r--r-- 1 ttr000 staff 540 9 nov 09:18 Makefile.modification-fst.am
-rw-r--r-- 1 ttr000 staff 914 9 nov 09:18 Makefile.am
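The listing shows hyphenator-gt-desc.hfstol dated 19 jan while several .hfst files are dated 13 mar. A rough way to check for this kind of staleness without reading dates by eye is to compare modification times. The sketch below assumes, without having checked the makefiles, that the .hfstol file is built from the two .hfst files named in it; the function name is hypothetical.

```python
from pathlib import Path


def newer_prerequisites(target, prerequisites):
    """Return the prerequisites whose mtime is later than the target's."""
    target_mtime = Path(target).stat().st_mtime
    return [p for p in prerequisites if Path(p).stat().st_mtime > target_mtime]


if __name__ == "__main__":
    directory = Path("tools/hyphenators")
    target = directory / "hyphenator-gt-desc.hfstol"
    # Assumed prerequisites; the real dependency chain is in the makefiles.
    candidates = [
        directory / "hyphenator-gt-desc-no_fallback.hfst",
        directory / "hyphenator-gt-desc-output.hfst",
    ]
    existing = [p for p in candidates if p.exists()]
    if target.exists() and existing:
        for path in newer_prerequisites(target, existing):
            print(f"{target.name} is older than {path.name}")
    else:
        print("Nothing to compare: run this from the language directory.")
```

If make considered these files prerequisites of the .hfstol target, any line printed here would normally trigger a rebuild, so output from this check would point at a missing dependency rule.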
from lang-sms.
I get the same result. It consumes an increasing amount of memory until it runs out of it.
The first question is whether this is restricted to SMS, or is it a general issue?
@Trondtr the memory issue is definitely specific to SMS. Nothing in the later comments has proved otherwise; on the contrary.
from lang-sms.
I changed one XFST-based singular compose in the giella-core shared makefile rules, and it now compiles on my laptop. If the results are now OK for most languages, it may suggest that xfst's flag diacritic composition algorithm is at fault with respect to the sms flag diacritics, or that some other automatic maintenance function is.
from lang-sms.
Builds fine for me as well. If it also builds for @Trondtr, then we can close this as fixed.
from lang-sms.
It does not work for sme (see below), but I do not know whether this is a different bug; it looks very different. For sms the jury is out (busy compiling, now running into the second hour on HXFST hyphenated-fst-wordlist.txt).
While waiting for sms, this is what sme gives us:
touch se_hyph.tex
cp -f se_hyph.tex se.pat
/Users/ttr000/git/giellalt/lang-sme/./../giella-core/scripts/patgen.exp \
/usr/local/bin/patgen se . \
"1 2" \
"2 4" \
"1 1 1" \
cleaned-hyphenated-fst-wordlist.txt se_hyph.tex
This is PATGEN, Version 2.4 (TeX Live 2022/Homebrew)
left_hyphen_min = 2, right_hyphen_min = 2, 56 letters
6236 patterns read in
pattern trie has 8958 nodes, trie_max = 15060, 73 outputs
hyph_start, hyph_finish: 1 2
Largest hyphenation value 3 in patterns should be less than hyph_start
pat_start, pat_finish: 2 4
good weight, bad weight, threshold: 1 1 1
processing dictionary with pat_len = 2, pat_dot = 1
BUO-RIT´á-rii-guin-ait-to
Bad character
expect: spawn id exp6 not open
while executing
"expect "good weight, bad weight, threshold: " { send -- "$gdbadthresh\r" }"
(file "/Users/ttr000/git/giellalt/lang-sme/./../giella-core/scripts/patgen.exp" line 18)
make[3]: *** [se_hyph.tex] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
from lang-sms.
It is a different error in SME, and totally unrelated. The crucial message is:
Bad character
The error and how to solve it are described here.
This build step relates to the TeX/LO hyphenator, which comes after all FST-based hyphenation. It thus proves that the FST hyphenation build works fine for SME.
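To locate the offending entry, one option is to scan the word list for characters outside the expected alphabet. The sketch below is only an illustration: the allowed set is a hypothetical stand-in (the real one would come from the patgen translate file), and the ´ in the word printed just before "Bad character" is merely a plausible candidate, not a confirmed cause.

```python
import unicodedata


def find_unexpected_chars(words, allowed):
    """Map each word with characters outside `allowed` to those characters."""
    problems = {}
    for word in words:
        bad = sorted({ch for ch in word.lower() if ch not in allowed})
        if bad:
            problems[word] = bad
    return problems


# Hypothetical alphabet: ASCII letters, some North Sámi letters, and the hyphen.
allowed = set("abcdefghijklmnopqrstuvwxyz-") | set("\u00e1\u010d\u0111\u014b\u0161\u0167\u017e")

# First word copied from the log above; U+00B4 is the stray acute accent.
words = ["BUO-RIT\u00b4\u00e1-rii-guin-ait-to", "guol-li"]
for word, bad in find_unexpected_chars(words, allowed).items():
    details = ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in bad
    )
    print(f"{word}: {details}")
```

Running the same check over cleaned-hyphenated-fst-wordlist.txt, with the alphabet taken from the actual translate file, would list every entry that patgen is likely to reject.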
from lang-sms.
For sms the jury is out (busy compiling, now running into the second hour on HXFST hyphenated-fst-wordlist.txt).
This also indicates that the build process has gone past the FST build phase, and entered the TeX/LO hyphenation build steps. It thus seems like the FST issue has been solved.
from lang-sms.
No further comments or counterarguments have popped up. Closing.
from lang-sms.