giellalt / lang-sms

Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Skolt Sami language

Home Page: https://giellalt.uit.no

License: GNU Lesser General Public License v3.0

Makefile 0.46% Shell 0.60% M4 0.56% HTML 74.10% XSLT 0.11% Perl 0.08% Regular Expression 0.39% XML 0.03% YAML 1.77% Text 21.90%
finite-state-transducers constraint-grammar minority-language nlp language-resources proofing-tools indigenous-languages giellalt-langs maturity-beta geo-nordic

lang-sms's Issues

xfst sms does not compile: doesn't find ProperNoun-smi- lexicons (

This issue was created automatically with bugzilla2github

Bugzilla Bug 2517

Date: 2018-09-28T15:34:28+02:00
From: Lene Antonsen <<lene.antonsen>>
To: Jack Rueter <<rueter.jack>>
CC: ciprian.gerstenberger, sjur.n.moshagen, trond.trosterud

Last updated: 2018-10-03T11:54:47+02:00

Cannot compile hyphenator

I have tried to compile the hyphenator multiple times, both on a Mac and on two different Linux machines, but the process gets stuck when compiling hyphenator-gt-desc-no_fallback.hfst and is eventually killed.

Configuration: ./configure --enable-fst-hyphenator

Making all in .
make[3]: Entering directory `/home/trondtynnol/giellalt/lang-sms/tools/hyphenators'
  HFST2FST lexicon-gt-desc.hfst
  HXFST    lexicon-gt-desc-clean.hfst
  HREWGHT  lexicon-gt-desc-tag_weighted.hfst
  HPROJECT lexicon-gt-desc-tag_weighted_no_analysis.hfst
  HINTRSCT hyphenator-raw-gt-desc.tmp.hfst
  CP       hyphenator-raw-gt-desc.hfst
  HXFST    hyphenator-gt-desc-input.hfst
  HXFST    hyphenator-gt-desc-output.hfst
  HXFST    hyphenator-gt-desc-no_fallback.hfst
/bin/sh: line 5: 37051 Done                    /usr/bin/printf "read regex 			@\"hyphenator-gt-desc-input.hfst\" 		.o. @\"hyphenator-gt-desc-output.hfst\" 	; \n	 save stack hyphenator-gt-desc-no_fallback.hfst\n	 quit\n"
     37052 Killed                  | /usr/bin/hfst-xfst -p -q
make[3]: *** [hyphenator-gt-desc-no_fallback.hfst] Error 137
make[3]: Leaving directory `/home/trondtynnol/giellalt/lang-sms/tools/hyphenators'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/trondtynnol/giellalt/lang-sms/tools/hyphenators'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/trondtynnol/giellalt/lang-sms/tools'
make: *** [all-recursive] Error 1

When compiling with make V=1, this is the output of the last compilation before it hangs:

/usr/bin/printf "read regex \
@\"hyphenator-gt-desc-input.hfst\" \
.o. @\"hyphenator-gt-desc-output.hfst\" \
; \n\
save stack hyphenator-gt-desc-no_fallback.hfst\n\
quit\n" | /usr/local/bin/hfst-xfst -p -v
Using default output format OpenFst with tropical weight class
Using OpenFst's tropical weights as output
Reading from standard input...
warning: both composition arguments contain flag diacritics that are not harmonized
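
A note on the `Error 137` that make reports for hfst-xfst above: shells conventionally report 128 plus the signal number when a process is killed by a signal, so 137 corresponds to signal 9 (SIGKILL), which matches the "Killed" line in the log. On Linux this is often, though not always, the kernel's out-of-memory killer; the log itself does not say what sent the signal. A minimal Python sketch of the decoding (the function name `describe_exit_status` is made up for illustration):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status as reported by a POSIX shell or make."""
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

If memory is the cause, watching the memory use of hfst-xfst during the composition step would confirm it; that is an assumption to check, not something the log proves.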

52 twolc test pairs with uneven strings (

This issue was created automatically with bugzilla2github

Bugzilla Bug 2155

Date: 2016-02-15T11:17:25+01:00
From: Sjur Nørstebø Moshagen <<sjur.n.moshagen>>
To: Jack Rueter <<rueter.jack>>
CC: trond.trosterud

Last updated: 2019-10-10T09:23:58+02:00

Error: The file lexicon.tmp.lexc did not compile cleanly

I get an error when trying to compile.
If I run make V=1, I get the following:

...
N_Prop_Toponyms_sms2x...400 Num_sms2x...91 Pcle_sms2x...31 Prefix_sms2x...36 Pron_sms2x...47 V_sms2x...5133 ABBR_sms2x...105 ACRO_sms2x...11 Punctuation...60 PunctEnd...1 Symbols...Compiling... Warning: Sublexicon is mentioned but not defined. (OY-sur) 
*** ERROR: could not parse lexc file: treating warnings as errors [--Werror] ***
/usr/local/bin/hfst-lexc: The file lexicon.tmp.lexc did not compile cleanly.
(if there are no error messages above, try -v or -d to get more info)
make[2]: *** [lexicon.tmp.hfst] Error 1
make[2]: *** Deleting file `lexicon.tmp.hfst'
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
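
The warning "Sublexicon is mentioned but not defined. (OY-sur)" means that some entry names OY-sur as its continuation lexicon while no `LEXICON OY-sur` block exists in the concatenated file; with --Werror that warning stops the build. The following is a rough Python sketch of how such names could be listed before compiling. It is a deliberately simplified reading of lexc (it ignores %-escapes, quoted glosses after the continuation class, and entries spanning several lines), and the function name and sample text are hypothetical.

```python
import re


def undefined_sublexicons(lexc_text: str) -> set[str]:
    """Return continuation lexicons that are used but never defined."""
    defined, mentioned = set(), set()
    in_lexicon = False
    for raw in lexc_text.splitlines():
        # Drop comments; a simplification that ignores escaped "%!".
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        header = re.match(r"LEXICON\s+(\S+)", line)
        if header:
            defined.add(header.group(1))
            in_lexicon = True
            continue
        if in_lexicon and line.endswith(";"):
            tokens = line[:-1].split()
            if tokens:
                # The last token before ";" is taken as the continuation class.
                mentioned.add(tokens[-1])
    mentioned.discard("#")  # "#" marks end of word, not a lexicon
    return mentioned - defined


# Hypothetical sample, not taken from the sms sources.
sample = """
LEXICON Root
N_sms ;

LEXICON N_sms
stem OY-sur ;   ! continuation lexicon that is never defined
other # ;
"""
print(undefined_sublexicons(sample))
```

Running something like this over lexicon.tmp.lexc would only narrow down where to look; hfst-lexc remains the authority on what counts as defined.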

Cannot compile sms with xfst (

This issue was created automatically with bugzilla2github

Bugzilla Bug 2396

Date: 2017-05-29T11:40:03+02:00
From: Børre Gaup <<borre.gaup>>
To: Trond Trosterud <<trond.trosterud>>
CC: rueter.jack, sjur.n.moshagen

Last updated: 2017-10-22T07:33:24+02:00

Wrong analysis for "10"

I get the following analysis for "10":

echo 10|hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
"<10>"
     Use/Circ"1" Use/Circ"0" Num Sg Acc <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Gen <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Ill Attr <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Loc Attr <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Nom <W:0.0>
     Use/Circ"10" Num Sg Acc <W:0.0>
     Use/Circ"10" Num Sg Nom <W:0.0>
:\n

I compiled today:

ls -l tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
-rw-r--r--  1 car010  staff  112445830 May 21 11:58 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst

Speller is incomplete

The speller (at least for sms) seems to be missing some lemmas after the reorganisation.

$ husmsNorm
mieʹrreed
mieʹrreed	mieʹrreed+V+Imprt+Pl2	0.000000
mieʹrreed	mieʹrreed+V+Inf	0.000000
$ hfst-ospell sms.zhfst 
mieʹrreed
"mieʹrreed" is NOT in the lexicon:

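To find such gaps systematically rather than word by word, the output of the analyser and of hfst-ospell can be compared. The Python sketch below works on captured output text only and relies on the two output formats shown in this report (tab-separated analyser lines, and the `is NOT in the lexicon` message). It also assumes that the analyser marks unknown words with an analysis ending in `+?`. The function names are invented for illustration.

```python
import re


def analysed_forms(lookup_output: str) -> set[str]:
    """Word forms that received at least one analysis."""
    forms = set()
    for line in lookup_output.splitlines():
        fields = line.split("\t")
        # Expected shape: form <TAB> analysis <TAB> weight.
        # Assumption: unknown words get an analysis ending in "+?".
        if len(fields) == 3 and not fields[1].endswith("+?"):
            forms.add(fields[0])
    return forms


def rejected_forms(ospell_output: str) -> set[str]:
    """Word forms the speller reported as not in its lexicon."""
    pattern = r'^"(.+)" is NOT in the lexicon'
    return set(re.findall(pattern, ospell_output, flags=re.MULTILINE))


def missing_from_speller(lookup_output: str, ospell_output: str) -> set[str]:
    """Forms the analyser knows but the speller rejects."""
    return analysed_forms(lookup_output) & rejected_forms(ospell_output)


lookup = (
    "mieʹrreed\tmieʹrreed+V+Imprt+Pl2\t0.000000\n"
    "mieʹrreed\tmieʹrreed+V+Inf\t0.000000\n"
)
ospell = '"mieʹrreed" is NOT in the lexicon:\n'
print(missing_from_speller(lookup, ospell))
```
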
./configure --with-hfst --without-xfst

When running
../configure --disable-hfst-desktop-spellers --enable-spellers --enable-fst-hyphenator --enable-hfst-mobile-speller
sms practically requires the additional options --with-hfst --without-xfst --enable-reversed-intersect.
Without explicitly selecting hfst, deselecting xfst, and enabling reversed intersect, a successful compilation that otherwise takes approx. 13 minutes may take nearly one hour.

System-wide speller suggests what it does not accept in TextEdit MacOS

On macOS Ventura 13.5.1, the spell checker does not accept word forms like ‹säämas›, yet its top suggestion is the very same form ‹säämas›. This is related to giellalt/lang-lut#4 and, most likely, giellalt/lang-lut#1.

hfst-lookup src/fst/analyser-gt-norm.hfstol 
> säämas
säämas	säämas+Adv	0,000000
säämas	sääʹmm+N+Lat	0,000000

echo 'säämas' | hfst-tokenise -g tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst 
"<säämas>"
	"säämas" Adv <W:0.0>
	"sääʹmm" N Sem/Hum Lat <W:0.0>

hfst-ospell -S -n 5 tools/spellcheckers/sms.zhfst 
säämas
"säämas" is in the lexicon...

BUT, it is not accepted by the speller.

