aurelian / ruby-stemmer
Expose libstemmer_c to Ruby
Home Page: http://locknet.ro/archive/2009-10-29-ann-ruby-stemmer.html
License: MIT License
Hi,
make clean all
fails with the following log:
rm -f stemwords *.o src_c/*.o runtime/*.o libstemmer/*.o
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_latin.o src_c/stem_UTF_8_latin.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_danish.o src_c/stem_UTF_8_danish.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_dutch.o src_c/stem_UTF_8_dutch.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_english.o src_c/stem_UTF_8_english.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_finnish.o src_c/stem_UTF_8_finnish.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_french.o src_c/stem_UTF_8_french.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_german.o src_c/stem_UTF_8_german.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_hungarian.o src_c/stem_UTF_8_hungarian.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_italian.o src_c/stem_UTF_8_italian.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_norwegian.o src_c/stem_UTF_8_norwegian.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_porter.o src_c/stem_UTF_8_porter.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_portuguese.o src_c/stem_UTF_8_portuguese.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_romanian.o src_c/stem_UTF_8_romanian.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_russian.o src_c/stem_UTF_8_russian.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_spanish.o src_c/stem_UTF_8_spanish.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_swedish.o src_c/stem_UTF_8_swedish.c
cc -Iinclude -fPIC -c -o src_c/stem_UTF_8_turkish.o src_c/stem_UTF_8_turkish.c
cc -Iinclude -fPIC -c -o runtime/api.o runtime/api.c
cc -Iinclude -fPIC -c -o runtime/utilities.o runtime/utilities.c
cc -Iinclude -fPIC -c -o libstemmer/libstemmer_utf8.o libstemmer/libstemmer_utf8.c
ar -cru libstemmer.o src_c/stem_UTF_8_latin.o src_c/stem_UTF_8_danish.o src_c/stem_UTF_8_dutch.o src_c/stem_UTF_8_english.o src_c/stem_UTF_8_finnish.o src_c/stem_UTF_8_french.o src_c/stem_UTF_8_german.o src_c/stem_UTF_8_hungarian.o src_c/stem_UTF_8_italian.o src_c/stem_UTF_8_norwegian.o src_c/stem_UTF_8_porter.o src_c/stem_UTF_8_portuguese.o src_c/stem_UTF_8_romanian.o src_c/stem_UTF_8_russian.o src_c/stem_UTF_8_spanish.o src_c/stem_UTF_8_swedish.o src_c/stem_UTF_8_turkish.o runtime/api.o runtime/utilities.o libstemmer/libstemmer_utf8.o
cc -o stemwords examples/stemwords.o libstemmer.o
examples/stemwords.o: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status
make: *** [stemwords] Error 1
on Ubuntu
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
When I updated libstemmer_c to the one from https://github.com/snowballstem/snowball/tree/master, everything compiled and worked as expected.
P.S. I updated everything except the latin algorithm. The master version produces both a noun form and a verb form, as stated at https://lists.tartarus.org/mailman/private/snowball-discuss/2017-June/001613.html. Since you hold an outdated version of libstemmer_c, I suspect your latin stemmer comes from an older version (one that returned the noun form only) without extra edits of your own. Is that right?
Could you please update your libstemmer_c version to snowball's master?
Hi,
It seems the version.rb file is missing from the latest release (0.9.4) causing this sort of thing:
be rake tmp:clear --trace
rake aborted!
LoadError: cannot load such file -- lingua/version
Confirmed by downloading the gem then:
$ gem unpack ruby-stemmer-0.9.4.gem
$ cd ruby-stemmer-0.9.4
$ tree lib
lib
└── lingua
    └── stemmer.rb
Thanks!
Hi.
Compilation on mac with rvm is still not working right.
There are two problems with the following code.
To test it, make sure ARCHFLAGS is set first, e.g.
ARCHFLAGS="-arch x86_64" gem install ruby-stemmer
The result of file on Mac OS X 10.6.7 with rvm 1.6.5 is
/Users/mmullis/.rvm/rubies/ruby-1.9.2-p180/bin/ruby: Mach-O 64-bit executable
/Users/mmullis/.rvm/rubies/ruby-1.8.7-p334/bin/ruby: Mach-O 64-bit executable
This is why we need to specify ARCHFLAGS and not rely on the detection code.
If you're getting something after "executable" for the last one, there must be a difference either in how the ruby is built or in the signature files that libmagic uses.
thanks,
michael.
Is it possible to add a Czech stemmer somehow? I really don't know how.
Link below includes stem_UTF_8_czech.o stem_UTF_8_czech.h stem_UTF_8_czech.c files
It would be nice to have Latin support when dealing with scientific biological names.
Latin stemmer details: http://snowball.tartarus.org/otherapps/schinke/intro.html
Dear Aurelian,
I've recently added your project to our RubyNLP list: https://github.com/arbox/nlp-with-ruby
I wonder if you want to participate in the Ruby for NLP network. You could do this in one very simple step by adding the rubynlp topic to your GitHub repository.
Thank you for the project!
I expected "simply" and "simple" to have the same stem, but they do not:
> s = Lingua::Stemmer.new
> s.stem('simply')
=> "simpli"
> s.stem('simple')
=> "simpl"
Lingua.stemmer('grüner', language: :de)
Results in:
"grun"
Should be:
"grün"
I've seen the error "not available for stemming in " for unsupported languages. Is there a way to detect all supported languages, or test if a language is supported, so this exception can be prevented?
(I guess I'll just hardcode the list for now based on https://github.com/aurelian/ruby-stemmer/tree/master/libstemmer_c/src_c)
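The hardcoded check mentioned above could look like the following. This is a minimal sketch, assuming the language names that appear as stem_UTF_8_*.c files in the build log earlier on this page. SUPPORTED_LANGUAGES and supported_language? are hypothetical names, not part of ruby-stemmer, and the list may not match the installed gem version or cover short codes such as :de.

```ruby
# Hypothetical helper; not part of the ruby-stemmer API.
# Names are copied from the stem_UTF_8_*.c files in the build log above
# and may differ between libstemmer_c versions.
SUPPORTED_LANGUAGES = %w[
  danish dutch english finnish french german hungarian italian latin
  norwegian porter portuguese romanian russian spanish swedish turkish
].freeze

# Returns true when the given name (String or Symbol) is in the list.
def supported_language?(language)
  SUPPORTED_LANGUAGES.include?(language.to_s.downcase)
end

puts supported_language?(:german)  # true
puts supported_language?("czech")  # false
```

Checking the name before constructing a stemmer avoids relying on the exception for control flow, at the cost of keeping the list in sync by hand.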
Hi,
I have the rules for a Bulgarian stemmer.
I don't really have the resources to code this in C.
I can provide the rules if someone is willing to write it in C. They are the work of P. Nakov.
Regards,
Yavor
gem 'ruby-stemmer', '>=0.8.3', :lib => 'lingua/stemmer' should be
gem 'ruby-stemmer', '>=0.8.3', :require => 'lingua/stemmer'
OpenSolaris:
ar: libstemmer.o not in archive format.
To make it work:
export CC=/usr/bin/gcc
cd /opt/src/
git clone git://github.com/aurelian/ruby-stemmer.git
cd ruby-stemmer
/opt/bin/ruby extconf.rb
cd libstemmer_c
make
ar -cru libstemmer_foo.o stem_ISO_8859_1_danish.o stem_UTF_8_danish.o stem_ISO_8859_1_dutch.o stem_UTF_8_dutch.o stem_ISO_8859_1_english.o stem_UTF_8_english.o stem_ISO_8859_1_finnish.o stem_UTF_8_finnish.o stem_ISO_8859_1_french.o stem_UTF_8_french.o stem_ISO_8859_1_german.o stem_UTF_8_german.o stem_ISO_8859_1_hungarian.o stem_UTF_8_hungarian.o stem_ISO_8859_1_italian.o stem_UTF_8_italian.o stem_ISO_8859_1_norwegian.o stem_UTF_8_norwegian.o stem_ISO_8859_1_porter.o stem_UTF_8_porter.o stem_ISO_8859_1_portuguese.o stem_UTF_8_portuguese.o stem_ISO_8859_2_romanian.o stem_UTF_8_romanian.o stem_KOI8_R_russian.o stem_UTF_8_russian.o stem_ISO_8859_1_spanish.o stem_UTF_8_spanish.o stem_ISO_8859_1_swedish.o stem_UTF_8_swedish.o stem_UTF_8_turkish.o api.o utilities.o libstemmer.o
cp libstemmer_foo.o libstemmer.o
cd ..
make
/opt/bin/ruby test.rb
make install
I've tried all three installation options for Windows, but the result is still the same.
Temporarily enhancing PATH for MSYS/MINGW...
Building native extensions. This could take a while...
ERROR: Error installing ruby-stemmer:
ERROR: Failed to build gem native extension.
current directory: C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/ruby-stemmer-3.0.0/ext/lingua
C:/Ruby27-x64/bin/ruby.exe -I C:/Ruby27-x64/lib/ruby/2.7.0 -r ./siteconf20201206-11216-1swnql7.rb extconf.rb
The filename, directory name, or volume label syntax is incorrect.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers. Check the mkmf.log file for more details. You may
need configuration options.
Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=C:/Ruby27-x64/bin/$(RUBY_BASE_NAME)
extconf failed, exit code 1
Gem files will remain installed in C:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/ruby-stemmer-3.0.0 for inspection.
Results logged to C:/Ruby27-x64/lib/ruby/gems/2.7.0/extensions/x64-mingw32/2.7.0/ruby-stemmer-3.0.0/gem_make.out
Using
gem install ruby-stemmer
on Windows 10 returns this error:
Microsoft Windows [Version 10.0.19043.1586]
C:\Users\username>gem install ruby-stemmer
Temporarily enhancing PATH for MSYS/MINGW...
Building native extensions. This could take a while...
ERROR: Error installing ruby-stemmer:
ERROR: Failed to build gem native extension.
current directory: C:/Ruby30-x64/lib/ruby/gems/3.0.0/gems/ruby-stemmer-3.0.0/ext/lingua
C:/Ruby30-x64/bin/ruby.exe -I C:/Ruby30-x64/lib/ruby/3.0.0 -r ./siteconf20220316-10816-o9w6w1.rb extconf.rb
The filename, directory name, or volume label syntax is incorrect.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers. Check the mkmf.log file for more details. You may
need configuration options.
Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=C:/Ruby30-x64/bin/$(RUBY_BASE_NAME)
extconf failed, exit code 1
Gem files will remain installed in C:/Ruby30-x64/lib/ruby/gems/3.0.0/gems/ruby-stemmer-3.0.0 for inspection.
Results logged to C:/Ruby30-x64/lib/ruby/gems/3.0.0/extensions/x64-mingw32/3.0.0/ruby-stemmer-3.0.0/gem_make.out
None of the above configuration options resolves the issue.
The "gem_make.out" file contains the same error.
Trying the recommended Windows options
gem install ruby-stemmer --platform=x86-mingw32
and
gem install ruby-stemmer --platform=x86-mswin32
returns the same error.
My ruby version is
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x64-mingw32]
In Ruby 1.9, the result of stem is encoded as ASCII-8BIT by default. This should be changed to use the specified encoding (or maybe the default string encoding).
How to reproduce:
# encoding: utf-8
require 'lingua/stemmer'
s = Lingua::Stemmer.new(:language => "ro")
result = s.stem("așezare")
puts "test".encoding # => UTF-8
puts s.encoding # => UTF_8
puts result.encoding # => ASCII-8BIT
Workaround:
result.force_encoding "utf-8"
puts result.encoding # => "UTF-8"
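The same workaround can be wrapped in a small helper that leaves the original string untouched. This is only a sketch: retag_utf8 is a hypothetical name, and the stemmer result is simulated with String#b so the example runs without the gem installed.

```ruby
# Hypothetical helper; simulates the ASCII-8BIT result without requiring the gem.
# force_encoding only relabels the bytes, it does not transcode them.
def retag_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

word  = "a\u0219ezare"   # "așezare", written with an escape to stay ASCII-safe
raw   = word.b           # same bytes, tagged as binary (ASCII-8BIT)
fixed = retag_utf8(raw)

puts fixed.encoding          # UTF-8
puts fixed.valid_encoding?   # true
puts fixed == word           # true
```

Because the bytes returned by the stemmer are already valid UTF-8 in this scenario, relabeling is enough. If the stemmer were created with a different encoding, that encoding would have to be used instead.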
Compiler warning:
../../../../ext/lingua/stemmer.c: In function 'rb_stemmer_stem':
../../../../ext/lingua/stemmer.c:86: warning: implicit conversion shortens 64-bit value into a 32-bit value
Environment:
Hi, I think this behaviour is inconsistent:
Lingua.stemmer(["installation"])
"instal" # string
Lingua.stemmer(["installation", "installation"])
["instal", "instal"] # not a string!
It makes it hard to treat arbitrary strings as some post-processing is necessary to neutralise the return data type. I would suggest the first example should yield ["instal"].
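Until the return type is made consistent, the post-processing described above can be reduced to one call. A minimal sketch, where stems_as_array is a hypothetical wrapper and its argument stands in for whatever Lingua.stemmer returned:

```ruby
# Hypothetical wrapper; `result` stands in for the value returned by Lingua.stemmer.
# Kernel#Array wraps a single String in an Array and leaves an Array unchanged.
def stems_as_array(result)
  Array(result)
end

p stems_as_array("instal")              # ["instal"]
p stems_as_array(["instal", "instal"])  # ["instal", "instal"]
```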
Old versions of Mac OS X will fail to build the gem, as ARCHFLAGS will report x86_64.
Output from 10.4:
[...]
cc -Iinclude -fPIC -arch x86_64 -c -o runtime/utilities.o runtime/utilities.c
cc -Iinclude -fPIC -arch x86_64 -c -o libstemmer/libstemmer.o libstemmer/libstemmer.c
[...] libstemmer/libstemmer.o
checking for libstemmer.h... yes
creating Makefile
make
gcc -I. -I/W/lib/ruby/1.8/i686-darwin8.11.1 -I/W/lib/ruby/1.8/i686-darwin8.11.1 -I. -DHAVE_LIBSTEMMER_H -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -I/W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c/include -c stemmer.c
cc -dynamic -bundle -undefined suppress -flat_namespace -o stemmer_native.bundle stemmer.o -L. -L/W/lib -L. -L/W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c /W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c/libstemmer.o -lpthread -ldl -lobjc
/usr/bin/ld: truncated or malformed archive: /W/lib/ruby/gems/1.8/gems/ruby-stemmer-0.6.3/libstemmer_c/libstemmer.o (ranlib structures in table of contents extends past the end of the table of contents, can't load from it)
collect2: ld returned 1 exit status
make: *** [stemmer_native.bundle] Error 1
A possible solution would be to detect the Ruby architecture with "file which ruby".
When building the gem on FreeBSD, the gem fails because it uses 'make' instead of 'gmake'. Using 'gmake' instead makes it work just fine.
Use uname as a test to see which operating system it is.
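One way to make that choice inside extconf.rb is sketched below. Note the swap: it reads RbConfig::CONFIG["host_os"] instead of shelling out to uname, and make_program is a hypothetical helper name, not something the gem defines. Only FreeBSD is matched because that is the only system reported here.

```ruby
require "rbconfig"

# Hypothetical helper: pick the make program from the host OS string.
# Only FreeBSD is handled, since that is the case reported above.
def make_program(host_os = RbConfig::CONFIG["host_os"])
  host_os =~ /freebsd/i ? "gmake" : "make"
end

puts make_program("freebsd13.1")   # gmake
puts make_program("linux-gnu")     # make
```

Passing the OS string as an argument keeps the decision testable on any machine.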
Is it possible to add jruby support with minimal changes?
With 0.8.2, installation on Mac OS X fails using
gem install ruby-stemmer
because libstemmer.o is compiled badly.
There are two bugs in extconf.rb where it attempts to determine the arch on Macs.
I haven't tried to fix the detection logic, but I fixed the mistake that blocked an ARCH forced by an external parameter. You need to change
unless ENV['ARCHFLAGS'].nil?
to
if ENV['ARCHFLAGS'].nil?
This allows the process to start correctly via ARCHFLAGS='-arch x86_64' gem install ruby-stemmer,
but for it to finish successfully you need to run ranlib on libstemmer.o.
So I've added
if RUBY_PLATFORM =~ /darwin/
  system "ranlib #{File.expand_path(File.join(LIBSTEMMER, 'libstemmer.o'))}"
end
after the "#{make} libstemmer.o" case. Built with these fixes, the gem installed successfully under an rvm-managed Ruby 1.9.2 on Mac OS X 10.6 via the "ARCHFLAGS='-arch x86_64' gem install ruby-stemmer" command.
Please investigate, and consider adding these fixes to the next version.
Currently, the bundled library libstemmer_c fails to build for the same architecture that the ruby lib was built for.
Somehow, in the libstemmer Makefile, we need to detect which ARCH ruby was built for and use the same one.
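A rough sketch of that detection, done on the Ruby side before invoking the libstemmer Makefile. It assumes that an explicit ARCHFLAGS should win and that RbConfig::CONFIG["host_cpu"] is an acceptable fallback. arch_flags is a hypothetical helper, and some host_cpu values (for example i686) may need mapping to a name the Darwin toolchain accepts.

```ruby
require "rbconfig"

# Hypothetical helper: an explicit ARCHFLAGS wins; otherwise fall back to
# the CPU that the running Ruby reports it was built for.
def arch_flags(env = ENV, host_cpu = RbConfig::CONFIG["host_cpu"])
  explicit = env["ARCHFLAGS"].to_s.strip
  return explicit unless explicit.empty?
  "-arch #{host_cpu}"
end

puts arch_flags({ "ARCHFLAGS" => "-arch x86_64" }, "i686")  # -arch x86_64
puts arch_flags({}, "x86_64")                               # -arch x86_64
```

The resulting string could then be passed to the compiler flags used for libstemmer_c so that both builds target the same architecture.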
Proof of concept:
stemmer = Lingua::Stemmer.new(:language => 'spanish', :encoding => 'UTF-8')
stemmer.stem('piano')
=> "pian"
I think you understand Spanish, since you live in Barcelona. I think what is failing is the internal port, or the part that is written in C.
On OSX Snow Leopard and Ruby-1.8.7 p72 compilation fails with the following message:
ld: in /Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c/libstemmer.o, archive has no table of contents
collect2: ld returned 1 exit status
make: *** [stemmer_native.bundle] Error 1
The compile commands:
make
gcc -I. -I/Users/christian/.rvm/ruby-1.8.7-tv1_8_7_72/lib/ruby/1.8/i686-darwin10.0.0 -I/Users/christian/.rvm/ruby-1.8.7-tv1_8_7_72/lib/ruby/1.8/i686-darwin10.0.0 -I. -DHAVE_LIBSTEMMER_H -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch x86_64 -pipe -fno-common -I/Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c/include -c stemmer.c
cc -dynamic -bundle -undefined suppress -flat_namespace -o stemmer_native.bundle stemmer.o -L. -L/Users/christian/.rvm/ruby-1.8.7-tv1_8_7_72/lib -L. -Wl,-syslibroot /Developer/SDKs/MacOSX10.6.sdk -arch x86_64 -L/Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c /Users/christian/.rvm/gems/ruby/1.8.7/gems/ruby-stemmer-0.6.4/libstemmer_c/libstemmer.o -ldl -lobjc
The Sphinx makefile is able to compile libstemmer_c without problems on the same machine.