greek_stemmer's Introduction

GreekStemmer

A simple Greek stemmer algorithm.

The algorithm is based on a paper by George Ntais.

Installation

Add this line to your application's Gemfile:

gem 'greek_stemmer'

And then execute:

$ bundle

Or install it yourself as:

$ gem install greek_stemmer

Usage

To use this stemmer, normalize the input first. For this algorithm, normalization means two things: removing tone marks (detoning) and converting to uppercase.

  require 'greek_stemmer'

  GreekStemmer.stem("ΠΟΣΟΤΗΤΑ") # => "ΠΟΣΟΤΗΤ"
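The normalization step described above could be sketched as follows. This is a minimal sketch, not part of the gem: normalize_greek is a hypothetical helper name, and treating "detone" as "drop every combining mark" (which also removes the dialytika) is an assumption.

```ruby
# Hypothetical helper, not provided by greek_stemmer.
# Assumption: "detone" means removing all combining marks, dialytika included.
def normalize_greek(word)
  word
    .unicode_normalize(:nfd)  # split accented letters into base letter + combining mark
    .gsub(/\p{Mn}/, '')       # drop the combining marks (tonos, dialytika)
    .upcase                   # Unicode-aware upcase (Ruby 2.4+); final ς becomes Σ
    .unicode_normalize(:nfc)  # recompose anything that is still decomposed
end

puts normalize_greek("ποσότητα") # prints ΠΟΣΟΤΗΤΑ
```

The result can then be passed to GreekStemmer.stem as in the example above.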

Credits

Original work: bandito

Contributing

  1. Fork it ( http://github.com//greek_stemmer/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Perform changes and run bundle exec rake update_greek_stemming_sample to update the stemming samples
  4. Commit your changes (git commit -a)
  5. Push to the branch (git push origin my-new-feature)
  6. Create new Pull Request

License

greek_stemmer is licensed under the MIT License. See LICENSE for details.

greek_stemmer's People

Contributors

chief, greenonion, pharlez, bandito, ptheof, lovemeblender, m-peter

greek_stemmer's Issues

Encoding problems

We had a problem with encodings:

Traceback (most recent call last):
File "E:/Python/classification/scripts/nsk_fit.py", line 31, in <module>
stemmer = gr_stemm.GreekStemmer()
File "χ\Anaconda3\lib\site-packages\greek_stemmer\__init__.py", line 35, in __init__
custom_rules = self.load_settings()
File "χ\Anaconda3\lib\site-packages\greek_stemmer\__init__.py", line 340, in load_settings
custom_rules = yaml.load(f.read())
File "χ\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 72: character maps to <undefined>

The problem is solved as follows (in our case), by passing an explicit encoding when the settings file is opened.
Line 339, file __init__.py:
os.path.dirname(__file__), 'stemmer.yml'), 'r', encoding="utf8") as f:

ΑΔ-ΗΣ # ΑΔ-Ω

There is probably a problem here: given the words ΑΔΩ and ΑΔΗΣ, the stemming process outputs the same stem, ΑΔ, for both.
