
Changes proposal about pymorfologik (CLOSED)

dmirecki commented on September 27, 2024
Changes proposal

from pymorfologik.

Comments (3)

dmirecki commented on September 27, 2024

Thanks for your post!

I think that Python2 compatibility is a very good idea. Sorry that I didn't care about Python2 earlier, but pyMorfologik was part of a bigger application written in Python3.

I have an idea for how to reconcile "list of tuples" with "dictionary". I propose creating an abstract class Parser with an abstract method parse, and then a method get_stem in the Morfologik class, which expects 2 parameters: a list of words and an instance of a parser. So the get_stem method will look like:

def get_stem(self, words, parser):
    words = self._make_unique(words)
    output = self._run_morfologik(words)
    return parser.parse(output)

Thus you can create a "to list of tuples" parser extending the abstract parser, and we can adapt the existing code into a "to dict" parser. Moreover, I would like to create (in the near future) a parser which handles information like the category of a word (verb/adj/...), gender, etc. The natural way to do that will be to just extend the abstract parser and write some code.
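To make the proposal concrete, here is a minimal sketch of what such a parser hierarchy might look like. It is only an illustration: the class names ListParser and DictParser, the helper _rows, and the assumed raw output format (one tab-separated "word, stem" pair per line) are hypothetical and not taken from pyMorfologik itself.

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    """Turns raw stemmer output into a Python data structure."""

    @abstractmethod
    def parse(self, output):
        """Parse the raw output text and return the result."""


def _rows(output):
    # ASSUMPTION: one "word<TAB>stem" pair per line; the real output
    # format of the stemmer may differ.
    for line in output.splitlines():
        if line.strip():
            word, stem = line.split("\t")[:2]
            yield word, stem


class ListParser(Parser):
    """Hypothetical "to list of tuples" parser; keeps repeated words."""

    def parse(self, output):
        return list(_rows(output))


class DictParser(Parser):
    """Hypothetical "to dict" parser; maps each word to its distinct stems."""

    def parse(self, output):
        result = {}
        for word, stem in _rows(output):
            stems = result.setdefault(word, [])
            if stem not in stems:
                stems.append(stem)
        return result
```

With this shape, a parser that also extracts the word category or gender would be one more subclass of Parser, and get_stem would not need to change.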

What do you think about it?

Anyway, I think that a pull request is a good idea.

Damian


adibo commented on September 27, 2024

Hi!
(I realized we could discuss things in Polish, but I guess it's good to keep things readable for the rest of the world too)

Good idea! I'll make the changes and submit them via a pull request soon.

Still, there would be a few minor changes from my side:

  1. to move _make_unique() out of get_stem() in the Morfologik class and into a Parser class;
    For my application I need the whole text to be stemmed, including repeated words (and thus repeated stems), so _make_unique() is not desirable for my purposes. I hope that's okay (it is also in line with the way morfologik outputs things anyway). Still, one could remove repeated words (and hence stems) in a parser if one wishes, or simply deduplicate the list of words supplied as input (via set()?)
  2. let's rename get_stem() to stem() => an even simpler name; it is also a verb, so it will play nicely here
  3. let's add **kwargs to stem(), so that additional parameters can be passed if needed
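Taken together, the three changes above could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged code: MorfologikSketch, KeepAllParser and UniqueParser are hypothetical names, _run_morfologik is replaced by a toy lookup table so the example runs without the real stemmer, and forwarding **kwargs to the parser is just one possible use for them.

```python
class KeepAllParser:
    """Hypothetical parser that keeps every (word, stem) pair, repeats included."""

    def parse(self, output, **kwargs):
        return list(output)


class UniqueParser:
    """Hypothetical parser that drops repeated (word, stem) pairs."""

    def parse(self, output, **kwargs):
        seen, result = set(), []
        for pair in output:
            if pair not in seen:
                seen.add(pair)
                result.append(pair)
        return result


class MorfologikSketch:
    # ASSUMPTION: a toy lookup table standing in for the real stemmer call.
    _TOY_STEMS = {"psy": "pies", "koty": "kot"}

    def _run_morfologik(self, words):
        return [(word, self._TOY_STEMS.get(word, word)) for word in words]

    def stem(self, words, parser, **kwargs):
        # No _make_unique() here: repeated words reach the parser untouched.
        output = self._run_morfologik(words)
        # ASSUMPTION: extra keyword arguments are forwarded to the parser.
        return parser.parse(output, **kwargs)


morfologik = MorfologikSketch()
words = ["psy", "koty", "psy"]
with_repeats = morfologik.stem(words, KeepAllParser())
without_repeats = morfologik.stem(words, UniqueParser())
```

A caller who wants unique results can therefore either pick a deduplicating parser or pass set(words) as the input, while stem() itself stays neutral.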

I'll submit my proposal soon.

Regards,
Adrian


adibo commented on September 27, 2024

Thanks for the merge. Wouldn't this be a good time to release a new pip package?
I would say it would! :)

