tsproisl commented on July 28, 2024

I've been unhappy with that for a long time. There are two main reasons why I haven't changed the model format so far:

  1. While it is annoying, it does not seem to be a problem in practice, as most people seem to have enough RAM (or do not complain if they don't).
  2. I don't want to render any existing models useless.

Of course, dealing with 2 is just a matter of making the tagger recognize the format and handle the model file appropriately. It's just that it hasn't been a top priority for me.

from someweta.

ianroberts commented on July 28, 2024

Sure. It has only become an issue for me because I'm working with a project that wants to expose a web service based on your tagger on a platform that uses Kubernetes. I need to apply memory limits to the pod definitions, but for this service I have to make the pod request 4GB even though it only needs 1.7GB after the startup phase.

For this particular use case I've developed a workaround where I transform the model into a gzipped pickle file, which is quite a bit larger than the original gzipped JSON but loads faster and with virtually no additional memory overhead. However, it occurred to me today that it's actually possible to implement a more efficient streaming load of the current model format using ijson. I can submit a PR for this if you like.
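To illustrate the streaming idea, here is a minimal sketch, not the code from the PR. It assumes, purely for illustration, that the gzipped JSON file holds a top-level array; the function name `iter_model_parts` is hypothetical. With ijson available, each array element is parsed and handed over on its own instead of the whole document being materialised first; without it, the sketch falls back to the standard parser.

```python
import gzip
import json

try:
    import ijson  # optional third-party streaming JSON parser
except ImportError:
    ijson = None


def iter_model_parts(path):
    """Yield the elements of a top-level JSON array stored in a gzipped file.

    Hypothetical helper: the array layout is an assumption made for this
    sketch, not a description of the actual model format.
    """
    with gzip.open(path, "rb") as f:
        if ijson is not None:
            # "item" is ijson's prefix for the elements of the top-level
            # array; they are produced one at a time while the file is read.
            yield from ijson.items(f, "item")
        else:
            # Fallback: parse the whole document in one go.
            yield from json.load(f)
```

Note that ijson returns non-integer numbers as `decimal.Decimal` by default, so a real loader would have to convert them if it needs floats.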

tsproisl commented on July 28, 2024

Ah, the ijson solution is nice! A PR would be most welcome. The only thing that needs to be taken into account is that this will produce garbage on Python versions <3.7 that have ijson installed. I see two possible solutions: either always fall back to the standard parser for these older versions, or use a collections.OrderedDict instead of a dict if the version is <3.7.
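The second option could look roughly like the following sketch. The names `MappingType` and `build_mapping` are hypothetical and only serve the illustration; the point is that the mapping type is chosen once, based on the running Python version, so that insertion order is preserved either way.

```python
import sys
from collections import OrderedDict

# Insertion order of a plain dict is a language guarantee from Python 3.7 on;
# on older versions, OrderedDict provides the same ordering explicitly.
MappingType = dict if sys.version_info >= (3, 7) else OrderedDict


def build_mapping(pairs):
    """Collect (key, value) pairs into an order-preserving mapping."""
    mapping = MappingType()
    for key, value in pairs:
        mapping[key] = value
    return mapping
```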

ianroberts commented on July 28, 2024

PR submitted - I've made it use the optimised algorithm on CPython 3.6+ or (any) Python 3.7+, which are the ones where dict iteration order is guaranteed, and fall back to the original algorithm on earlier versions.
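A check along those lines might be sketched as follows. This is an illustration rather than the code in the PR, and the function name `dict_order_is_reliable` is hypothetical. The optional arguments exist only so that the rule can be tried out for interpreters other than the one running the code.

```python
import platform
import sys


def dict_order_is_reliable(version=None, implementation=None):
    """Return True if plain dicts keep insertion order on this interpreter.

    True for any implementation of Python 3.7+ (language guarantee) and for
    CPython 3.6 (implementation detail); False otherwise.
    """
    if version is None:
        version = sys.version_info[:2]
    if implementation is None:
        implementation = platform.python_implementation()
    version = tuple(version)
    if version >= (3, 7):
        return True
    return implementation == "CPython" and version >= (3, 6)
```

A loader could call this once at import time and pick the streaming or the original code path accordingly.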

tsproisl commented on July 28, 2024

Thank you! I've updated the README and created a new release.
