Comments (4)

anttttti commented on July 28, 2024

Of course. Thanks for the offer.

I'll do a code cleanup after the Mercari competition closes this week. This includes fixing bugs and removing differences from the sklearn API, for example in how .fit() and .transform() behave. That would be a good time to document the code.

from wordbatch.

JanzenLiu commented on July 28, 2024

Sure, thanks. I would also start working on this after the competition ends. Besides, are you interested in adding a BM25 module? That is, something similar to WordBag, but where the score of each word in each document is its BM25 score. It seems to work better than simple TF-IDF.


anttttti commented on July 28, 2024

We can do this; it won't be difficult to add. The average document length needs to be tracked for the BM25 document length normalization, but that's all. The BM25 IDF is applied the same way as in TF-IDF; only the term and length normalization works differently.
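To make the difference concrete, here is a minimal sketch of BM25 weighting over tokenized documents. It is not wordbatch code: the function name `bm25_weights`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration. The only statistic it needs beyond plain TF-IDF is the average document length.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper for illustration; k1, b and the IDF variant
    are assumed defaults, not values taken from wordbatch.
    """
    n_docs = len(docs)
    total_len = sum(len(d) for d in docs)
    if n_docs == 0 or total_len == 0:
        return [{} for _ in docs]

    # Average document length: the extra statistic BM25 has to track.
    avg_len = total_len / n_docs

    # Document frequency, then a smoothed IDF (assumed variant),
    # applied to each term the same way it would be for TF-IDF.
    df = Counter(term for d in docs for term in set(d))
    idf = {t: math.log((1 + n_docs) / (1 + n)) + 1.0 for t, n in df.items()}

    weighted = []
    for d in docs:
        # Length normalization: longer-than-average documents are penalized.
        length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
        tf = Counter(d)
        # Saturating term-frequency component, scaled by the IDF.
        weighted.append(
            {t: idf[t] * f * (k1 + 1.0) / (f + length_norm) for t, f in tf.items()}
        )
    return weighted
```

With `b=0` the length normalization disappears and only the saturating term-frequency part remains, which is one way to check that the two normalizations are independent of the IDF.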

Argument options for TF-IDF/BM25 configurations can get messy, so those need to be sorted out. One option would be to refactor the normalizations into a separate class that could be used on any set of sparse matrix features.
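One possible shape for such a class is sketched below. Everything in it is hypothetical: the name `FeatureWeighter`, its arguments, and the use of plain `{column: count}` dicts as a stand-in for sparse matrix rows. The point is only that the weighting options live in one place and that `fit()` and `transform()` follow the sklearn convention.

```python
import math


class FeatureWeighter:
    """Hypothetical weighting step, independent of how counts were extracted."""

    def __init__(self, term_norm="bm25", use_idf=True, k1=1.2, b=0.75):
        if term_norm not in ("raw", "bm25"):
            raise ValueError("term_norm must be 'raw' or 'bm25'")
        self.term_norm = term_norm
        self.use_idf = use_idf
        self.k1 = k1
        self.b = b

    def fit(self, rows):
        """Learn document frequencies and the average row length."""
        n_rows = len(rows)
        df = {}
        total = 0.0
        for row in rows:
            total += sum(row.values())
            for col in row:
                df[col] = df.get(col, 0) + 1
        self.avg_len_ = total / n_rows if n_rows else 0.0
        # Smoothed IDF (assumed variant).
        self.idf_ = {
            c: math.log((1 + n_rows) / (1 + n)) + 1.0 for c, n in df.items()
        }
        return self  # sklearn convention: fit() returns the estimator

    def transform(self, rows):
        """Apply the configured weighting without changing fitted state."""
        out = []
        for row in rows:
            length = sum(row.values())
            new_row = {}
            for col, count in row.items():
                value = float(count)
                if self.term_norm == "bm25" and self.avg_len_ > 0:
                    norm = self.k1 * (1 - self.b + self.b * length / self.avg_len_)
                    value = count * (self.k1 + 1) / (count + norm)
                if self.use_idf:
                    # Assumption: columns unseen during fit() get weight 0.
                    value *= self.idf_.get(col, 0.0)
                new_row[col] = value
            out.append(new_row)
        return out
```

Usage would look like `FeatureWeighter(term_norm="bm25").fit(train_rows).transform(test_rows)`, so switching between TF-IDF-style and BM25-style weighting becomes a constructor argument and the feature extractor itself needs no weighting options.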


JanzenLiu commented on July 28, 2024

That's great. By the way, how can I contact you other than through GitHub? It would be better if we could communicate more directly. My email is [email protected], and if you use Facebook or something else, that would also be fine for me.

