Comments (5)

haifengl commented on April 28, 2024

Hi Mike,

Thanks for the bug report! I would like to do c and a. The "training bag" is not actually used for training. It is the feature list, and thus should be a set of unique words. As a sanity check, though, the constructor should check for duplicates. I will make these changes right now. Could you please add a unit test with your test case (in smile.feature.BagTest in the test directory)? Thanks!

Haifeng

from smile.

haifengl commented on April 28, 2024

Hi Mike,

The fix is in master now. Thanks!

Xyclade commented on April 28, 2024

Thanks! I'll add a unit test and PR it to this issue later today, as I'm in a meeting right now.

Xyclade commented on April 28, 2024

PS. The reason I found this is that I extracted a feature array for one category, then for a second one, and combined the two feature arrays into a single new one. This caused duplicates, as some terms were in both categories. Should I remove the intersecting features to improve the algorithm's predictions, or does the implementation take these into account?

haifengl commented on April 28, 2024

Yes, in general you had better use a Set to take the union of the features rather than just concatenating them. But the constructor now handles duplicates, so it should be fine to use the feature list from your example.
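As a rough illustration of the Set-based union suggested here, the following is a minimal sketch using only the Java standard library. The class name `FeatureUnion`, the method `union`, and the sample words are hypothetical and are not part of smile.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class FeatureUnion {
    /** Returns the union of two feature arrays without duplicates, in first-seen order. */
    static String[] union(String[] first, String[] second) {
        // LinkedHashSet drops duplicates while keeping insertion order stable.
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(first));
        unique.addAll(Arrays.asList(second));
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical per-category feature arrays that share two terms.
        String[] sports = {"goal", "team", "win"};
        String[] politics = {"vote", "team", "win", "law"};
        // prints [goal, team, win, vote, law]
        System.out.println(Arrays.toString(union(sports, politics)));
    }
}
```

A `LinkedHashSet` is used here rather than a `HashSet` only so that feature indices stay reproducible between runs; any `Set` would remove the duplicates.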

The main problem is the documentation of Bag. It is not meant for training. It simply assumes that you have a set of unique features and uses them to compute double-valued feature vectors for a given text.
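To make the idea concrete, here is a small sketch of a bag-of-words feature extractor built on a fixed feature list. It is not smile's Bag implementation: the class `SimpleBagSketch`, its `feature` method, the choice to silently skip duplicate features, and the use of raw counts are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class SimpleBagSketch {
    // Maps each unique feature word to its position in the output vector.
    private final Map<String, Integer> index = new HashMap<>();

    SimpleBagSketch(String[] features) {
        for (String f : features) {
            // Assumption: a duplicate keeps its first index and is otherwise ignored.
            index.putIfAbsent(f, index.size());
        }
    }

    /** Counts how often each known feature occurs in the given words. */
    double[] feature(String[] words) {
        double[] vector = new double[index.size()];
        for (String w : words) {
            Integer i = index.get(w);
            if (i != null) {
                vector[i]++; // words outside the feature list are skipped
            }
        }
        return vector;
    }
}
```

With the hypothetical feature list `{"team", "win", "team", "law"}`, the duplicate "team" is dropped, and the words `{"win", "win", "team", "ball"}` map to the vector `[1.0, 2.0, 0.0]`. No training step is involved; the feature list only fixes the vector layout.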
