
Ordered itemsets (scikit-mine issue, closed)

scikit-mine commented on September 23, 2024
Ordered itemsets

from scikit-mine.

Comments (4)

lgalarra commented on September 23, 2024

Thanks Rémi, that helps. I would recommend showing how to run an existence test in the documentation, e.g.,
assert frozenset({14, 15, 16}) in patterns.itemset.values
It comes in handy when debugging.

Cheers,
Luis


remiadon commented on September 23, 2024

Hi Luis,
Good point!

Short answer: yes, it's possible.

Longer answer:

I believe that having a consistent order for the output itemsets may make other tasks, such as testing, easier

That's right, I have already run into inconsistent outputs with another algorithm that uses the same representation.

For now, itemsets are represented as Python frozensets, mainly for three reasons:

  • a frozenset is hashable
  • it's immutable
  • it's a built-in data type, recognized by every Python implementation

IMO itemsets should stay hashable, which is good practice. They should also remain immutable, to avoid accidental modification during mining. The third point, though, can be relaxed: if we import a data type from one of our dependencies, we can stick with it.
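A minimal sketch of the first two properties, using only the standard library. The item values and the `support` dictionary are made up for illustration and are not taken from scikit-mine:

```python
# Two frozensets with the same items are equal and hash equally,
# regardless of the order in which the items were given.
a = frozenset([14, 15, 16])
b = frozenset([16, 15, 14])
same_itemset = (a == b) and (hash(a) == hash(b))

# Hashable: a frozenset can be used as a dictionary key,
# e.g. to map an itemset to a count (illustrative value).
support = {a: 3}
looked_up = support[b]

# Immutable: frozenset has no add() method, so mutation fails.
try:
    a.add(17)
    mutated = True
except AttributeError:
    mutated = False
```

What a frozenset does not give you is a defined iteration order, which is the point raised in this thread.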

My trick is to use a data type provided by pandas

>>> from pandas.core.indexes.frozen import FrozenList
>>> itemset = FrozenList([2, 3])
>>> itemset[1:]  # slicing OK
FrozenList([3])
>>> hash(itemset)  # works

A simpler solution might be to use the tuple data type

>>> a = (2, 3)
>>> a[1:]  # slicing works
(3,)
>>> hash(a)  # also works
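One way the tuple idea could give a consistent order is to sort the items before freezing them. The sketch below assumes the items are mutually comparable. `to_ordered_itemset` is a hypothetical helper, not part of scikit-mine:

```python
def to_ordered_itemset(items):
    """Return the items as a sorted tuple (hypothetical helper, not scikit-mine API)."""
    return tuple(sorted(items))


# The input order does not matter: the result is always sorted.
itemset = to_ordered_itemset({16, 14, 15})

# Slicing keeps the tuple type, and the tuple is hashable,
# so it can still be used as a dictionary key.
tail = itemset[1:]
as_key = {itemset: "seen"}
```

Unlike a frozenset, two tuples holding the same items in a different order are not equal, so a normalisation step like the sort above matters.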

On the other hand, a natural order may not be defined for itemsets of arbitrary data types

Actually, we use sortedcontainers.SortedSet to track items in LCM.fit, so we already check that items are comparable. If they are not, we raise an error at .fit time, so .transform is never even called.
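The effect of such a comparability check can be illustrated with the built-in `sorted`, used here as a stand-in for the sorted container. `check_comparable` is a hypothetical helper, and the exact error scikit-mine raises may differ:

```python
def check_comparable(items):
    """Hypothetical stand-in for an ordering check done at fit time."""
    try:
        # Sorting needs every pair of items to support "<".
        return sorted(items)
    except TypeError as exc:
        raise TypeError("items must be mutually comparable") from exc


# Integers are mutually comparable, so the check passes.
ordered = check_comparable([3, 1, 2])

# Mixing integers and strings has no natural order in Python 3.
try:
    check_comparable([1, "a"])
    failed = False
except TypeError:
    failed = True
```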

Hope this helps


remiadon commented on September 23, 2024

fixed in 54a24f6

Docs updated, see https://scikit-mine.github.io/scikit-mine/tutorials/itemsets/LCM_on_chess.html


lgalarra commented on September 23, 2024

Thanks Rémi. I still have one recommendation. I looked at the tests, and you now use the tuple notation to check for the existence of an itemset. Perhaps the documentation should show the "official" way to do it (if you think such a check may be useful for reasons other than debugging).
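The thread does not say what the "official" way would be. As one possible sketch, an order-insensitive existence test could look like the following. `contains_itemset` is a hypothetical helper, and `mined` is a plain list standing in for the values of the itemset column:

```python
def contains_itemset(itemsets, items):
    """Order-insensitive existence test (hypothetical helper, not scikit-mine API)."""
    target = frozenset(items)
    # Compare as sets so the check works for tuples and frozensets alike.
    return any(frozenset(candidate) == target for candidate in itemsets)


# Illustrative mined itemsets, stored as tuples.
mined = [(14, 15, 16), (2, 3)]

found = contains_itemset(mined, (16, 14, 15))
missing = contains_itemset(mined, (14, 15))
```

Comparing through `frozenset` keeps the test independent of whether itemsets are stored as tuples or frozensets, at the cost of a linear scan.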

