Comments (4)
Thanks Rémi, that helps. I would recommend documenting how to run an existence test, e.g.,
assert frozenset({14, 15, 16}) in patterns.itemset.values
It comes in handy when debugging.
Cheers,
Luis
from scikit-mine.
Hi Luis,
Good point!
Short answer: yes, it's possible.
Longer answer:
I believe that having a consistent order for the output itemsets may make other tasks, such as testing
That's right. I have already run into inconsistent outputs with another algorithm that uses the same representation.
For now, itemsets are represented as Python frozenset, mainly for three reasons:
- a frozenset is hashable
- it's immutable
- it's a built-in data type, available in every Python implementation
IMO itemsets should stay hashable; it's good practice. They should also remain immutable, to avoid accidental errors during mining. The third assumption, though, can be relaxed: if we import a data type from one of our dependencies, we can stick with it.
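As a quick illustration of the properties listed above, here is a minimal sketch using only built-in types. It involves no scikit-mine code, and the support value is made up for the example.

```python
# Illustration only, using built-in types (no scikit-mine code involved).
itemset = frozenset([2, 3])

# Hashable: usable as a dictionary key; equality ignores insertion order.
support = {itemset: 10}  # 10 is an arbitrary, made-up support value
same_key_found = frozenset([3, 2]) in support

# Immutable: frozenset exposes no in-place mutators such as add().
has_add = hasattr(itemset, "add")

# Not a sequence: slicing raises TypeError.
try:
    itemset[1:]
    sliceable = True
except TypeError:
    sliceable = False

print(same_key_found, has_add, sliceable)
```

The last check shows the limitation discussed in this thread: a frozenset has no defined item order, so it cannot be indexed or sliced.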
My trick is to use a data type provided by pandas:
>>> from pandas.core.indexes.frozen import FrozenList
>>> itemset = FrozenList([2, 3])
>>> itemset[1:] # slicing OK
FrozenList([3])
>>> hash(itemset) # works
A simpler solution might be to use the tuple data type:
>>> a = (2, 3)
>>> a[1:] # slicing works
(3,)
>>> hash(a) # also works
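If tuples are used, one way to get a consistent order is to sort the items before building the tuple. This is only a sketch under that assumption: `as_itemset` is a hypothetical helper, not a scikit-mine function.

```python
def as_itemset(items):
    """Return a canonical itemset: a tuple of unique items in sorted order.

    Hypothetical helper for illustration; items must be mutually comparable.
    """
    return tuple(sorted(set(items)))


a = as_itemset([3, 2, 3])

# The same items always yield the same tuple, whatever the input order.
assert as_itemset([2, 3]) == as_itemset([3, 2])

# The result is still hashable and supports slicing.
lookup = {a: "seen"}
tail = a[1:]
```

With a canonical order, two runs that mine the same itemsets produce tuples that compare equal, which is what makes testing easier.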
On the other hand, a natural order may not be defined for itemsets of arbitrary data types
Actually, we use sortedcontainers.SortedSet to track items in LCM.fit, so we already check that items are comparable. If they are not, we raise an error at .fit time, so .transform is never even called.
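To illustrate the idea of failing early on non-comparable items, here is a rough sketch. It uses the built-in `sorted` as a stand-in for sortedcontainers.SortedSet, and `check_comparable` is a hypothetical helper, not the actual code in LCM.fit.

```python
def check_comparable(items):
    """Return the unique items in sorted order, or raise if they cannot be ordered.

    Hypothetical helper: the built-in sorted() stands in for SortedSet here.
    """
    try:
        return sorted(set(items))
    except TypeError as exc:
        # In Python 3, ordering values of unrelated types (e.g. int and str)
        # raises TypeError; surface it with a clearer message.
        raise TypeError("items must be mutually comparable") from exc


ordered = check_comparable([3, 1, 2, 3])

try:
    check_comparable([1, "a"])
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Running the check once at fit time means later steps can assume that a sorted order exists.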
Hope this helps
Fixed in 54a24f6.
Docs updated, see https://scikit-mine.github.io/scikit-mine/tutorials/itemsets/LCM_on_chess.html
Thanks Rémi. I still have one recommendation. I looked at the tests, and you now use the tuple notation to check for the existence of an itemset. Perhaps the documentation should show the "official" way to do it (if you think such a check may be useful for reasons other than debugging).
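For reference, such an existence check could look roughly like the sketch below. It is not the project's documented method: `contains_itemset` and the `mined` list are hypothetical stand-ins, and it assumes each mined itemset is a tuple whose items are in sorted order.

```python
def contains_itemset(itemsets, candidate):
    """Return True if `candidate` is one of `itemsets`.

    Hypothetical helper. Assumes every element of `itemsets` is a tuple with
    its items in sorted order, so the candidate is normalized the same way.
    """
    target = tuple(sorted(candidate))
    return target in set(itemsets)


# Made-up stand-in for the itemsets returned by a miner.
mined = [(2, 3), (14, 15, 16)]

found = contains_itemset(mined, {16, 14, 15})
missing = contains_itemset(mined, (14, 15))

assert found
assert not missing
```

Normalizing the candidate means the caller does not need to remember the order in which items were listed.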