Comments (8)
Yes it is ok for me :-)
from scikit-mine.
@alexandre-termier @PeggyCellier do you know where I can find usage information on these datasets?
The only information I currently have is for retail.
Information on accidents only concerns the raw data, as presented here.
@PeggyCellier what would you prefer? I think describe_transactions is both short and explicit.
I would prefer describe_dataset. Is it possible?
@PeggyCellier I have updated the name to "describe_transactions"
Rationale behind this:
First, we have to stick with the sklearn philosophy as much as possible.
scikit-learn provides a dedicated function for every single type of data generation:
- for classification, they have make_classification
- for regression, make_regression
- etc.
In our case we have classic "transactions", but in the future we may also add sequential transactions and other types.
In addition, we have to provide a consistent way to both generate data and check properties on this data: make_transactions, for which the mirror function is describe_transactions.
make_dataset and describe_dataset would not make sense, at least in the sklearn framework.
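To make the proposed "generate / check properties" pair concrete, here is a minimal sketch in plain Python. Only the names make_transactions and describe_transactions come from the discussion; the parameters (n_transactions, n_items, avg_transaction_size, random_state), the sampling scheme and the returned statistics are assumptions for illustration, not scikit-mine's actual implementation.

```python
import random


def make_transactions(n_transactions=100, n_items=10, avg_transaction_size=3, random_state=None):
    """Hypothetical generator: returns a list of transactions, each a set of integer items."""
    if not 1 <= avg_transaction_size <= n_items:
        raise ValueError("avg_transaction_size must be between 1 and n_items")
    rng = random.Random(random_state)
    transactions = []
    for _ in range(n_transactions):
        # Draw a size uniformly in [1, 2 * avg - 1] (mean = avg before capping), capped at n_items.
        size = min(rng.randint(1, 2 * avg_transaction_size - 1), n_items)
        transactions.append(set(rng.sample(range(n_items), size)))
    return transactions


def describe_transactions(transactions):
    """Hypothetical mirror of make_transactions: summary statistics of a transactional dataset."""
    sizes = [len(t) for t in transactions]
    items = set().union(*transactions)
    avg_size = sum(sizes) / len(sizes) if sizes else 0.0
    return {
        "n_transactions": len(transactions),
        "n_items": len(items),
        "avg_transaction_size": avg_size,
        # density: average fraction of the observed items present in one transaction
        "density": avg_size / len(items) if items else 0.0,
    }


D = make_transactions(n_transactions=50, n_items=8, avg_transaction_size=3, random_state=0)
print(describe_transactions(D))
print(describe_transactions([{1, 2}, {2, 3, 4}]))
# {'n_transactions': 2, 'n_items': 4, 'avg_transaction_size': 2.5, 'density': 0.625}
```

The point of the pairing is that whatever make_transactions is asked to produce (size, density) can be checked back with describe_transactions on the generated data.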
I tried to find a compromise. I hope this syntax remains intuitive to you.
I looked back at the last meeting: we decided to name it skmine.datasets.utils.describe(D)
and to integrate the type of the data into the results of the function...
But compatibility with sklearn is a good point...
@alexandre-termier what is your opinion?
I understand the need for compatibility with sklearn whenever possible.
However, I do not agree with the make_X / describe_X argument.
When making a new dataset, of course precise information on the type of the dataset is required: the system has no way to infer it at that point. So I buy the "make_transaction" name.
But when describing a dataset, the dataset already exists and has a type: why would the user need to remember that for a transactional dataset she has to use the function describe_transaction, and for a sequence dataset the function describe_sequence?
For me this is cumbersome. Look at the equivalent in any collection library of a programming language: one does not write size_list for a list and size_set for a set! There is a generic function size that is easy to remember, and that does the adequate job whatever the underlying collection.
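Python's built-in len() is a real instance of this generic "size" function. The Transactions class below is a hypothetical container added only to show that a new collection type plugs into the same function by implementing __len__, instead of introducing a new name.

```python
# Python's built-in len() is a single generic "size" function: there is no len_list or len_set.
values_list = [1, 2, 2, 3]
values_set = {1, 2, 2, 3}        # duplicates collapse, so the set is {1, 2, 3}
values_dict = {"a": 1, "b": 2}


class Transactions:
    """Hypothetical dataset container; defining __len__ makes it work with len() too."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)


print(len(values_list))                   # 4
print(len(values_set))                    # 3
print(len(values_dict))                   # 2
print(len(Transactions([{1, 2}, {3}])))   # 2
```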
For describe, this is what we should aim for.
More generally, I think that for the project, our philosophy should be "the user first": try to provide something as easy to use and as unobtrusive as possible for our users.
Of course we have to balance that with scikit-learn compatibility, but I think that compromises should lean towards simplicity.
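One way to get a single describe that "does the adequate job" per dataset type is type-based dispatch, for example with functools.singledispatch from the Python standard library. This is a minimal sketch, not scikit-mine's code: the TransactionDataset and SequenceDataset containers and the returned statistics are hypothetical, and the "type" key only mirrors the meeting decision to integrate the type of the data into the results.

```python
from functools import singledispatch


class TransactionDataset(list):
    """Hypothetical container: a list of transactions (sets of items)."""


class SequenceDataset(list):
    """Hypothetical container: a list of sequences (ordered lists of items)."""


@singledispatch
def describe(D):
    # Fallback for any type without a registered implementation.
    raise TypeError(f"describe() does not support {type(D).__name__}")


@describe.register
def _(D: TransactionDataset):
    sizes = [len(t) for t in D]
    return {
        "type": "transactions",
        "n_transactions": len(D),
        "n_items": len(set().union(*D)),
        "avg_transaction_size": sum(sizes) / len(sizes) if sizes else 0.0,
    }


@describe.register
def _(D: SequenceDataset):
    lengths = [len(s) for s in D]
    return {
        "type": "sequences",
        "n_sequences": len(D),
        "n_items": len({item for s in D for item in s}),
        "avg_sequence_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


print(describe(TransactionDataset([{1, 2}, {2, 3}])))
# {'type': 'transactions', 'n_transactions': 2, 'n_items': 3, 'avg_transaction_size': 2.0}
print(describe(SequenceDataset([[1, 2, 1], [3]])))
# {'type': 'sequences', 'n_sequences': 2, 'n_items': 3, 'avg_sequence_length': 2.0}
```

With this design, supporting a new kind of dataset means registering one more implementation; the user still only has to remember describe.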
@alexandre-termier I get your point
I guess "describe" is the way to go.
If @PeggyCellier is OK with this name I'll change it
fixed in 7980faf