better description of transactions (scikit-mine issue, closed)

scikit-mine commented on September 23, 2024

better description of transactions


Comments (8)

PeggyCellier commented on September 23, 2024

Yes it is ok for me :-)


remiadon commented on September 23, 2024

@alexandre-termier @PeggyCellier do you know where I can find usage information on these datasets?
The only information I have so far is for retail.
The information on accidents only concerns the raw data, as presented here.


PeggyCellier commented on September 23, 2024

> @PeggyCellier what would you prefer? I think describe_transactions is both short and explicit

I would prefer describe_dataset. Is it possible?


remiadon commented on September 23, 2024

@PeggyCellier I have updated the name to "describe_transactions"

Rationale behind this:

First, we have to stick with the sklearn philosophy as much as possible.

scikit-learn provides a dedicated function for each type of data generation: make_classification for classification, make_regression for regression, etc.
In our case we have classic "transactions", but in the future we may also add sequential transactions and other types.

In addition, we have to provide a consistent way to both generate data and check properties of this data: make_transactions, whose mirror function is describe_transactions.

make_dataset and describe_dataset would not make sense, at least in the sklearn framework.

I tried to find a compromise. I hope this syntax remains intuitive to you.
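To make the make_X / describe_X pairing concrete, here is a minimal sketch in plain Python. It does not use scikit-mine: the names make_transactions and describe_transactions come from the discussion, but their parameters (n_transactions, n_items, density, random_state) and the returned statistics are assumptions for illustration, not the library's actual signatures.

```python
import random
from statistics import mean


def make_transactions(n_transactions=100, n_items=10, density=0.5, random_state=None):
    """Hypothetical generator: each item appears in a transaction with probability `density`."""
    rng = random.Random(random_state)
    items = list(range(n_items))
    transactions = []
    for _ in range(n_transactions):
        # Bernoulli draw per item decides whether it belongs to this transaction
        transactions.append([i for i in items if rng.random() < density])
    return transactions


def describe_transactions(transactions):
    """Hypothetical mirror function: measures back the properties the generator controls."""
    lengths = [len(t) for t in transactions]
    n_items = len({i for t in transactions for i in t})
    return {
        "n_transactions": len(transactions),
        "n_items": n_items,
        "avg_transaction_size": mean(lengths) if lengths else 0.0,
        # fraction of (transaction, item) cells that are filled, relative to observed items
        "density": (sum(lengths) / (len(transactions) * n_items)) if n_items else 0.0,
    }


D = make_transactions(n_transactions=1000, n_items=20, density=0.3, random_state=0)
stats = describe_transactions(D)
print(round(stats["density"], 2))  # expected to be close to 0.3
```

The point of the pairing is symmetry: whatever the generator is asked to produce, the matching describe function should be able to measure it back.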


PeggyCellier commented on September 23, 2024

I looked back at the last meeting, and we had decided to name it skmine.datasets.utils.describe(D)
and to include the type of the data in the results of the function...
But compatibility with sklearn is a good point...

@alexandre-termier what is your opinion?


alexandre-termier commented on September 23, 2024

I understand the need for compatibility with sklearn whenever possible.
However, I do not agree with the make_X / describe_X argument.
When making a new dataset, precise information on the type of the dataset is of course required: the system has no way to infer it at that point. So I buy the "make_transactions" name.
But when describing a dataset, the dataset already exists and has a type: why would the user need to remember that for a transactional dataset she has to use the function describe_transactions, and for a sequence dataset the function describe_sequence?
For me this is cumbersome. Look at the equivalent in the collection library of any programming language: one does not write size_list for a list and size_set for a set! There is a generic function size that is easy to remember and does the right job whatever the underlying collection.
For describe, this is what we should aim for.
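One way to get a single describe entry point, in line with the size analogy and with the meeting's decision to report the data type in the result, is Python's functools.singledispatch. The sketch below is hypothetical: the container classes TransactionDB and SequenceDB and the keys of the returned dict are illustrative, not scikit-mine's API.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import List


@dataclass
class TransactionDB:
    """Hypothetical wrapper: each transaction is an unordered set of items."""
    transactions: List[frozenset]


@dataclass
class SequenceDB:
    """Hypothetical wrapper: each sequence is an ordered list of items."""
    sequences: List[list]


@singledispatch
def describe(D):
    """Generic entry point; the implementation is chosen from the type of D."""
    raise TypeError(f"describe() does not support {type(D).__name__}")


@describe.register
def _(D: TransactionDB):
    sizes = [len(t) for t in D.transactions]
    return {
        "type": "transactions",  # the data type is part of the result
        "n_transactions": len(D.transactions),
        "n_items": len(set().union(*D.transactions)),
        "max_transaction_size": max(sizes, default=0),
    }


@describe.register
def _(D: SequenceDB):
    lengths = [len(s) for s in D.sequences]
    return {
        "type": "sequences",
        "n_sequences": len(D.sequences),
        "n_items": len({i for s in D.sequences for i in s}),
        "max_sequence_length": max(lengths, default=0),
    }


tdb = TransactionDB([frozenset({"bread", "milk"}), frozenset({"milk"})])
sdb = SequenceDB([["a", "b", "a"], ["b", "c"]])
print(describe(tdb)["type"])  # transactions
print(describe(sdb)["type"])  # sequences
```

Supporting a new data type then means registering one more implementation, while callers keep writing describe(D). The trade-off of this particular sketch is that dispatch relies on distinct container types; if transactions and sequences were both stored as plain lists of lists, the type would have to be inferred or stored alongside the data instead.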

More generally, I think our philosophy for the project should be "the user first": provide something as easy to use and as unobtrusive as possible for our users.
Of course we have to balance that with scikit-learn compatibility, but I think compromises should lean towards simplicity.


remiadon commented on September 23, 2024

@alexandre-termier I get your point.
I guess "describe" is the right way to go.

If @PeggyCellier is OK with this name I'll change it


remiadon commented on September 23, 2024

fixed in 7980faf
