Comments (8)

PeggyCellier avatar PeggyCellier commented on September 23, 2024 1

Yes it is ok for me :-)

from scikit-mine.

remiadon avatar remiadon commented on September 23, 2024

@alexandre-termier @PeggyCellier do you know where I can find usage information on these datasets?
The only information I have so far is for retail.
The information on accidents only concerns the raw data, as presented here

PeggyCellier avatar PeggyCellier commented on September 23, 2024

> @PeggyCellier what would you prefer? I think describe_transactions is both short and explicit

I would prefer describe_dataset. Is that possible?

remiadon avatar remiadon commented on September 23, 2024

@PeggyCellier I have updated the name to "describe_transactions"


Rationale behind this

First, we have to stick with the sklearn philosophy as much as possible

scikit-learn provides a dedicated function for each type of data generation:
make_classification for classification, make_regression for regression, etc.
In our case we have classic "transactions", but in the future we may also add sequential transactions and other types

In addition, we have to provide a consistent way to both generate data and check properties of this data: a make_transactions, whose mirror function is describe_transactions

make_dataset and describe_dataset would not make sense, at least in the sklearn framework.
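
To make the make_X / describe_X pairing concrete, here is a minimal sketch in plain Python. The names make_transactions and describe_transactions come from this thread, but the parameters (n_transactions, n_items, density, seed) and the returned keys are assumptions chosen for illustration, not the actual scikit-mine signatures.

```python
import random


def make_transactions(n_transactions=100, n_items=20, density=0.3, seed=None):
    """Generate a list of transactions (hypothetical signature).

    Each item is added to each transaction independently with
    probability `density`.
    """
    rng = random.Random(seed)
    return [
        {item for item in range(n_items) if rng.random() < density}
        for _ in range(n_transactions)
    ]


def describe_transactions(D):
    """Return summary properties of a transactional dataset (hypothetical keys)."""
    items = set().union(*D) if D else set()
    n_transactions = len(D)
    n_items = len(items)
    avg_size = sum(len(t) for t in D) / n_transactions if n_transactions else 0.0
    # Observed density: average fraction of the seen items present per transaction.
    density = avg_size / n_items if n_items else 0.0
    return {
        "n_transactions": n_transactions,
        "n_items": n_items,
        "avg_transaction_size": avg_size,
        "density": density,
    }


D = make_transactions(n_transactions=50, n_items=10, density=0.4, seed=0)
stats = describe_transactions(D)
```

The generator takes the properties as input and the describe function recovers them from the data, which is the sense in which one mirrors the other.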

I tried to find a compromise. Hope this syntax remains intuitive to you

PeggyCellier avatar PeggyCellier commented on September 23, 2024

I looked back at the last meeting: we decided to name it skmine.datasets.utils.describe(D)
and to include the type of the data in the results of the function...
But compatibility with sklearn is a good point...

@alexandre-termier what is your opinion?

alexandre-termier avatar alexandre-termier commented on September 23, 2024

I understand the need for compatibility with sklearn whenever possible.
However, I do not agree with the make_X / describe_X argument.
When making a new dataset, of course precise information on the type of the dataset is required: the system has no way to infer it at that point. So I buy the "make_transaction" name.
But when describing a dataset, the dataset already exists and has a type: why would the user need to remember that for a transactional dataset she has to use function describe_transaction, and for a sequence dataset she has to use function describe_sequence ?
For me this is cumbersome. Look at the equivalent in any collection library of a programming language: one does not write size_list for a list and size_set for a set! There is a generic function size that is easy to remember, and that does the adequate job whatever the underlying collection.
For me, this is what we should aim for with describe.

More generally, I think that for the project, our philosophy should be "the user first": try to provide something as easy to use and as unobtrusive as possible for our users.
Of course we have to balance that with scikit-learn compatibility, but I think that compromises should go towards simplicity.
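
One possible way to get a single generic describe, in the spirit of a collection library's size, is type-based dispatch. The sketch below uses functools.singledispatch from the standard library. The container classes TransactionDataset and SequenceDataset and the returned keys are hypothetical and only serve the illustration; they are not scikit-mine types.

```python
from functools import singledispatch


class TransactionDataset(list):
    """Hypothetical container: a list of item sets."""


class SequenceDataset(list):
    """Hypothetical container: a list of ordered item sequences."""


@singledispatch
def describe(D):
    # Fallback for types that have no registered implementation.
    raise TypeError(f"unsupported dataset type: {type(D).__name__}")


@describe.register
def _(D: TransactionDataset):
    n_items = len(set().union(*D)) if D else 0
    return {"type": "transactions", "n_transactions": len(D), "n_items": n_items}


@describe.register
def _(D: SequenceDataset):
    n_items = len({item for seq in D for item in seq})
    avg_length = sum(len(seq) for seq in D) / len(D) if D else 0.0
    return {
        "type": "sequences",
        "n_sequences": len(D),
        "n_items": n_items,
        "avg_length": avg_length,
    }


# The caller uses the same name whatever the underlying dataset type.
t_stats = describe(TransactionDataset([{1, 2}, {2, 3}]))
s_stats = describe(SequenceDataset([[1, 2, 1], [3]]))
```

The user only has to remember describe, and the type of the data appears in the result rather than in the function name.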

remiadon avatar remiadon commented on September 23, 2024

@alexandre-termier I get your point
I guess "describe" is the right way to go

If @PeggyCellier is OK with this name I'll change it

remiadon avatar remiadon commented on September 23, 2024

fixed in 7980faf
