k-means's People

Contributors

croshiw, ducasse, jecisc, jordanmontt, olekscode

Forkers

croshiw

k-means's Issues

Prefix classes with ML?

I would like to propose prefixing the KMeans classes with ML. There is another implementation of KMeans that uses the same name, and the prefix would be consistent with newer packages such as MLViz and MLMetrics.

What do you think?

Implement convergence method

We should have a method, common to all algorithms, that returns a boolean saying whether the algorithm has reached convergence. This must be kept separate from max_iterations.

If the algorithm stops because it reached the maximum number of iterations without converging, the method should return false.

This should apply to all algorithms.
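Below is a minimal playground sketch of such a check. It makes three assumptions. First, centroids are Arrays of numeric Arrays. Second, "converged" means no centroid moved farther than a tolerance between two iterations. That is a common criterion, but not necessarily the one the package uses. Third, every name here (squaredDistance, centroidsConverged, runUntilConverged) is hypothetical. The convergence flag is computed independently of the iteration cap, so stopping at maxIterations without converging reports false.

```smalltalk
"Hypothetical sketch: none of these names come from the k-means package."
| squaredDistance centroidsConverged runUntilConverged |
"Squared Euclidean distance between two points given as Arrays"
squaredDistance := [ :p :q |
	(p with: q collect: [ :a :b | (a - b) squared ])
		inject: 0 into: [ :total :each | total + each ] ].

"True when no centroid moved farther than tolerance between two iterations"
centroidsConverged := [ :old :new :tolerance |
	(old with: new collect: [ :c1 :c2 | squaredDistance value: c1 value: c2 ])
		allSatisfy: [ :d | d <= tolerance squared ] ].

"Apply step until convergence or maxIterations, whichever comes first.
 Answers { finalCentroids. hasConverged }; reaching the cap without
 converging yields false in the second slot."
runUntilConverged := [ :step :initial :maxIterations :tolerance |
	| current next iteration converged |
	current := initial.
	iteration := 0.
	converged := false.
	[ converged not and: [ iteration < maxIterations ] ] whileTrue: [
		next := step value: current.
		converged := centroidsConverged value: current value: next value: tolerance.
		current := next.
		iteration := iteration + 1 ].
	{ current. converged } ].
```

In a real algorithm class, the flag would probably be stored as an instance variable after #fit: so that every algorithm can answer it through the same selector.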

You should be able to define initial centroids

When initializing k-means, you should be able to choose whether the initial centroids are random or chosen as in #6, or to set your own initial centroids. Keep in mind that if an array is passed, it should have shape (n_clusters, n_features) and give the initial centers; otherwise an exception should be raised. See
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html.

Adding this feature would help test that empty clusters are relocated as expected: if you give one initial centroid that is far from the data, its cluster will be empty on the first iteration.
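The shape check described above could look roughly like the playground sketch below. It assumes centroids are passed as an Array of Arrays and that a plain Error is acceptable. validateInitialCentroids is a hypothetical name, not part of the package. The usage line at the end sets up the empty-cluster scenario, with one centroid placed far from the data.

```smalltalk
"Hypothetical sketch: validateInitialCentroids is an illustrative name,
 not part of the k-means package."
| validateInitialCentroids |
validateInitialCentroids := [ :centroids :numberOfClusters :numberOfFeatures |
	"Shape must be (numberOfClusters, numberOfFeatures)"
	centroids size = numberOfClusters ifFalse: [
		Error signal: 'Expected ' , numberOfClusters printString ,
			' initial centroids, got ' , centroids size printString ].
	(centroids allSatisfy: [ :each | each size = numberOfFeatures ]) ifFalse: [
		Error signal: 'Each initial centroid must have ' ,
			numberOfFeatures printString , ' features' ].
	centroids ].

"One centroid far from the data, so its cluster starts out empty"
validateInitialCentroids
	value: #(#(0 0) #(1 1) #(100 100))
	value: 3
	value: 2.
```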

Double dispatch for #fit: method

Currently I'm using k-means with DataFrame as follows:

| df kmeans |
df := Datasets loadIris columnsFrom: 1 to: 4.
kmeans := KMeans numberOfClusters: 3.
kmeans fit: (df asArrayOfRows collect: #asArray).

It would be nice if the argument to #fit: could simply be the DataFrame, which would know how to be fitted by a KMeans algorithm:

kmeans fit: df

This way, #fit: could also receive a PMMatrix and similar matrix-like objects, each of which is responsible for implementing its own conversion, for example:

DataFrame>>fitKMeans: aKmeans
  aKmeans fit: (self asArrayOfRows collect: #asArray).

and so on...
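To complete the dispatch, the KMeans side also needs a method that each receiver can call back into. It also needs a fallback for plain collections, so that the current Array-of-Arrays call keeps working. The method listing below is a sketch under those assumptions. #fitKMeans: comes from the proposal above. #fitArrayOfRows: is a hypothetical name for whatever method would hold the current array-based implementation.

```smalltalk
"Hypothetical sketch of the double dispatch; #fitArrayOfRows: is an
 illustrative selector, not existing package API."

KMeans >> fit: aDataset
	"Let the argument decide how to present itself to the algorithm"
	^ aDataset fitKMeans: self

KMeans >> fitArrayOfRows: anArrayOfArrays
	"The body of the current #fit: would move here unchanged"

Collection >> fitKMeans: aKMeans
	"Fallback for plain collections of rows, e.g. an Array of Arrays"
	^ aKMeans fitArrayOfRows: (self asArray collect: #asArray)

DataFrame >> fitKMeans: aKMeans
	^ aKMeans fitArrayOfRows: (self asArrayOfRows collect: #asArray)
```

With this split, DataFrame (and later PMMatrix) only override #fitKMeans: to convert themselves. Callers keep writing `kmeans fit: data` regardless of the data's type.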

nbClusters should be between 1 and data size

This

| data kmeans |
data := #( #( 0 0 ) #( 0.5 0 ) #( 0.5 1 ) #( 1 1 ) ).
kmeans := AIKMeans numberOfClusters: 100.
kmeans fit: data

and this

| data kmeans |
data := #( #( 0 0 ) #( 0.5 0 ) #( 0.5 1 ) #( 1 1 ) ).
kmeans := AIKMeans numberOfClusters: -100.
kmeans fit: data

should raise an appropriate exception. Currently, the first one either works or fails with TookTooMuchTime, while the second raises SubscriptOutOfBounds.
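The check belongs in #fit:, because the number of data points is only known there, not when `numberOfClusters:` is sent. Below is a minimal playground sketch of the guard. validateNumberOfClusters is a hypothetical name. The sketch signals a plain Error to stay self-contained; a dedicated exception class would probably be preferable in the package.

```smalltalk
"Hypothetical sketch: validateNumberOfClusters is an illustrative name,
 not part of the k-means package."
| validateNumberOfClusters |
validateNumberOfClusters := [ :numberOfClusters :data |
	"Valid range is 1 to the number of data points, inclusive"
	(numberOfClusters isInteger
		and: [ numberOfClusters between: 1 and: data size ])
		ifFalse: [
			Error signal: 'numberOfClusters must be an integer between 1 and ' ,
				data size printString , ', got ' , numberOfClusters printString ].
	numberOfClusters ].
```

With this guard at the start of #fit:, both snippets above (100 and -100 clusters on 4 points) would fail immediately with the same descriptive error.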
