Comments (6)

nicodv commented on July 25, 2024

This is not due to the number of observations, but likely due to your specific data. Note that Cao's init method can be very slow when n_clusters is large; maybe that's it.

You can run the benchmark.py script in the examples directory (which has 10k observations) to see if that works for you.

Also, you're leaving me guessing with this little information about the specifics of your problem...

from kmodes.

paulaceccon commented on July 25, 2024

Same here. I was able to run DBSCAN, but I'm struggling to run k-prototypes. I'm not sure what strategy to follow to get it running.

nicodv commented on July 25, 2024

@paulaceccon, what are the dimensions of your problem? Also, please provide sample output from a run in verbose mode.

mpikoula commented on July 25, 2024

Any insight as to why kprototypes is a lot slower than kmodes for similar datasets? Just trying to understand the algorithm better.

I've got only 3 numerical variables in my dataset (out of 12 total), and if I get rid of one and turn the other two into categories, kmodes runs instantaneously for n=1-10.
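
For readers comparing the two algorithms: k-prototypes scores a point against a centroid with a mixed dissimilarity, squared Euclidean distance on the numerical attributes plus a weighted mismatch count on the categorical ones, whereas k-modes only needs the mismatch count. The snippet below is a minimal sketch of that cost, not the library's internal code; the function name `mixed_dissim` and its arguments are hypothetical, and `gamma` stands for the weight on the categorical part. It does not by itself account for the slowdown reported here.

```python
def mixed_dissim(point, centroid, categorical, gamma=1.0):
    """Mixed dissimilarity between a point and a centroid.

    `categorical` is the set of attribute indices treated as categories;
    every other attribute is assumed to be numerical.
    """
    numeric_cost = 0.0
    categorical_cost = 0
    for j, (x, c) in enumerate(zip(point, centroid)):
        if j in categorical:
            # k-modes part: simple matching, 1 per mismatching category
            categorical_cost += int(x != c)
        else:
            # k-means part: squared Euclidean distance
            numeric_cost += (x - c) ** 2
    return numeric_cost + gamma * categorical_cost


point = (1.0, 2.0, "red", "small")
centroid = (0.0, 4.0, "red", "large")
print(mixed_dissim(point, centroid, categorical={2, 3}, gamma=0.5))  # 5.5
```

The numerical part also means the centroid has to be a mean that is kept up to date as points move between clusters, which is extra bookkeeping that k-modes does not have.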

nicodv commented on July 25, 2024

I've had some time to analyze this problem by profiling my code.

To determine the cluster centroids in the k-means part of the algorithm, we need to divide the sums of attribute values by the number of points in each cluster. I realized I was caching the sums of the attributes, but not the membership counts of the clusters.

The following commit resolves this:
1a6a7be

I'm seeing very significant speedups as a result. :)
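
The caching idea described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the class `RunningCentroid` and the helper `move_point` are hypothetical names. Each cluster keeps both its per-attribute sums and its member count, so moving a point between clusters is a constant-time update per attribute and the mean never requires re-counting the members.

```python
class RunningCentroid:
    """Numerical centroid that caches attribute sums and the member count."""

    def __init__(self, n_attrs):
        self.sums = [0.0] * n_attrs  # cached per-attribute sums
        self.count = 0               # cached number of member points

    def add(self, point):
        for j, value in enumerate(point):
            self.sums[j] += value
        self.count += 1

    def remove(self, point):
        for j, value in enumerate(point):
            self.sums[j] -= value
        self.count -= 1

    def mean(self):
        if self.count == 0:
            raise ValueError("empty cluster has no mean")
        return [s / self.count for s in self.sums]


def move_point(point, src, dst):
    """Reassign a point by updating both caches; no pass over the data."""
    src.remove(point)
    dst.add(point)


a = RunningCentroid(2)
b = RunningCentroid(2)
for p in [(1.0, 2.0), (3.0, 4.0)]:
    a.add(p)
b.add((10.0, 10.0))

move_point((3.0, 4.0), a, b)
print(a.mean())  # [1.0, 2.0]
print(b.mean())  # [6.5, 7.0]
```

Without the cached count, every centroid update would need to recount the cluster's members, which costs a pass over the membership data each time a point is reassigned.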

Thanks to everyone for pointing it out.

nicodv commented on July 25, 2024

Oh, and I've included a benchmark script specifically for k-prototypes in the examples folder.
