Comments (6)
This is not due to the number of observations; it is more likely due to your specific data. Note that Cao's init method can be very slow when n_clusters is large, so that may be the cause.
You can run the benchmark.py script in the examples directory (which has 10k observations) to see if that works for you.
Also, with so little information about the specifics of your problem, you're leaving me guessing...
from kmodes.
Same here. I was able to run DBSCAN, but I'm struggling to run k-prototypes, and I'm not sure what strategy to follow to get it to run.
from kmodes.
@paulaceccon, what are the dimensions of your problem? Also, please provide sample output from a run in verbose mode.
from kmodes.
Any insight as to why kprototypes is a lot slower than kmodes for similar datasets? Just trying to understand the algorithm better.
I've got only 3 numerical variables in my dataset (out of 12 total), and if I get rid of one and turn the other two into categories, kmodes runs instantaneously for n=1-10.
from kmodes.
I've had some time to analyze this problem by profiling my code.
In order to determine the clusters in the k-means part of the algorithm, we need to divide the sums of attribute values by the number of points in the cluster. I realized I was caching the sums of the attributes alright, but not the sums of the memberships in the clusters.
The following commit resolves this:
1a6a7be
I'm seeing very significant speedups as a result. :)
Thanks to everyone for pointing it out.
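For anyone trying to understand the fix, the caching idea described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the commit: the class name `ClusterStats` and its methods are hypothetical and do not exist in kmodes. The point is that a centroid is the cached attribute sums divided by the cached membership count, so both can be updated in constant time per attribute whenever a point changes cluster, instead of being recomputed from all points.

```python
from typing import List


class ClusterStats:
    """Hypothetical helper: running sums and membership counts per cluster."""

    def __init__(self, n_clusters: int, n_attrs: int) -> None:
        # Cached sum of each numerical attribute, per cluster.
        self.sums: List[List[float]] = [[0.0] * n_attrs for _ in range(n_clusters)]
        # Cached number of points currently assigned to each cluster.
        self.counts: List[int] = [0] * n_clusters

    def add(self, cluster: int, point: List[float]) -> None:
        """Assign a point to a cluster, updating both caches."""
        for j, value in enumerate(point):
            self.sums[cluster][j] += value
        self.counts[cluster] += 1

    def remove(self, cluster: int, point: List[float]) -> None:
        """Take a point out of a cluster, updating both caches."""
        for j, value in enumerate(point):
            self.sums[cluster][j] -= value
        self.counts[cluster] -= 1

    def move(self, point: List[float], old: int, new: int) -> None:
        """Reassign a point without touching any other point."""
        self.remove(old, point)
        self.add(new, point)

    def centroid(self, cluster: int) -> List[float]:
        """Mean of each attribute: cached sum divided by cached count."""
        count = self.counts[cluster]
        if count == 0:
            raise ValueError("cluster is empty, centroid is undefined")
        return [total / count for total in self.sums[cluster]]


# Usage: two clusters, two numerical attributes.
stats = ClusterStats(n_clusters=2, n_attrs=2)
stats.add(0, [1.0, 2.0])
stats.add(0, [3.0, 4.0])
stats.add(1, [10.0, 10.0])
stats.move([3.0, 4.0], old=0, new=1)
print(stats.centroid(0), stats.centroid(1))
```

If only the sums are cached, the divisor has to be recounted from the membership data on every centroid update, which is the kind of repeated work the profiling pointed to.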
from kmodes.
Oh, and I've included a benchmark script specifically for k-prototypes in the examples folder.
from kmodes.