The <a href="http://clusteringjl.readthedocs.org/en/latest/kmedoids.html" rel="nofollo

How to use k-medoids? about clustering.jl HOT 9 CLOSED

juliastats commented on August 23, 2024

How to use k-medoids?

from clustering.jl.

Comments (9)

kingzbauer commented on August 23, 2024

This might be related. I keep getting this error when trying to run kmedoids:

julia> kmedoids(mat, 8)
ERROR: AssertionError: !(isempty(grp))
 in _find_medoid at /root/.julia/v0.4/Clustering/src/kmedoids.jl:189
 in _kmedoids! at /root/.julia/v0.4/Clustering/src/kmedoids.jl:100
 in kmedoids at /root/.julia/v0.4/Clustering/src/kmedoids.jl:39

mat is a distance matrix.

johnmyleswhite commented on August 23, 2024

I think @lendle may be the main person who knows about this code.

lendle commented on August 23, 2024

The k-medoids code was rewritten by @cyocum in #22.

lendle commented on August 23, 2024

I just looked at the implementation and docs. C is actually an n x n matrix (not k x n).

The docs' description, "C – The cost matrix, where C[i,j] is the cost of assigning sample j to the medoid i", is a bit unclear. Row i does not correspond to the i-th medoid for i = 1, ..., k; it holds the cost of assigning each sample to a medoid defined by sample i, for i = 1, ..., n.
@waTeim, would "C – The cost matrix, where C[i,j] is the cost of assigning sample j to a cluster with medoid sample i" be clearer?
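To make the row/column meaning concrete, here is a minimal sketch in plain Julia. It does not call Clustering.jl; the data values and the medoid choice are made up for illustration. It only shows that medoids are identified by sample index, so they select rows of the n x n matrix.

```julia
# Toy data: 5 one-dimensional samples (values are made up for illustration).
x = [0.0, 1.0, 2.0, 10.0, 11.0]
n = length(x)

# Full n x n cost matrix: C[i, j] is the cost of assigning sample j
# to a cluster whose medoid is sample i.
C = [abs(x[i] - x[j]) for i in 1:n, j in 1:n]

# Hypothetical medoid choice: samples 2 and 4 (indices into the samples,
# not positions 1..k).
medoids = [2, 4]

# Each sample goes to the medoid whose row gives it the lowest cost.
assignments = [argmin(C[medoids, j]) for j in 1:n]

# Total cost of this clustering: sum of each sample's cost to its medoid.
total_cost = sum(C[medoids[assignments[j]], j] for j in 1:n)
```

Here `assignments[j]` is a position in `medoids` (1 or 2), while `medoids` itself holds sample indices, which is why C needs a row for every sample.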

waTeim commented on August 23, 2024

I don't think so. Are you sure? That approach has a lot of problems. It's not really the algorithm as documented on Wikipedia: the matrix should be k x n and be recalculated every iteration, because the medoids can change. If it's n x n, it becomes unusable. Take how I was going to use it: 90,000 data points give rise to a 90,000 x 90,000 matrix, which uses more memory than the machine I have to run it on (40 GB RAM), and frankly that's kind of small compared to today's dataset sizes.
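The memory figure can be checked with a few lines of arithmetic. This sketch assumes dense storage; the thread does not state the element type, so both Float64 and Float32 are shown, and k = 8 is only an illustrative cluster count.

```julia
n = 90_000   # number of data points, from the comment above
k = 8        # illustrative number of clusters (an assumption)

# Dense n x n matrix, 8 bytes per Float64 entry.
bytes_full_f64 = n^2 * sizeof(Float64)

# Same matrix with 4-byte Float32 entries.
bytes_full_f32 = n^2 * sizeof(Float32)

# A k x n matrix, as the comment proposes, for comparison.
bytes_kxn_f64 = k * n * sizeof(Float64)

println("n x n Float64: ", bytes_full_f64 / 1e9, " GB")
println("n x n Float32: ", bytes_full_f32 / 1e9, " GB")
println("k x n Float64: ", bytes_kxn_f64 / 1e6, " MB")
```

Under these assumptions the full Float64 matrix needs about 64.8 GB, which exceeds 40 GB of RAM, while a k x n matrix needs only a few megabytes.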

lindahua commented on August 23, 2024

The current approach takes as input a pre-computed pairwise cost matrix. When n is very large, one should use a different algorithm.
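As a rough sketch of that workflow, assuming recent versions of Clustering.jl and Distances.jl are installed (the v0.4-era code in the traceback above may behave differently), one can pre-compute the pairwise matrix and pass it in. The data here is random and purely illustrative.

```julia
using Clustering, Distances, Random

Random.seed!(1)
X = rand(2, 50)                       # 50 samples stored as columns (random toy data)
D = pairwise(Euclidean(), X, dims=2)  # pre-computed 50 x 50 pairwise cost matrix

R = kmedoids(D, 3)                    # request 3 clusters

R.medoids       # indices of the samples chosen as medoids
R.assignments   # cluster index for each of the 50 samples
```

The matrix passed in is n x n, so memory grows quadratically with the number of samples.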

waTeim commented on August 23, 2024

Yeah, that's more efficient if a matrix with (number of rows)^2 entries is storable, but in this case, nope. And like I said, this isn't even the largest of the datasets. I was looking around for a simple implementation of PAM to submit as a PR so that there would be options, but I am stuck on the "compare the cost of each non-medoid with a medoid's cost and swap if lower" step, which seems too inefficient to implement directly like that.
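For reference, the direct version of that swap step might look like the sketch below. It is a naive illustration, not Clustering.jl's implementation: the function names `total_cost`, `best_swap`, and `euclid` are hypothetical, the data is made up, and distances are computed on demand so no n x n matrix is stored. It recomputes the full cost for every candidate swap, which is exactly the inefficiency described above.

```julia
# Cost of a medoid set: each sample pays the distance to its nearest medoid.
# Distances are computed on demand, so no n x n matrix is stored.
total_cost(X, medoids, dist) =
    sum(minimum(dist(X[:, m], X[:, j]) for m in medoids) for j in 1:size(X, 2))

# One naive PAM-style pass: try every (medoid, non-medoid) swap and keep the
# single swap that lowers the total cost the most.
function best_swap(X, medoids, dist)
    best, best_cost = copy(medoids), total_cost(X, medoids, dist)
    for pos in eachindex(medoids), cand in 1:size(X, 2)
        cand in medoids && continue      # skip samples that are already medoids
        trial = copy(medoids)
        trial[pos] = cand                # swap one medoid for the candidate
        c = total_cost(X, trial, dist)
        if c < best_cost
            best, best_cost = trial, c
        end
    end
    return best, best_cost
end

euclid(a, b) = sqrt(sum(abs2, a .- b))

# Toy data: two well-separated groups of 2-D points (made-up values).
X = [0.0 0.0 1.0 10.0 10.0 11.0;
     0.0 1.0 0.0 10.0 11.0 10.0]

medoids = [1, 2]                         # deliberately poor start
cost0 = total_cost(X, medoids, euclid)
medoids, cost1 = best_swap(X, medoids, euclid)
```

Each pass evaluates k(n - k) candidate swaps and each evaluation touches all n samples, so this trades a great deal of time for the memory saved.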

diegozea commented on August 23, 2024

@waTeim If your matrix is symmetric, maybe https://github.com/diegozea/PairwiseListMatrices.jl can be useful to you.
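The idea behind exploiting symmetry can be sketched without any package. This is not the PairwiseListMatrices.jl API; `tri_index` and `getdist` are hypothetical helpers and the data is made up. It stores only the strict upper triangle, n(n-1)/2 entries, in a flat vector.

```julia
# Linear index of the pair (i, j), i != j, in a packed strict upper triangle
# stored row by row (1-based). Symmetry lets (j, i) share the same slot.
function tri_index(n, i, j)
    lo, hi = minmax(i, j)
    return (lo - 1) * n - div(lo * (lo - 1), 2) + (hi - lo)
end

n = 4
x = [0.0, 1.0, 3.0, 7.0]               # toy 1-D samples (made-up values)
packed = zeros(div(n * (n - 1), 2))    # 6 entries instead of 16

for i in 1:n-1, j in i+1:n
    packed[tri_index(n, i, j)] = abs(x[i] - x[j])
end

# Look up a distance in either order; the diagonal is implicitly zero.
getdist(i, j) = i == j ? 0.0 : packed[tri_index(n, i, j)]
```

For n = 90,000 this layout holds 4,049,955,000 entries, roughly 32.4 GB as Float64, so it about halves the memory but remains large.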

alyst commented on August 23, 2024

This is an old question that does not apply to the new distance-based API.
