Comments (8)

motiwari commented on July 29, 2024

Code to reproduce bug:

# code starts
from banditpam import KMedoids
import numpy as np
from scipy.spatial import distance_matrix
from sklearn_extra import cluster

rand_seed = 10
np.random.seed(rand_seed)

n = 20000
d = 10
X = np.random.rand(n,d)

# BanditPAM's BUILD run

k = 100
delta1 = 2*k/n # this value of delta1 should correspond to \delta = n^{-3} as in Theorem 1 of BanditPAM
useCache = False
maxIter = 0  # max_iter = 0: no SWAP iterations, so only BUILD runs
buildConfidence = int(np.floor(np.log(2*n*k/delta1)))  # np.log is the natural logarithm
# delta1 = 2*k/n makes buildConfidence ~ log(n^2) (subject to integer rounding) and overall \delta ~ n^{-3}

kmed = KMedoids(n_medoids=k, algorithm="BanditPAM", build_confidence=buildConfidence,
                use_cache=useCache, max_iter=maxIter)
kmed.fit(X, 'L2')

banditpam_build_medoids_idx = kmed.build_medoids
banditpam_build_medoids = X[banditpam_build_medoids_idx,:]

banditpam_medoids_ref_cost_distance_matrix = distance_matrix(banditpam_build_medoids,X)
banditpam_objective = np.sum(np.min(banditpam_medoids_ref_cost_distance_matrix,0))

# PAM's build run, using sklearn's KMedoids BUILD step
sklearn_kmed = cluster.KMedoids(n_clusters=k, metric='euclidean', method='pam', init='build', max_iter = 0).fit(X)

sklearn_build_medoids_idx = sklearn_kmed.medoid_indices_
sklearn_build_medoids = X[sklearn_build_medoids_idx,:]

sklearn_medoids_ref_cost_distance_matrix2 = distance_matrix(sklearn_build_medoids,X)
sklearn_objective2 = np.sum(np.min(sklearn_medoids_ref_cost_distance_matrix2,0))

print('BanditPAM BUILD objective: ', banditpam_objective, ' sklearn KMedoids BUILD objective: ', sklearn_objective2)
print('Out of ', k, ', common medoids selected by the two algorithms: ',
      len(np.intersect1d(sklearn_build_medoids_idx, banditpam_build_medoids_idx)))
#sklearn_cluster_centers = sklearn_kmed.cluster_centers_
#sklearn_medoids_ref_cost_distance_matrix1 = distance_matrix(sklearn_cluster_centers,X)
#sklearn_objective1 = np.sum(np.min(sklearn_medoids_ref_cost_distance_matrix1,0)) #gives same output as sklearn_objective2

# code ends
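For reference, the arithmetic behind the `buildConfidence` comment can be checked directly. With delta1 = 2*k/n, the argument 2*n*k/delta1 collapses to n^2, so buildConfidence = floor(ln(n^2)) = floor(2 ln n), since np.log is the natural logarithm. A minimal standard-library sketch using the script's n = 20000 and k = 100:

```python
import math

n, k = 20000, 100                        # same sizes as the reproduction script
delta1 = 2 * k / n                       # 0.01
ratio = 2 * n * k / delta1               # equals n**2 because delta1 = 2k/n
build_confidence = math.floor(math.log(ratio))  # natural log, like np.log

print(f"ratio = {ratio:.0f}, ln(ratio) = {math.log(ratio):.3f}, build_confidence = {build_confidence}")
```

Since ln(4e8) is about 19.81, the script passes build_confidence = 19 to BanditPAM for this run.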

from banditpam.

motiwari commented on July 29, 2024

@lukeleeai can you take a look and see if you can reproduce this error? It's probably related to some of the correctness issues we're seeing with the loss in BanditPAM vs. BanditPAM++.
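One way to tell whether the gap comes from BanditPAM's sampling or from a bug is to compare both BUILD outputs against an exact greedy BUILD on the same data. Below is a minimal NumPy/SciPy sketch, assuming L2 distance and a small n (it materializes the full n-by-n distance matrix). `exact_build` and `objective` are hypothetical helpers, not part of BanditPAM or sklearn_extra, and ties are broken by lowest index, which may not match either library's tie-breaking:

```python
import numpy as np
from scipy.spatial import distance_matrix


def exact_build(X: np.ndarray, k: int) -> np.ndarray:
    """Greedy PAM BUILD using exact (non-sampled) distance sums.

    Hypothetical reference helper: O(n^2) memory, so only for small n.
    Ties are broken by the lowest index (np.argmin).
    """
    D = distance_matrix(X, X)                   # D[i, j] = L2 distance between points i and j
    medoids = [int(np.argmin(D.sum(axis=1)))]   # first medoid: smallest total distance to all points
    best = D[medoids[0]].copy()                 # best[j] = distance from point j to its closest medoid
    for _ in range(1, k):
        # Loss if candidate c were added: sum_j min(best[j], D[c, j])
        losses = np.minimum(D, best[None, :]).sum(axis=1)
        losses[medoids] = np.inf                # never re-pick an existing medoid
        c = int(np.argmin(losses))
        medoids.append(c)
        best = np.minimum(best, D[c])           # update each point's nearest-medoid distance
    return np.array(medoids)


def objective(X: np.ndarray, medoid_idx: np.ndarray) -> float:
    """Sum over all points of the distance to the nearest medoid (same loss as the script)."""
    return float(distance_matrix(X[medoid_idx], X).min(axis=0).sum())
```

For example, running `exact_build` on a subsample of a few thousand rows of `X` with a small `k` gives an exact BUILD objective to compare against what BanditPAM and sklearn_extra return on that same subsample.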

lukeleeai commented on July 29, 2024

Thanks for sharing this. I will add it to my todo!

lukeleeai commented on July 29, 2024

The corrected code is proposed in the Pull Request!

motiwari commented on July 29, 2024

@lukeleeai it looks like the linked PR was closed with unmerged commits. Is that intentional?

The user is still reporting that this is an issue. Could you verify this issue is resolved in v4.0.2? Or do we need to wait until v4.0.3?
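To confirm which release a reproduction actually ran against before reporting back, the installed version can be read from the package metadata. A minimal sketch, assuming the package is installed under the PyPI distribution name `banditpam` and uses plain `X.Y.Z` version strings; `parse_version` and `installed_banditpam_version` are hypothetical helpers, not part of BanditPAM:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Hypothetical helper: turn a plain 'X.Y.Z' string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split(".")[:3])


def installed_banditpam_version():
    """Hypothetical helper: installed banditpam version string, or None if not installed."""
    try:
        return version("banditpam")
    except PackageNotFoundError:
        return None


v = installed_banditpam_version()
if v is None:
    print("banditpam is not installed")
elif parse_version(v) >= (4, 0, 3):
    print(f"banditpam {v}: at or above 4.0.3")
else:
    print(f"banditpam {v}: older than 4.0.3, so the BUILD discrepancy is expected to persist")
```

Pre-release strings such as `4.0.3rc1` would make `parse_version` raise `ValueError`; a fuller check would use a dedicated version parser.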

lukeleeai commented on July 29, 2024

motiwari commented on July 29, 2024

Oh right! So this is currently expected to still fail in v4.0.2 and will be fixed when @Adarsh321123 ships v4.0.3, correct? @lukeleeai

lukeleeai commented on July 29, 2024