Code to reproduce bug:
# code starts
from banditpam import KMedoids
import numpy as np
from scipy.spatial import distance_matrix
from sklearn_extra import cluster
rand_seed = 10
np.random.seed(rand_seed)
n = 20000
d = 10
X = np.random.rand(n,d)
# BanditPAM's BUILD run
k = 100
delta1 = 2*k/n # this value of delta1 should correspond to \delta = n^{-3} as in Theorem 1 of BanditPAM
useCache = False
maxIter = 0
buildConfidence = np.floor(np.log(2*n*k/delta1)).astype(np.int64)
# delta1 = 2*k/n makes buildConfidence ~ log(n^2) (subject to integer rounding) and overall \delta ~ n^{-3}
kmed = KMedoids(n_medoids=k, algorithm="BanditPAM", build_confidence=buildConfidence,
                use_cache=useCache, max_iter=maxIter)
kmed.fit(X, 'L2')
banditpam_build_medoids_idx = kmed.build_medoids
banditpam_build_medoids = X[banditpam_build_medoids_idx,:]
banditpam_medoids_ref_cost_distance_matrix = distance_matrix(banditpam_build_medoids,X)
banditpam_objective = np.sum(np.min(banditpam_medoids_ref_cost_distance_matrix,0))
# PAM's build run, using sklearn's KMedoids BUILD step
sklearn_kmed = cluster.KMedoids(n_clusters=k, metric='euclidean', method='pam', init='build', max_iter = 0).fit(X)
sklearn_build_medoids_idx = sklearn_kmed.medoid_indices_
sklearn_build_medoids = X[sklearn_build_medoids_idx,:]
sklearn_medoids_ref_cost_distance_matrix2 = distance_matrix(sklearn_build_medoids,X)
sklearn_objective2 = np.sum(np.min(sklearn_medoids_ref_cost_distance_matrix2,0))
print('BanditPAM BUILD objective: ', banditpam_objective, ' sklearn KMedoids BUILD objective: ', sklearn_objective2)
print('Out of ', k, ', common medoids selected by the two algorithms: ',
      len(np.intersect1d(sklearn_build_medoids_idx, banditpam_build_medoids_idx)))
#sklearn_cluster_centers = sklearn_kmed.cluster_centers_
#sklearn_medoids_ref_cost_distance_matrix1 = distance_matrix(sklearn_cluster_centers,X)
#sklearn_objective1 = np.sum(np.min(sklearn_medoids_ref_cost_distance_matrix1,0)) #gives same output as sklearn_objective2
# code ends
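For a small-scale sanity check of the BUILD objective, the exact greedy BUILD step can be written directly in NumPy. Starting from an empty set, it repeatedly adds the point whose inclusion most reduces the total distance from every point to its nearest medoid, which is the same quantity both objectives in the script measure. This is a minimal sketch assuming Euclidean (L2) distance and a dataset small enough for a dense n x n distance matrix; `greedy_build` and `build_objective` are hypothetical helper names, and this is not the BanditPAM or sklearn_extra implementation (tie-breaking in particular may differ). BanditPAM's BUILD approximates each greedy choice by sampling, so comparing its objective against this exact reference on small n is one way to check whether the two stay close. For reference, with delta1 = 2*k/n the argument of the log in the script is 2nk/delta1 = n^2, so for n = 20000 the script passes build_confidence = floor(ln(4e8)) = floor(19.807) = 19.

```python
import numpy as np


def greedy_build(X: np.ndarray, k: int) -> np.ndarray:
    """Exact greedy BUILD (sketch): at each step, add the point that most
    reduces the total L2 distance from every point to its nearest medoid."""
    # Dense pairwise Euclidean distances; the broadcasted difference uses
    # O(n^2 * d) memory, so this is only meant for small n.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    n = X.shape[0]
    medoids = []
    # Distance from each point to its closest medoid so far (inf before the first pick).
    best = np.full(n, np.inf)
    for _ in range(k):
        # losses[c] = total loss if candidate c were added: sum_j min(best_j, D[c, j]).
        losses = np.minimum(best[None, :], D).sum(axis=1)
        if medoids:
            losses[medoids] = np.inf  # never re-pick an existing medoid
        c = int(np.argmin(losses))
        medoids.append(c)
        best = np.minimum(best, D[c])
    return np.array(medoids)


def build_objective(X: np.ndarray, medoid_idx: np.ndarray) -> float:
    """Same objective as the repro script: sum over all points of the
    L2 distance to the nearest medoid."""
    diff = X[medoid_idx][:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return float(D.min(axis=0).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(10)
    X_small = rng.random((500, 10))  # small n: the distance matrix is 500 x 500
    idx = greedy_build(X_small, k=5)
    print("exact greedy BUILD medoids:", idx)
    print("exact greedy BUILD objective:", build_objective(X_small, idx))
```

On a subsample of the issue's data, the objective printed here can be set next to the BanditPAM and sklearn BUILD objectives from the script above; a large, persistent gap on small n would point at the BUILD step rather than at the loss computation.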
@lukeleeai can you take a look and see if you can reproduce this error? It's probably related to some of the correctness issues we're seeing with the loss in BanditPAM vs. BanditPAM++.
Thanks for sharing this. I will add it to my todo!
The corrected code is proposed in the Pull Request!
@lukeleeai it looks like the linked PR was closed with unmerged commits. Is that intentional?
The user is still reporting that this is an issue. Could you verify whether this issue is resolved in v4.0.2, or do we need to wait until v4.0.3?
Oh right! So this is currently expected to still fail in v4.0.2 and will be fixed when @Adarsh321123 ships v4.0.3, correct? @lukeleeai
Related Issues
- Standardize `banditpam` across URL and all code
- Make `R_package` subdirectory lint-compliant
- pip install on windows doesn't use MS Visual C++ compiler
- cannot pickle 'banditpam.KMedoids' object
- Bug Report: Slower than k-means on `n=10,000` moon dataset
- Upload wheels to conda
- Add citation to "About" section of repo
- Cannot build locally with `conda` `python` due to `x86_64` vs. `arm64` mismatch
- New error in GHA
- Link to full paper with appendices in `README.md`
- BanditPAM is slower than sklearn
- Minor performance difference between BanditPAM and sklearn for a small number of data points
- Change github actions that upload wheels to upload to TestPyPI on PR update, and PyPI on release
- Include OpenMP support on Windows
- Need GHA to build wheels for Windows and push to PyPI + GHA to build package and run tests on Windows
- Upload M1 Mac wheels to TestPyPI and PyPI
- pip install banditpam error
- Unpredictable Performance with Parallelization Enabled on ScRNA Dataset
- Race condition when calculating the loss