ssarfraz / finch-clustering
Source Code for FINCH Clustering Algorithm
License: Other
The code runs fine, but the score I get is always 0.96536 in terms of NMI (as implemented in sklearn.metrics).
The code I run is as follows:
import numpy as np
import scipy.io as sio
from sklearn.metrics import normalized_mutual_info_score as nmi
from .finch import FINCH
data = sio.loadmat("Agg.mat")
X = data["X"]
y_true = data["Y"]
c_true = len(np.unique(y_true))
Y, num_clu, req_y = FINCH(X, req_clust=c_true, distance='euclidean') # or cosine
acc = nmi(y_true, req_y, average_method="max")
print(acc)
Looking forward to your reply
Why does the algorithm trigger an error when run on the s1 dataset from http://cs.joensuu.fi/sipu/datasets/ ?
~/finchcls.py in update_adj(self, adj, d)
94 v = np.argsort(d[idx])
95 v = v[:2]
---> 96 x = [idx[0][v[0]], idx[0][v[1]]]
97 y = [idx[1][v[0]], idx[1][v[1]]]
98 a = sp.lil_matrix(adj.get_shape())
IndexError: index 0 is out of bounds for axis 0 with size 0
The same error occurs with the a1 dataset and the "unbalance" dataset.
On all other datasets it works fine.
It would be great if you could publish your code for TW-FINCH, since it is a bit hard to replicate the results from the paper.
Thank you for opening the code. I want to know what the output 'C' means. It is an N*2 array; which column holds the cluster labels? I get bad results on my data and I want to find the reason.
Hi, thanks for your great work. How can I input a precomputed distance matrix instead of data? Could you please release a version that supports this?
It is amazing that this unsupervised clustering method outperforms other paradigms on five challenging action segmentation datasets. However, some details puzzle me, in particular how the obtained segments are mapped to the different action labels (including background) using the Hungarian algorithm. I would really appreciate it if this could be explained.
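One common way to do this kind of mapping, sketched here under assumptions and not taken from the paper's code: count the frame overlap between every predicted cluster id and every ground-truth label, then let the Hungarian algorithm (scipy.optimize.linear_sum_assignment) pick the one-to-one mapping with the largest total overlap. The function name hungarian_map and the toy label sequences are hypothetical, and the handling of background frames is not covered.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_map(pred, gt):
    """Map predicted cluster ids to ground-truth labels (hypothetical helper)."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    pred_ids = np.unique(pred)
    gt_ids = np.unique(gt)
    # overlap[i, j] = number of frames with predicted id i and true label j
    overlap = np.zeros((pred_ids.size, gt_ids.size), dtype=int)
    for i, p in enumerate(pred_ids):
        for j, g in enumerate(gt_ids):
            overlap[i, j] = np.sum((pred == p) & (gt == g))
    # One-to-one assignment that maximises the total overlap
    rows, cols = linear_sum_assignment(overlap, maximize=True)
    return {int(pred_ids[r]): int(gt_ids[c]) for r, c in zip(rows, cols)}


# Toy frame-wise labels, made up for illustration
pred = [2, 2, 2, 0, 0, 1, 1, 1]
gt = [0, 0, 1, 1, 1, 2, 2, 2]

mapping = hungarian_map(pred, gt)
# Clusters left unmatched (more clusters than labels) are mapped to -1
mapped = np.array([mapping.get(p, -1) for p in pred])
mof = float(np.mean(mapped == np.asarray(gt)))  # mean over frames
print(mapping, mof)
```

On this toy input the mapping is {0: 1, 1: 2, 2: 0} and 7 of the 8 frames agree after mapping.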
The former is deprecated and pip throws a hissy fit
Thank you for the great method and code.
As far as I understand, algo2 is needed for evaluation, but I can't find the corresponding Python code.
Hello,
Thank you for publishing your excellent work.
I was testing the TW_FINCH for clustering and it has been working well, but when I tried to specify the exact number of clusters I wanted, I got the following error:
[186] ind = [i for i, v in enumerate(num_clust) if v >= req_clust]
--> [187] req_c = req_numclust(c[:, ind[-1]], data, req_clust, distance, use_tw_finch=tw_finch)
[188]else:
[189] req_c = c[:, num_clust.index(req_clust)]
IndexError: list index out of range
It seems to be in the c[:,ind[-1]] call.
What could be the reason behind this error?
Thank you.
Dear @ssarfraz ,
I am sorry for disturbing you, but could you please describe in more detail the tool or source code you used to visualize Figure 2 in your paper? Thank you so much!
Hi, I fixed the random seed and the input data and then applied FINCH for clustering, but the results differ from run to run. What should I do to get the same result every time?
P.S. I have a large amount of data (hundreds of thousands) and use the NNDescent method in 'pynndescent', is it possible that this is the cause? What can I do?
Looking forward to your reply, thank you very much
Hello, thank you for posting the code for TWFinch, and great work! I have tried to reproduce the results, but I ran into some problems when running run_on_dataset.m.
I downloaded the data and put it under E:\FINCH-Clustering-master\TW-FINCH,
then I ran the script: tw_finch = true; Result = run_on_dataset('50Salads', tw_finch, 'E:\FINCH-Clustering-master\TW-FINCH\Action_Segmentation_Datasets');
the error is as follows
Thanks for this amazing and practical algorithm.
When I browse the Python version of the code, I find that elements of the adjacency matrix may be greater than 1, as shown below.
csr_matrix in python.finch.py line45
0, 0, 0, 0, 1
0, 0, 0, 0, 1
0, 1, 0, 0, 0
0, 0, 0, 0, 1
0, 1, 0, 0, 0
adjacency matrix in line50
0, 1, 0, 1, 1
1, 0, 1, 1, 2
0, 1, 0, 0, 1
1, 1, 0, 0, 1
1, 2, 1, 1, 0
Maybe this will affect the value of min_sim in the hierarchical clustering at line155.
The segmentation fault occurs at the call to the NNDescent function, where the RP-trees are being built and the descent steps are about to start. I am using the H-NNE (koulakis/h-nne#17) algorithm, which uses FINCH under the hood.
Thanks for your work!!! I love it very much!!
I want to know why an edge whose two vertices are each other's most similar node gets a greater weight when compared with min_sim.
In the source code, the weight of such an edge is 2, while the weight of the others is 1 when compared with min_sim.
Hi~ Thanks for releasing your code and great work! I would appreciate if you can help me with the features of these two datasets.
Commit b508b1a intended to remove the sklearn dependency, but actually removed scipy. You can check the commit's diff here.
scipy is still installed since it's a dependency of scikit-learn, but we also get the deprecated sklearn package.
This means that the problem from #29 still affects finch-clust==0.1.8. We can check it by doing the following (based on How to test whether a package will be affected by the sklearn deprecation):
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=False \
pip install finch-clust==0.1.8
I call finch using
cluster_partition, n_part_clust, part_labels = FINCH(data, req_clust=2)
and receive this error
line 185, in FINCH
req_c = req_numclust(c[:, ind[-1]], data, req_clust, distance, use_ann_above_samples, verbose)
IndexError: list index out of range
My best guess is that there is only one cluster, so the condition v >= req_clust is never fulfilled in ind = [i for i, v in enumerate(num_clust) if v >= req_clust]. The index list is therefore empty, and ind[-1] is out of range.
What is the implication and how to best deal with this?
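The guess above can be checked in isolation. The following is a minimal sketch, assuming num_clust is the list of per-partition cluster counts used in the quoted line; the helper name pick_partition is hypothetical and not part of finch. It returns None instead of raising when no partition has at least req_clust clusters, so the caller can decide what to do (for example, fall back to the finest partition or report that the request cannot be met).

```python
def pick_partition(num_clust, req_clust):
    """Index of the last partition with at least req_clust clusters, or None.

    Hypothetical helper mirroring the list comprehension quoted in the issue.
    """
    ind = [i for i, v in enumerate(num_clust) if v >= req_clust]
    if not ind:
        # No partition is fine-grained enough: ind[-1] would raise IndexError
        return None
    return ind[-1]


# Cluster counts per partition, made up for illustration
print(pick_partition([120, 30, 8, 2], req_clust=5))  # index of the 8-cluster partition
print(pick_partition([1], req_clust=2))              # nothing to refine from
```

With a single partition containing one cluster, the second call returns None, which corresponds to the empty index list described in the issue.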
Hey,
I was trying to replicate the numbers presented in the paper with the features provided, and my numbers seem to be a bit on the lower side. Without changing anything, I ran the Python version of the code, and what I noticed was that on Breakfast I am getting an MoF of 60.1 whereas the reported number is 62.7. Similarly for MPII, I am getting 41.51 but the reported number is 42.0 (though that difference is very minor). Is there a reason for this discrepancy?
I realized that the Python implementation yields different results than the MATLAB version.
I found this out by first comparing a Python evaluation of the TW-FINCH clustering results against the provided MATLAB evaluation, using one of the provided datasets and the features from the TW-FINCH paper.
After looking a little more into the issue, I found that already in the first steps of the clustering process both versions assign the same features/frames to different clusters, and the number of clusters is drastically different too, which explains the performance differences.
Have you encountered this issue, and if so, are there solutions?
Suppose I have a dataset of 30 million (3000w) items, each item a 2048-d vector.
Thanks
All I know is how to get the precision and recall. But I don't know how to get the midpoint hit, and they are different.
Hi, I tried to convert my video into a numpy array using the method shown here (https://stackoverflow.com/questions/67644826/how-to-convert-a-video-to-a-numpy-array). Now, when I pass it as input to the function as FINCH(data, req_clust=K, tw_finch=True), I get:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 1 has 2 dimension(s). The shape of my data right now is (928, 108, 108, 3).
How do I fix this? Is there any other method to get the feature vectors of a video? I would really appreciate a response!
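The error message points at a dimension mismatch: the video array is 4-D (frames, height, width, channels), while something it is concatenated with is 2-D. A minimal sketch, assuming the function expects one feature vector per frame in a 2-D array of shape (n_frames, n_features): flatten each frame into a single row. Raw pixels are only a stand-in here for real frame-wise features, and the array below is a small dummy rather than an actual video.

```python
import numpy as np

# Dummy stand-in for the video: 8 frames of 108x108 RGB
# (the array in the issue has 928 frames with the same frame size).
frames = np.zeros((8, 108, 108, 3), dtype=np.uint8)

# Flatten every frame to one row: (n_frames, 108 * 108 * 3)
features = frames.reshape(frames.shape[0], -1).astype(np.float32)
print(features.shape)
```

Each row now has 108 * 108 * 3 = 34992 values, so the array has the 2-D layout that the concatenation in the traceback requires.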
Hello, thank you for posting the code and data for the TWFinch paper.
The code seems to be missing an option to run the FS "Eval" dataset.
I've made a logical change to your code (below) to load this dataset, but am unable to reproduce the accuracy in the paper, which was reported as MoF = 71.1%.
The following change to TW-FINCH/util_fns/read_video.m produces an accuracy of MoF = 66.7%:
elseif strcmp(Dataset, 'FS')
map=readtable(fullfile(mapping_path, 'mappingeval.txt'));
map2=table([1:numel(map.Var2)]', 'RowNames', map.Var2);
gt_label_str=table2cell(readtable(fullfile(gt_path, vid_name), 'Delimiter', '#', 'ReadVariableNames',false));
gt_label_frame=table2array(map2(gt_label_str,1));
I would appreciate any guidance on what might be wrong. Thank you.
Got different results for different trials in my experiments ...
Hello, thanks for your work! I'm sorry, but this problem has been bothering me for a long time. For TW-FINCH, can the frame-wise features only be extracted by iDT (as your paper mentions), or can they also be extracted by CNN methods such as I3D? Will the choice of method affect the clustering results?
I was wondering how to estimate the number of clusters using FINCH. After reading your paper, your method seems to always get the correct number of clusters, e.g., in Table 2.
For the YTI dataset, I have read CTE, but I can't reproduce the code to remove background frames. The replicated F1 is much lower, and the MoF is close.
I would appreciate any guidance on how to do this on the YTI dataset. Thank you.