
ggiecold-zz / cluster_ensembles


A package for combining multiple partitions into a consolidated clustering. The combinatorial optimization problem of obtaining such a consensus clustering is reformulated in terms of approximation algorithms for graph or hyper-graph partitioning.

License: MIT License

Python 4.01% CMake 0.46% C 93.85% Makefile 0.24% Objective-C 0.41% C++ 1.02% Batchfile 0.01%

cluster_ensembles's People

Contributors

aopisco, ggiecold, lefnire, msardelich, ralic


cluster_ensembles's Issues

Low adjusted Rand index of the result

Hello,
I ran the cluster_ensembles algorithm on 3 outputs from 3 different clustering algorithms. The adjusted Rand indices of these 3 outputs, compared to the true labels, were 0.75, 0.75 and 0.78. The adjusted Rand index of the cluster_ensembles output was 0.18. The dataset has about 650 data points and 9 clusters.

I was wondering if you have any thoughts on why your algorithm didn't perform well here. I expected the adjusted Rand index of the result to be at least no worse than that of the individual inputs.

Let me know if you need additional information.

Helena
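For reference, a minimal sketch of how such adjusted Rand indices are computed with scikit-learn; the label arrays below are random placeholders, not the reporter's data:

import numpy as np
from sklearn.metrics import adjusted_rand_score

# Random placeholder labelings standing in for ~650 data points and 9 clusters.
rng = np.random.RandomState(0)
true_labels = rng.randint(0, 9, 650)
consensus_labels = rng.randint(0, 9, 650)

# adjusted_rand_score is symmetric and chance-corrected: values near 0 mean
# agreement no better than random, 1 means identical partitions up to relabeling.
print(adjusted_rand_score(true_labels, consensus_labels))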

Package dependency

Got error:
ImportError: cannot import name 'jaccard_similarity_score' from 'sklearn.metrics' (/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/metrics/__init__.py)

In recent versions of sklearn, the function 'jaccard_similarity_score' has been renamed to 'jaccard_score'.

Error No such file or directory: 'wgraph_HGPA.part.16'

Hi,

I tried to run the example (and other data as well), but I always get this error.

Do you have any idea how I can solve this?


cluster_runs = np.random.randint(0, 50, (50, 15000))

consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 50)

INFO: Cluster_Ensembles: cluster_ensembles: due to a rather large number of cells in your data-set, using only 'HyperGraph Partitioning Algorithm' (HGPA) and 'Meta-CLustering Algorithm' (MCLA) as ensemble consensus functions.


INFO: Cluster_Ensembles: HGPA: consensus clustering using HGPA.

INFO: Cluster_Ensembles: wgraph: writing wgraph_HGPA.
INFO: Cluster_Ensembles: wgraph: 15000 vertices and 2500 non-zero hyper-edges.

INFO: Cluster_Ensembles: sgraph: calling shmetis for hypergraph partitioning.
Traceback (most recent call last):

File "", line 1, in
consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 50)

File "/usr/local/lib/python2.7/dist-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 300, in cluster_ensembles
cluster_ensemble.append(consensus_functions[i](hdf5_file_name, cluster_runs, verbose, N_clusters_max))

File "/usr/local/lib/python2.7/dist-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 648, in HGPA
return hmetis(hdf5_file_name, N_clusters_max)

File "/usr/local/lib/python2.7/dist-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 973, in hmetis
labels = sgraph(N_clusters_max, file_name)

File "/usr/local/lib/python2.7/dist-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 1201, in sgraph
with open(out_name, 'r') as file:

IOError: [Errno 2] No such file or directory: 'wgraph_HGPA.part.50'

memory() function is linux-specific

I can't speak for Windows, but as far as OS X goes, the only barrier to using this module is the 'memory' function, which reads /proc/meminfo, a file that only exists on Linux systems. I've worked around this for myself by hard-coding a very large number for my free memory, but there is probably a good cross-platform solution that could be implemented, if you were interested.

Perhaps a look into psutil (https://github.com/giampaolo/psutil) might be a good cross-platform solution?

Example:

psutil.phymem_usage()
usage(total=4153868288, used=2854199296, free=1299668992, percent=34.6)
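A minimal sketch of what a psutil-based, cross-platform memory() replacement could look like; in current psutil releases the equivalent call is psutil.virtual_memory(), and the kB units below are an assumption, mirroring how /proc/meminfo reports its values:

import psutil

def memory():
    # Cross-platform stand-in for the /proc/meminfo-based helper, assuming the
    # caller expects a dict with 'free' and 'total' entries expressed in kB.
    vm = psutil.virtual_memory()
    return {'free': vm.free // 1024, 'total': vm.total // 1024}

print(memory())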

Error running example

Hello,

Thanks for your contribution. I am trying to run it in Python 2.7 (Anaconda distribution), following your example, and I get the following error when running the cluster_ensembles command:

Traceback (most recent call last):
File "", line 1, in
File "/Users/me/anaconda/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 297, in cluster_ensembles
store_hypergraph_adjacency(hypergraph_adjacency, hdf5_file_name)
File "/Users/me/anaconda/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 190, in store_hypergraph_adjacency
FILTERS = get_compression_filter(byte_counts)
File "/Users/me/anaconda/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 138, in get_compression_filter
if 2 * byte_counts > 1000 * memory()['free']:
File "/Users/me/anaconda/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 63, in memory
with open('/proc/meminfo') as file:
IOError: [Errno 2] No such file or directory: '/proc/meminfo'

Do you have any idea what could be the problem?

Thanks

FileNotFoundError: [Errno 2] No such file or directory: 'wgraph_HGPA.part.16'

Hi!

I am having this issue when trying to run Cluster_Ensembles on a CentOS machine. I have already installed METIS and it apparently runs.

Do you know what could be causing the error?

INFO: Cluster_Ensembles: cluster_ensembles: due to a rather large number of cells in your data-set, using only 'HyperGraph Partitioning Algorithm' (HGPA) and 'Meta-CLustering Algorithm' (MCLA) as ensemble consensus functions.


*****
INFO: Cluster_Ensembles: HGPA: consensus clustering using HGPA.

#
INFO: Cluster_Ensembles: wgraph: writing wgraph_HGPA.
INFO: Cluster_Ensembles: wgraph: 239847 vertices and 119 non-zero hyper-edges.
#

#
INFO: Cluster_Ensembles: sgraph: calling shmetis for hypergraph partitioning.
Out of netind memory!
Traceback (most recent call last):
  File "cluster_ensemble.py", line 42, in <module>
    clusterlist = cooperative_cluster(data, feature_method)
  File "cluster_ensemble.py", line 22, in cooperative_cluster
    consensus_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 16)
  File "/home/DeepLearning/Pyenv/ontoenv/lib/python3.6/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 309, in cluster_ensembles
    cluster_ensemble.append(consensus_functions[i](hdf5_file_name, cluster_runs, verbose, N_clusters_max))
  File "/home/DeepLearning/Pyenv/ontoenv/lib/python3.6/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 657, in HGPA
    return hmetis(hdf5_file_name, N_clusters_max)
  File "/home/DeepLearning/Pyenv/ontoenv/lib/python3.6/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 982, in hmetis
    labels = sgraph(N_clusters_max, file_name)
  File "/home/DeepLearning/Pyenv/ontoenv/lib/python3.6/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 1210, in sgraph
    with open(out_name, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'wgraph_HGPA.part.16'

Parallel processing

Does the current version allow Cluster_Ensembles to be called more than twice concurrently from Python on the same machine? Thank you.

Set the K by user

Currently, there is an option to define the maximum number of clusters K (N_clusters_max). Is it possible for the user to set K directly instead of having the algorithm estimate it? Thank you.

Test usage error

Hi,

After following the installation instructions, I tried to launch a test with the following commands:

>>> import numpy as np
>>> import Cluster_Ensembles as CE
>>> cluster_runs = np.random.randint(0, 50, (50, 15000))
>>> consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 50)

I got the following error:

>>> consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 50)

INFO: Cluster_Ensembles: cluster_ensembles: due to a rather large number of cells in your data-set, using only 'HyperGraph Partitioning Algorithm' (HGPA) and 'Meta-CLustering Algorithm' (MCLA) as ensemble consensus functions.


*****
INFO: Cluster_Ensembles: HGPA: consensus clustering using HGPA.

#
INFO: Cluster_Ensembles: wgraph: writing wgraph_HGPA.
INFO: Cluster_Ensembles: wgraph: 15000 vertices and 2500 non-zero hyper-edges.
#

#
INFO: Cluster_Ensembles: sgraph: calling shmetis for hypergraph partitioning.
/home/philippe/miniconda3/envs/concenssus/lib/python2.7/site-packages/Cluster_Ensembles/Hypergraph_Partitioning/hmetis-1.5-linux/shmetis: 1: /home/philippe/miniconda3/envs/concenssus/lib/python2.7/site-packages/Cluster_Ensembles/Hypergraph_Partitioning/hmetis-1.5-linux/shmetis: Syntax error: "(" unexpected
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/philippe/miniconda3/envs/concenssus/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 300, in cluster_ensembles
    cluster_ensemble.append(consensus_functions[i](hdf5_file_name, cluster_runs, verbose, N_clusters_max))
  File "/home/philippe/miniconda3/envs/concenssus/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 648, in HGPA
    return hmetis(hdf5_file_name, N_clusters_max)
  File "/home/philippe/miniconda3/envs/concenssus/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 973, in hmetis
    labels = sgraph(N_clusters_max, file_name)
  File "/home/philippe/miniconda3/envs/concenssus/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 1201, in sgraph
    with open(out_name, 'r') as file:
IOError: [Errno 2] No such file or directory: 'wgraph_HGPA.part.50'
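A side note, not from the original thread: the shell message Syntax error: "(" unexpected usually means the shmetis file could not be executed as a native binary at all, so the shell fell back to interpreting it as a script; as a result, the 'wgraph_HGPA.part.50' partition file is never written and the subsequent open() fails. One quick check is to invoke the bundled binary directly (the path below is copied from the traceback above and will differ on other systems):

import subprocess

# Path to the shmetis executable bundled with Cluster_Ensembles, taken from the
# traceback above; adjust it for your own environment.
shmetis = ("/home/philippe/miniconda3/envs/concenssus/lib/python2.7/site-packages/"
           "Cluster_Ensembles/Hypergraph_Partitioning/hmetis-1.5-linux/shmetis")

# Running it with no arguments should at least print a usage message; a shell
# syntax error or an 'Exec format error' instead indicates the binary cannot
# run on this platform, which is why no partition file is ever produced.
subprocess.call([shmetis])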

Error running readme example

Hi,

I am getting the error:

IOError: [Errno 2] No such file or directory: 'wgraph_CSPA.part.50'

Before I start debugging, do you have any idea what is causing this issue?

Thanks,
Marcelo

HGPA can not work

I have recently been trying to use the Cluster_Ensembles package for my paper. I want to get the label results of the samples for each of CSPA, HGPA and MCLA. I tried the following as an example:

import numpy as np
import Cluster_Ensembles as CE
import tables
cluster_runs = np.random.randint(0, 8, (15, 150))
hdf5_file_name = './Cluster_Ensembles.h5'
fileh = tables.open_file(hdf5_file_name, 'w')
fileh.create_group(fileh.root, 'consensus_group')
fileh.close()
hypergraph_adjacency = CE.build_hypergraph_adjacency(cluster_runs)
CE.store_hypergraph_adjacency(hypergraph_adjacency, hdf5_file_name)
consensus_clustering_labels = CE.HGPA(hdf5_file_name,cluster_runs, verbose = True, N_clusters_max = 8)

But there is an error: IOError: [Errno 2] No such file or directory: 'wgraph_HGPA.part.4'.
MCLA and CSPA both produce a result; only HGPA doesn't work. Can you help me?

Move to Python 3

Allow this code to be used in Python 3.5 as well as Python 2.7.

Currently, when run, I receive:

    294         function_names = ['CSPA', 'HGPA', 'MCLA']
    295 
--> 296     hypergraph_adjacency = build_hypergraph_adjacency(cluster_runs)
    297     store_hypergraph_adjacency(hypergraph_adjacency, hdf5_file_name)
    298 

~/venv/mlClass/lib/python3.5/site-packages/Cluster_Ensembles/Cluster_Ensembles.py in build_hypergraph_adjacency(cluster_runs)
    166     N_runs = cluster_runs.shape[0]
    167 
--> 168     hypergraph_adjacency = create_membership_matrix(cluster_runs[0])
    169     for i in xrange(1, N_runs):
    170         hypergraph_adjacency = scipy.sparse.vstack([hypergraph_adjacency,

~/venv/mlClass/lib/python3.5/site-packages/Cluster_Ensembles/Cluster_Ensembles.py in create_membership_matrix(cluster_run)
    890     cluster_run = np.asanyarray(cluster_run)
    891 
--> 892     if reduce(operator.mul, cluster_run.shape, 1) != max(cluster_run.shape):
    893         raise ValueError("\nERROR: Cluster_Ensembles: create_membership_matrix: "
    894                          "problem in dimensions of the cluster label vector "

NameError: name 'reduce' is not defined
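A minimal sketch, not the package's actual fix, of the Python 3 compatibility changes this traceback points to, assuming only the removed built-ins visible above (reduce and xrange) need replacing:

# In Cluster_Ensembles.py: reduce is a built-in in Python 2 but lives in
# functools in Python 3 (the functools import also works under Python 2.6+).
from functools import reduce

# xrange does not exist in Python 3; since the package already imports six,
# six.moves.range covers both interpreters.
from six.moves import range as xrange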

jaccard_similarity_score import error

Hi, when I updated my scikit-learn and then used the Cluster_Ensembles package, an error occurred:
ImportError: cannot import name 'jaccard_similarity_score' from 'sklearn.metrics'

I have now solved this problem: I went to my C:\Users\china\Anaconda3\Lib\site-packages\Cluster_Ensembles folder and changed line 37 of "Cluster_Ensembles.py"
from: from sklearn.metrics import jaccard_similarity_score
to: from sklearn.metrics import jaccard_score
and that fixed it (reference: DiamondLightSource/SuRVoS#103).

Maybe you could make this import conditional on the scikit-learn version.
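A minimal sketch of what such a version-tolerant import could look like; a try/except fallback avoids hard-coding the scikit-learn release in which the rename happened, though note that jaccard_score is not a strict drop-in replacement for every input type:

try:
    # Older scikit-learn releases.
    from sklearn.metrics import jaccard_similarity_score
except ImportError:
    # Newer scikit-learn releases, where the function was renamed; its default
    # behaviour differs slightly from the old function for multiclass inputs.
    from sklearn.metrics import jaccard_score as jaccard_similarity_score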

Usage of each cluster ensemble method

Hello! I want to get the label results of the samples for each of CSPA, HGPA and MCLA. I tried the following as an example:


import numpy as np
import Cluster_Ensembles as CE
cluster_runs = np.random.randint(0, 8, (11, 50))
CSPA_clustering_labels = CE.CSPA(hdf5_file_name,cluster_runs, verbose = True, N_clusters_max = 8)
print CSPA_clustering_labels

But I don't know how to set the first parameter (hdf5_file_name), because I don't have an HDF5 file. Could you give me the answer for the above example? I am really new to Python.
Thanks a lot. I'm looking forward to your help.
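A minimal sketch of one way to create and populate that file, adapted from the snippet in the 'HGPA can not work' issue above (the file name and group name are taken from that snippet, not from the package's documentation):

import numpy as np
import tables
import Cluster_Ensembles as CE

cluster_runs = np.random.randint(0, 8, (11, 50))

# Create the HDF5 file that CSPA, HGPA and MCLA take as their first argument.
hdf5_file_name = './Cluster_Ensembles.h5'
fileh = tables.open_file(hdf5_file_name, 'w')
fileh.create_group(fileh.root, 'consensus_group')
fileh.close()

# Store the hypergraph adjacency of the cluster runs in that file, then call
# the individual consensus functions.
hypergraph_adjacency = CE.build_hypergraph_adjacency(cluster_runs)
CE.store_hypergraph_adjacency(hypergraph_adjacency, hdf5_file_name)

CSPA_clustering_labels = CE.CSPA(hdf5_file_name, cluster_runs, verbose=True, N_clusters_max=8)
print(CSPA_clustering_labels)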

ImportError: DLL load failed while importing hdf5extension: The specified procedure could not be found.

Whenever I write
import Cluster_Ensembles as CE
I get an error like:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_10820/1064937422.py in
----> 1 import Cluster_Ensembles as CE

D:\chaman\Cluster_Ensembles-master\src\Cluster_Ensembles\__init__.py in
39
40
---> 41 from . Cluster_Ensembles import *
42
43

D:\chaman\Cluster_Ensembles-master\src\Cluster_Ensembles\Cluster_Ensembles.py in
53 import subprocess
54 import sys
---> 55 import tables
56 import warnings
57 import six

~\anaconda3\lib\site-packages\tables\__init__.py in
60 # Import the user classes from the proper modules
61 from .exceptions import *
---> 62 from .file import File, open_file, copy_file
63 from .node import Node
64 from .group import Group

~\anaconda3\lib\site-packages\tables\file.py in
31 import numpy
32
---> 33 from . import hdf5extension
34 from . import utilsextension
35 from . import parameters

ImportError: DLL load failed while importing hdf5extension: The specified procedure could not be found.
