Comments (20)
I have pushed a solution and tested it on @hw449's graph, and it works like a charm. Simply execute the following line and it should be okay.
clusters = mc.get_clusters(mc.run_mcl(matrix, inflation=2), keep_overlap=False)
from markov_clustering.
The algorithm can generate situations where a node belongs to multiple clusters. See the description by the original algorithm author here (specifically section 9.3).
I need to check if I am handling these cases correctly.
It would be really helpful to have access to the graph that you are performing the clustering on. Could you try to export the graph using
write_weighted_edgelist(G, path_to_file)
and then zip up the file. Maybe that would be small enough to share?
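A minimal sketch of the export-and-compress step, assuming the edges are also available as (u, v, weight) triples. `write_weighted_edgelist_gz` is a hypothetical helper, not part of networkx or markov_clustering. It writes plain "u v weight" lines straight into a gzip file, so no separate zip step is needed.

```python
import gzip
import os
import tempfile


def write_weighted_edgelist_gz(edges, path):
    """Write (u, v, weight) triples as space-separated lines into a gzip file.

    Hypothetical helper for illustration only.
    """
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for u, v, weight in edges:
            handle.write(f"{u} {v} {weight}\n")


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "graph.edgelist.gz")
    write_weighted_edgelist_gz([("geneA", "geneB", 52.3), ("geneB", "geneC", 40.1)], out_path)
    print(os.path.getsize(out_path), "bytes written")
```

How much the file shrinks depends on the data, so it is worth checking the compressed size before trying to share it.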
Soft clustering sometimes causes problems in downstream analysis when hard clustering is required. It would be better if we could choose between soft and hard clustering. Thanks.
I sent the graph to your gmail. Thanks.
Moonire - if you have time to address this, it would be a big help!
I am not aware that Markov Clustering can even theoretically produce a soft clustering. Can you share your adjacency matrix so we can test it?
Thanks. The file containing the edge information is too large (713 MB) to share with you. Here is my code. It is very short and simple. Could you please help me check whether there are any problems? I didn't have this problem last year.
import pandas as pd
import markov_clustering as mc
import networkx as nx
import numpy as np
# load edge information from a csv file (qgeneid, sgeneid, and sbitscore are two nodes and a weight, respectively)
data = pd.read_csv('BLAST_results/allv3_to_allv3_reduced', sep='\t')
edges_with_weights = [(data['qgeneid'][i], data['sgeneid'][i], data['sbitscore'][i]) for i in range(len(data))]

# clustering
G = nx.Graph()
G.add_weighted_edges_from(edges_with_weights)
matrix = nx.to_scipy_sparse_matrix(G)
clusters = mc.get_clusters(mc.run_mcl(matrix, inflation=1.1))

# write results to a csv file
nodes = list(G.nodes())
gene_families = []
for i, tup in enumerate(clusters):
    for item in tup:
        gene_families.append([i, nodes[item]])
gene_families = pd.DataFrame(gene_families, columns=['family_id', 'gene_id'])
gene_families.to_csv('gene_families.csv')
Sure. It is now 33.4 MB! How can I share it with you?
I observed an increase of the number of nodes belonging to multiple clusters when inflation was set to a higher value.
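To put a number on this, here is a small sketch that counts the nodes appearing in more than one cluster. It assumes `clusters` is a list of tuples of node indices, which is how the code earlier in this thread iterates over the result of `get_clusters`. The function name `nodes_in_multiple_clusters` is made up for illustration and is not part of markov_clustering.

```python
from collections import Counter


def nodes_in_multiple_clusters(clusters):
    """Return {node: number_of_clusters} for nodes found in more than one cluster."""
    # set(cluster) ensures a node is counted at most once per cluster.
    counts = Counter(node for cluster in clusters for node in set(cluster))
    return {node: n for node, n in counts.items() if n > 1}


# Toy example: node 2 is shared by the first two clusters.
example_clusters = [(0, 1, 2), (2, 3), (4,)]
print(nodes_in_multiple_clusters(example_clusters))  # {2: 2}
```

Running this on the clusters obtained at several inflation values would show whether the overlap really grows with inflation.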
If the algorithm really is assigning a node to multiple clusters, do we consider that bad behavior and fix it? If it is something MCL does naturally, why modify it?
Also I'll take a look at the author's description and see what I can gain from it.
OK, so from what I've read, it's quite easy to fix. The original author simply assigned the node to the first cluster it appeared in, and only to that cluster, and issued a warning that this was the case. An option to keep the overlap should be added too. I think I'll handle this one if @GuyAllard is OK with that and submit the PR ASAP.
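The rule described above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: `resolve_overlap` is a hypothetical name, and it assumes clusters are given as a list of tuples of node indices. Each node is kept only in the first cluster it appears in, a warning is raised if any overlap was found, and `keep_overlap=True` returns the clusters unchanged.

```python
import warnings


def resolve_overlap(clusters, keep_overlap=False):
    """Keep each node only in the first cluster it appears in (hypothetical sketch)."""
    if keep_overlap:
        # Soft clustering: leave shared nodes in every cluster.
        return [tuple(cluster) for cluster in clusters]

    seen = set()           # nodes already assigned to an earlier cluster
    hard_clusters = []
    overlap_found = False
    for cluster in clusters:
        kept = []
        for node in cluster:
            if node in seen:
                overlap_found = True
            else:
                seen.add(node)
                kept.append(node)
        if kept:           # drop clusters that lost all of their nodes
            hard_clusters.append(tuple(kept))

    if overlap_found:
        warnings.warn(
            "Some nodes belonged to several clusters and were kept only in the "
            "first one; pass keep_overlap=True to keep the overlap."
        )
    return hard_clusters


print(resolve_overlap([(0, 1, 2), (2, 3), (2,)]))                     # [(0, 1, 2), (3,)]
print(resolve_overlap([(0, 1, 2), (2, 3), (2,)], keep_overlap=True))  # [(0, 1, 2), (2, 3), (2,)]
```

With this rule the result depends on the order of the clusters, which is worth keeping in mind for downstream analysis.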
@hw449 can you email me your graph? I think I have fixed the issue and would like to test it.
Thanks! So how can I install this newest version? Still using "pip install"?
For that, you'd have to wait until @GuyAllard merges my pull request. Until then, you can download my repo and replace the mcl.py file in python\Lib\site-packages\markov_clustering with mine; that will do the trick.
I ran your latest mcl.py just now and saw the warning "to unable soft clustering set keep_overlap to True".
I was confused, since I think soft clustering refers to a situation where different clusters can overlap with each other (i.e. share common nodes). Based on my understanding, if I want to unable soft clustering, I should set "keep_overlap" to False, rather than to True.
That was poor English on my part! I'll correct it.
I tried the following code to avoid soft clustering:
clusters = mc.get_clusters(mc.run_mcl(matrix, inflation=2), keep_overlap=False)
However, I got the following error:
TypeError: get_clusters() got an unexpected keyword argument 'keep_overlap'
Please advise me on how to solve this problem.
Same error! Any suggestions?
@lcd522 You have to patch the file yourself until the authors push the fix.
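Before or after patching, it may help to check whether the copy of `get_clusters` that Python actually imports accepts `keep_overlap`. Below is a small sketch using the standard `inspect` module. `accepts_keyword` is a hypothetical helper, and the two stand-in functions only mimic an unpatched and a patched signature; they are not the library's code.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins that mimic an unpatched and a patched signature (illustration only).
def old_get_clusters(matrix):
    return []


def new_get_clusters(matrix, keep_overlap=False):
    return []


print(accepts_keyword(old_get_clusters, "keep_overlap"))  # False
print(accepts_keyword(new_get_clusters, "keep_overlap"))  # True
```

With the real package, the check would be `accepts_keyword(mc.get_clusters, "keep_overlap")`, and `print(mc.__file__)` shows which installed copy is being imported. This is useful when the patched mcl.py was placed in a different environment than the one in use.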
I tried, but I still get the same error: "get_clusters() got an unexpected keyword argument 'keep_overlap'".