Moonire commented on May 25, 2024

I have pushed a solution and tested it on @hw449's graph, and it works like a charm. Simply run the following line and it should be fine.

clusters = mc.get_clusters(mc.run_mcl(matrix, inflation=2), keep_overlap=False)

from markov_clustering.

GuyAllard commented on May 25, 2024

The algorithm can generate situations where a node belongs to multiple clusters. See the description by the original algorithm author here (specifically section 9.3).

I need to check if I am handling these cases correctly.
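As a rough way to check whether a clustering result contains such cases, the sketch below scans a list of clusters (tuples of node indices, which is the shape the code quoted in this thread iterates over) and reports every node that appears in more than one cluster. The helper name `find_overlapping_nodes` is made up for illustration and is not part of markov_clustering.

```python
from collections import defaultdict


def find_overlapping_nodes(clusters):
    """Map each node found in more than one cluster to the indices of those clusters."""
    membership = defaultdict(list)
    for cluster_index, cluster in enumerate(clusters):
        for node in cluster:
            membership[node].append(cluster_index)
    # Keep only the nodes that belong to two or more clusters.
    return {node: idxs for node, idxs in membership.items() if len(idxs) > 1}


# Toy example (not real data): node 2 sits in both clusters.
clusters = [(0, 1, 2), (2, 3, 4)]
print(find_overlapping_nodes(clusters))  # {2: [0, 1]}
```

An empty dictionary means the result is already a hard clustering.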

It would be really helpful to have access to the graph that you are performing the clustering on. Could you try to export the graph using

write_weighted_edgelist(G, path_to_file)

and then zip up the file. Maybe that would be small enough to share?

hw449 commented on May 25, 2024

Soft clustering sometimes causes problems in downstream analysis when hard clustering is required. It would be better if we could choose between soft and hard clustering. Thanks.

hw449 commented on May 25, 2024

I sent the graph to your gmail. Thanks.

GuyAllard commented on May 25, 2024

Moonire - if you have time to address this, it would be a big help!

Moonire commented on May 25, 2024

I was not aware that Markov Clustering could produce a soft clustering, even in theory. Can you share your adjacency matrix so we can test it?

hw449 commented on May 25, 2024

Thanks. The file containing the edge information is too large (713 MB) to share with you. Here is my code; it is very short and simple. Could you please help me check whether there is a problem with it? I didn't have this problem last year.

import pandas as pd
import markov_clustering as mc
import networkx as nx
import numpy as np

# load edge information from a csv file
# (qgeneid and sgeneid are the two nodes, sbitscore is the edge weight)
data = pd.read_csv('BLAST_results/allv3_to_allv3_reduced', sep='\t')
edges_with_weights = [(data['qgeneid'][i], data['sgeneid'][i], data['sbitscore'][i]) for i in range(len(data))]

# clustering
G = nx.Graph()
G.add_weighted_edges_from(edges_with_weights)
matrix = nx.to_scipy_sparse_matrix(G)
clusters = mc.get_clusters(mc.run_mcl(matrix, inflation=1.1))

# write results to a csv file
nodes = list(G.nodes())
gene_families = []
for i, tup in enumerate(clusters):
    for item in tup:
        gene_families.append([i, nodes[item]])
gene_families = pd.DataFrame(gene_families, columns=['family_id', 'gene_id'])
gene_families.to_csv('gene_families.csv')

hw449 commented on May 25, 2024

Sure. It is now 33.4 MB! How can I share it with you?

hw449 commented on May 25, 2024

I observed an increase in the number of nodes belonging to multiple clusters when inflation was set to a higher value.

Moonire commented on May 25, 2024

If the algorithm really is assigning a node to multiple clusters, do we consider that bad behavior and fix it? If it is something MCL does naturally, why modify it?

Also I'll take a look at the author's description and see what I can gain from it.

Moonire commented on May 25, 2024

OK, so from what I've read it's quite easy to fix. The original author simply assigned the node to the first cluster it appeared in, and only to that cluster, and emitted a warning that this had happened. An option to keep the overlap should be added too. I think I'll handle this one if @GuyAllard is OK with that and submit the PR ASAP.
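A minimal sketch of that first-cluster-wins rule, assuming clusters are given as a list of tuples of node indices. The function name `resolve_overlap` and the warning text are illustrative only; they are not taken from the actual patch or from the original MCL implementation.

```python
import warnings


def resolve_overlap(clusters, keep_overlap=False):
    """Keep each node only in the first cluster it appears in, unless keep_overlap is True."""
    if keep_overlap:
        return list(clusters)

    seen = set()
    resolved = []
    overlap_found = False
    for cluster in clusters:
        kept = []
        for node in cluster:
            if node in seen:
                # Node was already assigned to an earlier cluster: drop it here.
                overlap_found = True
            else:
                seen.add(node)
                kept.append(node)
        if kept:  # skip clusters that became empty
            resolved.append(tuple(kept))

    if overlap_found:
        warnings.warn("Some nodes appeared in several clusters; kept the first assignment only.")
    return resolved


# Toy example (not real data): node 2 is shared by both clusters.
clusters = [(0, 1, 2), (2, 3, 4)]
print(resolve_overlap(clusters))                     # [(0, 1, 2), (3, 4)]
print(resolve_overlap(clusters, keep_overlap=True))  # [(0, 1, 2), (2, 3, 4)]
```

Note that the result depends on cluster order, since whichever cluster comes first keeps the shared node.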

Moonire commented on May 25, 2024

@hw449 can you email me your graph? I think I have fixed the issue and would like to test it.

hw449 commented on May 25, 2024

Thanks! So how can I install this newest version? Still using "pip install"?

Moonire commented on May 25, 2024

For that you'd have to wait until @GuyAllard merges my pull request. Until then, you can download my repo and replace the mcl.py file in python\Lib\site-packages\markov_clustering with mine; that will do the trick.
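The exact install location varies between systems and virtual environments, so it may be easier to ask Python where the package lives. A small sketch using only the standard library; `package_directory` is a made-up helper name, and `json` stands in for the package so the example runs anywhere.

```python
import importlib.util
import os


def package_directory(package_name):
    """Return the directory holding an installed top-level package, or None if it is not found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# "json" is only a stand-in; pass "markov_clustering" to locate the file to replace.
print(package_directory("json"))
```

Making a backup copy of the original mcl.py before overwriting it keeps the change easy to undo.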

hw449 commented on May 25, 2024

I ran your latest mcl.py just now and saw the warning "to unable soft clustering set keep_overlap to True".
I was confused, since I think soft clustering refers to a situation where different clusters can overlap with each other (i.e. share common nodes). Based on my understanding, if I want to disable soft clustering, I should set "keep_overlap" to False rather than to True.

Moonire commented on May 25, 2024

That was poor English on my part! I'll correct it.

dhrubajyotiborah commented on May 25, 2024

I tried to use the following code to avoid soft clustering:
clusters = mc.get_clusters(mc.run_mcl(matrix, inflation=2), keep_overlap=False)
However, I got the following error:
TypeError: get_clusters() got an unexpected keyword argument 'keep_overlap'
Please advise me on how to solve this problem.

lcd522 commented on May 25, 2024

Same error! Any suggestions?

codykingham commented on May 25, 2024

@lcd522 You have to patch the file yourself until the authors push the fix
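To see whether the copy you have installed already accepts the option, you can inspect the function signature before calling it. The sketch below is generic: `supports_keyword` is a made-up helper, and the two `get_clusters` variants are hypothetical stand-ins for an unpatched and a patched version, not the library's real code.

```python
import inspect


def supports_keyword(func, name):
    """Return True if func can be called with name=... as a keyword argument."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for the unpatched and patched function.
def old_get_clusters(matrix):
    return []


def new_get_clusters(matrix, keep_overlap=False):
    return []


print(supports_keyword(old_get_clusters, "keep_overlap"))  # False
print(supports_keyword(new_get_clusters, "keep_overlap"))  # True
```

With the real package imported as `mc`, the check would be `supports_keyword(mc.get_clusters, "keep_overlap")`; a result of False means the installed version predates the patch.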

xelleze commented on May 25, 2024

I tried, but I also get "get_clusters() got an unexpected keyword argument 'keep_overlap'".
