I initially arrived at this code via your blog post <a href="https://www.singlelunch.c

Why is generating walks so slow with non-default parameters? (nodevectors issue, closed)

vhranger commented on July 19, 2024

Why is generating walks so slow with non-default parameters?

from nodevectors.

Comments (3)

VHRanger commented on July 19, 2024

It comes down to how much additional work each step requires and how efficiently that work can be done during the walks. In Node2Vec walks the transition probabilities depend on the previous node, so you need to resample at each step of each random walk, and this also breaks cache locality. Plain random walks are easy to make fast in a CSRGraph (all the choices fit in the CPU cache).
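To make the difference concrete, here is a minimal sketch of the two kinds of step on raw CSR arrays. It assumes an unweighted, undirected graph stored as `indptr`/`indices`; the function names are hypothetical and this is not the code CSRGraph actually runs.

```python
import numpy as np

def uniform_step(indptr, indices, cur, rng):
    """First-order step: pick any neighbor of `cur` uniformly (one contiguous read)."""
    start, end = indptr[cur], indptr[cur + 1]
    return indices[rng.integers(start, end)]

def node2vec_step(indptr, indices, prev, cur, p, q, rng):
    """Second-order step: weights depend on `prev`, so they are rebuilt at every step."""
    nbrs = indices[indptr[cur]:indptr[cur + 1]]
    # Second read from a different region of the array: the neighbors of `prev`.
    prev_nbrs = indices[indptr[prev]:indptr[prev + 1]]
    weights = np.ones(len(nbrs))
    weights[nbrs == prev] = 1.0 / p                    # stepping back to `prev`
    far = ~np.isin(nbrs, prev_nbrs) & (nbrs != prev)   # not adjacent to `prev`
    weights[far] = 1.0 / q
    return rng.choice(nbrs, p=weights / weights.sum())
```

The uniform step touches one neighbor list and draws one integer. The biased step reads two neighbor lists, does a membership test, and normalizes a fresh weight vector, which is the extra per-step work described above.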

There's been some research on making it faster through rejection sampling: https://louisabraham.github.io/articles/node2vec-sampling.html

We've been looking into merging this into CSRGraph: VHRanger/CSRGraph#14
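As a rough illustration of the rejection sampling idea (a sketch under my own assumptions, not the code from the linked article or from the CSRGraph issue): propose a neighbor uniformly, then accept it with probability proportional to its Node2Vec weight. This assumes an unweighted graph in CSR form with sorted neighbor lists, `p, q > 0`, and at least one neighbor of `cur` with nonzero weight; the function names are hypothetical.

```python
import numpy as np

def has_edge(indptr, indices, u, v):
    """Binary search for v in u's neighbor list (assumes each list is sorted)."""
    start, end = indptr[u], indptr[u + 1]
    pos = start + np.searchsorted(indices[start:end], v)
    return pos < end and indices[pos] == v

def rejection_step(indptr, indices, prev, cur, p, q, rng):
    """One biased step without building the full weight vector."""
    upper = max(1.0, 1.0 / p, 1.0 / q)   # largest possible unnormalized weight
    start, end = indptr[cur], indptr[cur + 1]
    while True:
        cand = indices[rng.integers(start, end)]     # uniform proposal
        if cand == prev:
            w = 1.0 / p                              # return to previous node
        elif has_edge(indptr, indices, prev, cand):
            w = 1.0                                  # candidate adjacent to prev
        else:
            w = 1.0 / q                              # candidate farther from prev
        if rng.random() * upper < w:                 # accept with prob w / upper
            return cand
```

The cost per proposal no longer scales with the degree of `cur`; the trade-off is that extreme values of `p` or `q` lower the acceptance rate and cause more retries.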

That said, as noted in the README, I encourage you to try other algorithms (ProNE, GGVec) before spending a lot of resources grid-searching p and q on Node2Vec. As mentioned in this blog post, you need to grid-search p and q extensively on Node2Vec for it to show a difference, and that time is better spent grid-searching other parameters (the w2vparams, for instance) with more efficient methods.

ldorigo commented on July 19, 2024

> That said, as noted in the README, I encourage you to try other algorithms (ProNE, GGVec) before spending a lot of resources grid-searching p and q on Node2Vec. As mentioned in this blog post, you need to grid-search p and q extensively on Node2Vec for it to show a difference, and that time is better spent grid-searching other parameters (the w2vparams, for instance) with more efficient methods.

Just read it, great article. Although at this point I'm mostly stuck with Node2Vec because I would have to rewrite a major part of my thesis to use another algorithm.

FYI, since writing here I've tried using Node2Vec's actual reference implementation in C++ (https://github.com/snap-stanford/snap/tree/master/examples/node2vec), and the walks are ridiculously fast to generate (a few seconds for my network, for any combination of parameters). I don't think any of the Python implementations leverage the Markov property the way the paper describes (i.e. that you can generate partial walks for many nodes at once, because the next node depends only on the current node and the previous one). Anyhow, thanks for answering. Feel free to close this issue; I'm now using the C++ implementation, so my problem is solved :-)
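For readers unfamiliar with the idea, one way to read "partial walks for many nodes at once" is to simulate a single long walk and slice it into overlapping shorter walks. The sketch below only illustrates that slicing; it is an assumption about what is meant here, not code taken from the SNAP implementation, and the helper name is hypothetical.

```python
def subwalks(long_walk, k):
    """Slide a length-k window over one long walk to get a walk per start position."""
    return [long_walk[i:i + k] for i in range(len(long_walk) - k + 1)]

# Example: one walk of length 5 yields three walks of length 3.
walks = subwalks([0, 1, 2, 1, 0], 3)
```

Because each step depends only on the current and previous node, every window is itself a valid continuation of the walk, although the windows are correlated with each other rather than independent samples.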

VHRanger commented on July 19, 2024

Thanks. Since this is tracked in the rejection sampling issue, we'll track progress on faster walks with (p or q) != 1 there.
