Comments (3)
It comes down to how much additional work there is and how efficiently it can be done during the walks. Node2Vec has to resample the transition weights at each step of each random walk, which also breaks cache locality. Plain random walks are easy to make fast in a CSRGraph (all the choices fit in the CPU cache).
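To illustrate the per-step cost described above, here is a minimal pure-Python sketch of a second-order (Node2Vec-style) walk. It is not the nodevectors or CSRGraph code: the function names, the dict-of-lists graph, and the assumption of an unweighted, undirected graph are all made up for the example. The point is only that the weights depend on the previous node, so they have to be rebuilt at every step.

```python
import random

def node2vec_step(adj, prev, cur, p, q, rng):
    """Pick the next node for a walk at `cur` that arrived from `prev`.

    Hypothetical helper for illustration. `adj` maps node -> list of
    neighbors (unweighted, undirected graph assumed).
    """
    neighbors = adj[cur]
    # Extra work done on every step: look up the previous node's neighbors
    # and rebuild a weight for each candidate.
    prev_neighbors = set(adj[prev])
    weights = []
    for x in neighbors:
        if x == prev:
            weights.append(1.0 / p)   # return to the previous node
        elif x in prev_neighbors:
            weights.append(1.0)       # stay close to the previous node
        else:
            weights.append(1.0 / q)   # move further away
    return rng.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(adj, start, length, p, q, seed=0):
    """Generate one walk of `length` nodes starting at `start`."""
    rng = random.Random(seed)
    walk = [start]
    if length < 2 or not adj[start]:
        return walk
    # The first step has no previous node, so it is a plain uniform choice.
    walk.append(rng.choice(adj[start]))
    while len(walk) < length:
        walk.append(node2vec_step(adj, walk[-2], walk[-1], p, q, rng))
    return walk

# Small example graph: a triangle (0, 1, 2) with a tail node 3 attached to 1.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = node2vec_walk(adj, start=0, length=10, p=0.5, q=2.0)
```

With p = q = 1 every weight is 1.0, so the step reduces to a uniform choice over the current node's neighbors, and none of the extra lookups are needed.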
There's been some research into making it faster through rejection sampling: https://louisabraham.github.io/articles/node2vec-sampling.html
We've been looking into merging this into CSRGraph: VHRanger/CSRGraph#14
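For readers unfamiliar with the idea, the following is a rough sketch of rejection sampling applied to a single Node2Vec step. It is not taken from the linked article or from the CSRGraph pull request: the function name, the data layout, and the unweighted, undirected graph are assumptions for illustration. Instead of building the full weight vector, it proposes a neighbor uniformly and accepts it with probability proportional to its weight.

```python
import random

def rejection_step(adj_lists, adj_sets, prev, cur, p, q, rng):
    """Sample the next node without building a weight for every neighbor.

    Hypothetical helper for illustration. `adj_lists` maps node -> list of
    neighbors, `adj_sets` maps node -> set of the same neighbors.
    """
    # Upper bound on any unnormalized weight (1/p, 1, or 1/q).
    upper = max(1.0 / p, 1.0, 1.0 / q)
    while True:
        # Propose uniformly, exactly like a plain random walk would.
        x = rng.choice(adj_lists[cur])
        if x == prev:
            w = 1.0 / p
        elif x in adj_sets[prev]:
            w = 1.0
        else:
            w = 1.0 / q
        # Accept with probability w / upper, otherwise propose again.
        if rng.random() * upper < w:
            return x

# Triangle (0, 1, 2) with a tail node 3 attached to 1.
adj_lists = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
adj_sets = {node: set(nbrs) for node, nbrs in adj_lists.items()}

rng = random.Random(0)
# With very large p and q, returning to 0 or moving out to 3 is almost
# never accepted, so a walk that went 0 -> 1 nearly always continues to 2.
draws = [rejection_step(adj_lists, adj_sets, 0, 1, 1e9, 1e9, rng) for _ in range(200)]
```

Each proposal costs one uniform draw and one set lookup, regardless of the node's degree. The trade-off is that extreme values of p or q lower the acceptance rate, so more proposals are needed per step.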
That said, as noted in the README, I encourage you to try other algorithms (ProNE, GGVec) before spending a lot of resources grid-searching p and q on Node2Vec. As mentioned in this blog post, you need to grid-search p and q extensively on Node2Vec before it shows a difference, and that time is better spent grid-searching other parameters (the w2vparams, for instance) with more efficient methods.
from nodevectors.
That said, as noted in the README, I encourage you to try other algorithms (ProNE, GGVec) before spending a lot of resources grid-searching p and q on Node2Vec. As mentioned in this blog post, you need to grid-search p and q extensively on Node2Vec before it shows a difference, and that time is better spent grid-searching other parameters (the w2vparams, for instance) with more efficient methods.
Just read it, great article. Although at this point I'm mostly stuck with Node2Vec because I would have to rewrite a major part of my thesis to use another algorithm.
FYI, since writing here I've tried Node2Vec's actual reference implementation in C++ (https://github.com/snap-stanford/snap/tree/master/examples/node2vec), and the walks are ridiculously fast to generate (a few seconds for my network, for any combination of parameters). I don't think any of the Python implementations leverage the Markov property the way the paper describes (i.e. that you can generate partial walks for many nodes at once, because the next node depends only on the current node and the previous one). Anyhow, thanks for answering. Feel free to close this issue; I'm now using the C++ implementation, so my problem is solved :-)
Thanks. Since this is tracked in the rejection sampling issue, we'll track progress on faster walks for (p or q) != 1 there.