vhranger / nodevectors
Fastest network node embeddings in the west
License: MIT License
with tempfile.TemporaryDirectory() as temp_dir:
    joblib.dump(self, os.path.join(temp_dir, self.f_model), compress=True)
    with open(os.path.join(temp_dir, self.f_mdata), 'w') as f:
        json.dump(meta_data, f)
    filename = shutil.make_archive(filename, 'zip', temp_dir)
It should be written directly to the destination.
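One way to write the archive directly to its destination is sketched below. It is only an illustration, not the library's code: the function name `save_to_zip` and the default member names are hypothetical, and the standard-library `pickle` is swapped in for `joblib` so the example has no third-party dependency (`joblib.dump` accepts a file object as well, so it should fit in the same place).

```python
import json
import os
import pickle
import tempfile
import zipfile


def save_to_zip(model, meta_data, filename,
                f_model="model.pckl", f_mdata="metadata.json"):
    """Write model and metadata straight into '<filename>.zip' (hypothetical helper)."""
    path = filename if filename.endswith(".zip") else filename + ".zip"
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Stream the serialized model into the archive member,
        # with no intermediate copy in a temporary directory.
        with zf.open(f_model, "w") as member:
            pickle.dump(model, member)
        # Metadata is small, so it is written as a single string.
        zf.writestr(f_mdata, json.dumps(meta_data))
    return path


# Round-trip demo; the temporary directory is used only as a scratch destination.
with tempfile.TemporaryDirectory() as tmp:
    out = save_to_zip({"a": 1}, {"version": 1}, os.path.join(tmp, "model"))
    with zipfile.ZipFile(out) as zf:
        names = sorted(zf.namelist())
        restored = pickle.loads(zf.read("model.pckl"))
```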
nodevectors/nodevectors/node2vec.py
Line 132 in e98df00
This line refers to the old size parameter in gensim Word2Vec. It looks like the parameter was renamed to vector_size (ref).
Getting this error:
129 # Train gensim word2vec model on random walks
--> 130 self.model = gensim.models.Word2Vec(
131 sentences=self.walks,
132 size=self.n_components,
133 **self.w2vparams)
134 if not self.keep_walks:
135 del self.walks
TypeError: __init__() got an unexpected keyword argument 'size'
When running with gensim==4.3.2
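A minimal sketch of a version-aware workaround follows. It relies on gensim 4 having renamed `size` to `vector_size` and `iter` to `epochs`; the helper name `word2vec_dim_kwargs` is hypothetical and is not part of nodevectors or gensim.

```python
def word2vec_dim_kwargs(n_components, epochs, gensim_version):
    """Return Word2Vec keyword names that match the installed gensim major version."""
    major = int(gensim_version.split(".")[0])
    if major >= 4:
        # gensim 4.x spelling
        return {"vector_size": n_components, "epochs": epochs}
    # gensim 3.x spelling
    return {"size": n_components, "iter": epochs}


# Intended use (not executed here, since it needs gensim installed):
#   import gensim
#   kwargs = word2vec_dim_kwargs(32, 5, gensim.__version__)
#   model = gensim.models.Word2Vec(sentences=walks, **kwargs)
new_style = word2vec_dim_kwargs(32, 5, "4.3.2")
old_style = word2vec_dim_kwargs(32, 5, "3.8.3")
```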
Hi,
After the fix for VHRanger/CSRGraph#3, I was able to load my dataset in CSRGraph. But when I run the following commands, I get an error:
from nodevectors import Node2Vec
g2v = Node2Vec()
g2v.fit(G)
Error - ---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in
3 # way faster than other node2vec implementations
4 # Graph edge weights are handled automatically
----> 5 g2v.fit(G)
~/SageMaker/CSRGraph/nodevectors/nodevectors/node2vec.py in fit(self, nxGraph)
93 node_names = list(nxGraph)
94 G = cg.csrgraph(nxGraph, threads=self.threads)
---> 95 if type(node_names[0]) not in [int, str, np.int32, np.uint32,
96 np.int64, np.uint64]:
97 raise ValueError("Graph node names must be int or str!")
IndexError: list index out of range
The IDs in my data file are of int64 datatype. Interestingly, the following commands run successfully:
from nodevectors import GGVec
ggvec_model = GGVec()
embeddings = ggvec_model.fit_transform(G)
Hi,
Thanks for the clarification that solved issue number 27. That now works fine after I upgraded csrgraph to version 0.1.27. The next issue is that I get a segmentation fault while running node2vec. Is there any suggestion to fix this?
G = csr_matrix(G)
n2v_model = nodevectors.Node2Vec()
n2v_model.fit(G)
Segmentation fault
Do I need to update the nodevectors package as well after I update csrgraph? If so, which version is needed?
Thanks in advance !
Hi,
Thanks for this great module. I have a large sparse CSR graph of 10GB and I want to learn the node embeddings using Node2Vec. However, I keep getting this error:
TypeError: 'NoneType' object is not subscriptable
Here is a toy script that reproduces the error on my machine:
from scipy.sparse import csr_matrix
import numpy as np
import nodevectors
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 1, 1, 1, 1, 1])
G = csr_matrix((data, (row, col)), shape=(3, 3))
n2v_model = nodevectors.Node2Vec()
n2v_model.fit(G)
Isn't it true that Node2Vec() works directly with a csr_matrix? I even tried converting the CSR matrix to a CSRGraph but still get the same error. Any help would be great.
import csrgraph as cg
G = cg.csrgraph(G)
n2v_model = nodevectors.Node2Vec()
n2v_model.fit(G)
TypeError: 'NoneType' object is not subscriptable
I am getting this error when trying to run the unit tests.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-b3bd86b0b5de> in <module>()
1 # Fit embedding model to graph
2 g2v = Node2Vec()
----> 3 g2v.fit(G)
/home/dionysis/Documents/git_repos/graph2vec/graph2vec/graph.py in fit(self, nxGraph, verbose)
356 Whether to print output while working
357 """
--> 358 node_names = list(nxGraph.nodes)
359 if type(node_names[0]) not in [int, str, np.int32, np.int64]:
360 raise ValueError("Graph node names must be int or str!")
TypeError: 'method' object is not iterable
Do you think this line:
https://github.com/VHRanger/graph2vec/blob/8474f7ccf5d9b34d82fbf5ac16f04bcc37143cd6/graph2vec/graph.py#L358
Should change to this:
node_names = list(nxGraph.nodes())
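The difference comes from `nodes` being a method in networkx 1.x and an iterable view in later releases. A small sketch that tolerates both is shown below; the helper `node_names` and the two stand-in graph classes are hypothetical and exist only to make the example self-contained. Iterating the graph object itself also yields its nodes in networkx.

```python
def node_names(graph):
    """Return the node list whether `graph.nodes` is a view or a bound method."""
    nodes = getattr(graph, "nodes", None)
    if nodes is None:
        return list(graph)
    try:
        return list(nodes)       # newer API: `nodes` is directly iterable
    except TypeError:
        return list(nodes())     # older API: `nodes` is a method


class OldStyleGraph:
    """Stand-in for a graph whose `nodes` is a method."""
    def nodes(self):
        return [1, 2, 3]


class NewStyleGraph:
    """Stand-in for a graph whose `nodes` is an iterable attribute."""
    nodes = (1, 2, 3)


old_names = node_names(OldStyleGraph())
new_names = node_names(NewStyleGraph())
```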
Hello,
I just tried to fit a Node2Vec object and got an error:
129 # Train gensim word2vec model on random walks
130 self.model = gensim.models.Word2Vec(
131 sentences=self.walks,
132 size=self.n_components,
I found some advice saying that, to use Word2Vec from gensim now, the parameter must be named vector_size instead of size.
Running nodevectors.Node2Vec.fit for the same nx_graph gives different embedding.
Hi!
In node2vec.py, you should modify the 'iter' parameter to 'epochs' and the 'size' parameter to 'vector_size'.
(And thank you for the library, I use it extensively in my research!)
I need the option to assign random state or seed values to get stable results. I don't think there is such an option.
Unfortunately, my attempts to fix the general seed that I have listed below did not solve the problem.
import random
random.seed(1)
from numpy.random import seed
seed(1)
What can be done about it? Do you have any advice?
Thanks in advance.
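Global seeding only helps if every random step actually draws from the seeded generator. The sketch below is a plain-Python illustration, not nodevectors' walk code: it gives the walk generator its own seeded RNG so that repeated runs produce identical walks. The function `random_walks` and the toy graph are hypothetical. On the training side, gensim's documentation notes that a fully reproducible Word2Vec run also needs a fixed `seed`, a single worker thread (`workers=1`), and a fixed `PYTHONHASHSEED`.

```python
import random


def random_walks(adjacency, walklen, seed):
    """Generate one uniform random walk per node using a private, seeded RNG."""
    rng = random.Random(seed)  # independent of random.seed() and numpy's global state
    walks = []
    for start in sorted(adjacency):
        walk = [start]
        while len(walk) < walklen:
            neighbors = adjacency[walk[-1]]
            if not neighbors:
                break  # dead end: stop this walk early
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


toy = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
first = random_walks(toy, 5, seed=1)
second = random_walks(toy, 5, seed=1)
```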
remove adj_matrix from linalg/graphmatrix.py (#5753)
Hi,
I am testing your code on the BlogCatalog dataset (downloaded from the OpenNE repository on GitHub).
I have also compared the multi-class classification results with those of the OpenNE code.
In my test, using 10 percent of the data as training data, the result of your code is
{'micro': 0.25313039723661485, 'macro': 0.12076017464146425};
while the OpenNE code achieves
{'micro': 0.2903713298791019, 'macro': 0.1674684546080052};
I am confused by this, because both implementations use gensim. My guess is that the difference comes from how the nodes are sampled.
I haven't dug deeper yet, but your code really inspired me, since it could be used on a huge number of nodes. A good spark for graph processing.
Best regards,
Tade
from nodevectors import Node2Vec
import networkx as nx
G = nx.Graph()
G.add_edge("1", "2")
n2v = Node2Vec(n_components=128)
n2v.fit_transform(G)
Output:
Making walks... Done, T=3.98
Mapping Walk Names... Done, T=0.07
Training W2V... WARNING: gensim word2vec version is unoptimizedTry version 3.6 if on windows, versions 3.7 and 3.8 have had issues
Done, T=0.39
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-6-43e45de9791e> in <module>
2 G.add_edge("1", "2")
3 n2v = Node2Vec(n_components=128)
----> 4 n2v.fit_transform(G)
~/miniconda3/envs/graphs/lib/python3.7/site-packages/nodevectors/node2vec.py in fit_transform(self, G)
151 pd.DataFrame.from_records(
152 pd.Series(np.arange(len(G.nodes)))
--> 153 .apply(self.predict)
154 .values)
155 )
~/miniconda3/envs/graphs/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
4106 else:
4107 values = self.astype(object)._values
-> 4108 mapped = lib.map_infer(values, f, convert=convert_dtype)
4109
4110 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
~/miniconda3/envs/graphs/lib/python3.7/site-packages/nodevectors/node2vec.py in predict(self, node_name)
166 if type(node_name) is not str:
167 node_name = str(node_name)
--> 168 return self.model.wv.__getitem__(node_name)
169
170 def save_vectors(self, out_file):
~/miniconda3/envs/graphs/lib/python3.7/site-packages/gensim/models/keyedvectors.py in __getitem__(self, entities)
351 if isinstance(entities, string_types):
352 # allow calls like trained_model['office'], as a shorthand for trained_model[['office']]
--> 353 return self.get_vector(entities)
354
355 return vstack([self.get_vector(entity) for entity in entities])
~/miniconda3/envs/graphs/lib/python3.7/site-packages/gensim/models/keyedvectors.py in get_vector(self, word)
469
470 def get_vector(self, word):
--> 471 return self.word_vec(word)
472
473 def words_closer_than(self, w1, w2):
~/miniconda3/envs/graphs/lib/python3.7/site-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
466 return result
467 else:
--> 468 raise KeyError("word '%s' not in vocabulary" % word)
469
470 def get_vector(self, word):
KeyError: "word '0' not in vocabulary"
Fitting and then predicting works fine:
n2v.fit(G)
for node in G:
print(n2v.predict(node))
Output:
Making walks... Done, T=0.00
Mapping Walk Names... Done, T=0.06
Training W2V... WARNING: gensim word2vec version is unoptimizedTry version 3.6 if on windows, versions 3.7 and 3.8 have had issues
Done, T=0.38
[ 0.01669522 0.01119813 -0.00566072 -0.0134473 0.01121703 0.00379648
0.01170088 -0.0121789 -0.01429367 -0.00849178 0.00943886 -0.00981773
0.00337284 -0.0013884 -0.01287963 -0.00460479 -0.00217993 -0.01019352
0.00615602 -0.00658679 0.01679845 -0.00747446 0.0019177 -0.00912566
-0.01688758 0.00983168 0.00286994 0.00739604 0.01249113 0.00116864
0.00235101 -0.01515406 -0.00786685 -0.01675885 -0.01421799 -0.00829282
-0.00385966 -0.00779916 -0.00067812 0.01312324 0.0154448 -0.0107193
-0.00059914 -0.00439935 -0.01970238 -0.00585162 -0.01741348 -0.00118494
-0.01365886 -0.007099 0.00806013 -0.00448715 -0.00633816 -0.009869
0.01835089 0.01462685 0.00408294 0.01042183 0.00773886 0.00500051
0.00697436 -0.00052141 -0.00307364 0.00916708 -0.0059573 -0.00794462
0.00316458 -0.01120937 0.00820292 -0.00175512 -0.00426679 0.00403081
0.0036373 -0.00538955 0.00169757 -0.00476247 0.00011785 -0.00015604
-0.02005355 0.00293106 -0.00457922 0.01199162 -0.01039407 -0.00975906
-0.00386479 0.00380202 0.0150509 0.00117078 0.01009431 -0.01518334
-0.01550014 -0.00316153 -0.01638743 0.00911983 -0.00656796 -0.01130522
0.00696332 0.00222521 -0.01348531 0.01745371 -0.01043333 0.00377076
0.00168364 -0.01029514 -0.01187336 -0.00047892 0.01747731 0.01539742
-0.00317966 0.01036133 0.00348293 0.00357884 0.01691393 -0.01314759
-0.00387712 0.01349622 0.00886216 0.01269572 -0.014981 0.01047694
-0.01591979 0.00815849 0.0053769 -0.01705019 0.00478466 -0.00967307
0.00100743 -0.00627678]
[ 1.74459908e-02 9.29250382e-03 -5.62654436e-03 -1.58256646e-02
6.62352284e-03 -1.04596815e-03 7.46087125e-03 -1.52283600e-02
-1.47760203e-02 -4.99586575e-03 8.37715156e-03 -1.14215305e-02
8.03218782e-03 -4.57122130e-03 -1.37374401e-02 -6.70122309e-03
5.60258329e-03 -1.36625227e-02 2.69854977e-03 -2.01221928e-03
1.41100660e-02 -1.21530667e-02 7.38256099e-03 -7.29203923e-03
-1.45003749e-02 8.89602769e-03 -1.07536477e-03 1.66074419e-03
7.48369843e-03 8.18155764e-04 3.80413979e-03 -1.41491415e-02
-1.12004904e-03 -1.57257933e-02 -1.23076690e-02 -9.28518735e-03
-5.15399221e-03 -5.42826438e-03 9.19695070e-04 9.03129764e-03
1.57911442e-02 -5.36569115e-03 -1.36574614e-03 -2.82609137e-03
-1.89300030e-02 -5.67972986e-03 -1.65421404e-02 -3.22455773e-04
-1.18535999e-02 -7.90045224e-03 9.72144585e-03 -7.91174080e-03
-4.45207767e-03 -1.19799254e-02 1.93504207e-02 1.06750363e-02
4.26934101e-03 1.17199738e-02 6.25003641e-03 1.98470801e-03
4.88949660e-03 7.53012951e-04 -8.29974841e-03 6.85363356e-03
-2.72968784e-03 -5.58869634e-03 1.48452440e-04 -8.40961654e-03
3.35645187e-03 -3.52724968e-03 3.98239447e-03 -2.40911031e-03
4.06429684e-03 -3.92150227e-03 6.94983220e-03 -8.35845713e-03
9.88924527e-04 -1.79716619e-03 -1.90840866e-02 2.46768352e-03
-4.37452644e-03 1.30511560e-02 -6.40019309e-03 -1.33609995e-02
3.72520881e-04 5.42262476e-03 1.41993044e-02 7.35963322e-03
1.08134123e-02 -1.49347940e-02 -1.22990599e-02 -9.69778374e-03
-1.74602009e-02 8.74316972e-03 -5.31877764e-03 -7.91502465e-03
3.98375420e-03 4.59250668e-03 -1.26426788e-02 1.60577614e-02
-1.03733260e-02 4.70442930e-03 6.72380021e-03 -1.34339379e-02
-1.50517235e-02 3.45687894e-03 1.50700649e-02 1.58219878e-02
4.28991532e-03 9.33015719e-03 7.03065936e-03 3.41207208e-03
1.49237625e-02 -1.07398266e-02 -1.00340396e-02 9.12039913e-03
1.27081424e-02 1.08739929e-02 -1.16528282e-02 4.42440435e-03
-1.53663196e-02 3.64650693e-03 5.37529076e-03 -1.76296048e-02
3.67483153e-05 -7.88922701e-03 -5.40610822e-03 -1.80462585e-03]
The original Node2vec and DeepWalk proposals are built upon the skip-gram model. By default, nodevectors does not set the parameter w2vparams["sg"] to 1, so the underlying Word2Vec model uses the default value of 0, which means CBOW instead of skip-gram. This has major consequences for the quality of the embeddings.
Unfortunately, the name graph2vec was already used in 2017 in a paper on representation learning for whole graphs (not nodes). Link: https://arxiv.org/abs/1707.05005, implementations at https://github.com/MLDroid/graph2vec_tf (author) and https://github.com/benedekrozemberczki/graph2vec (reimplementation)
Also, graph2vec has already been taken on PyPI by another project, https://pypi.org/project/graph2vec, but I think you're aware of this.
I think the solution was to rename this to graph2vec-learn, but I would encourage you to pick a more informative name because this doesn't resolve the original name conflict.
Either way, could you please update the name of this repo so the PyPI project matches the repo and the folder inside the repo?
I am trying to load a 150MB edgelist in csr graph using the command G = cg.read_edgelist("samplelist.edgelist", sep="\t")
But I get the following error:
ValueError Traceback (most recent call last)
in
2 import nodevectors
3
----> 4 G = cg.read_edgelist("samplelist.edgelist", sep="\t")
~/anaconda3/envs/python3/lib/python3.6/site-packages/csrgraph/graph.py in read_edgelist(f, sep, header, **readcsvkwargs)
457 SRC: {elist.src.max()}, {elist.src.min()}
458 DST: {elist.dst.max()}, {elist.dst.min()}
--> 459 """)
460 elist.src = elist.src.astype(np.uint32)
461 elist.dst = elist.dst.astype(np.uint32)
ValueError:
Invalid uint32 value in node IDs. Max/min :
SRC: 8278237827, 15830
DST: 8237827382738273827382, 2111364
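The IDs in this error are larger than the uint32 maximum of 4294967295, and the largest DST value does not fit in 64 bits either. One possible workaround, sketched below under the assumption that the original labels only need to be recoverable afterwards, is to remap them to contiguous integers before building the graph. The helper `remap_node_ids` is hypothetical and is not part of csrgraph.

```python
def remap_node_ids(edges):
    """Map arbitrary node labels to contiguous ints 0..n-1, keeping the lookup table."""
    index = {}
    remapped = []
    for src, dst in edges:
        # setdefault assigns the next free integer the first time a label is seen
        s = index.setdefault(src, len(index))
        d = index.setdefault(dst, len(index))
        remapped.append((s, d))
    return remapped, index


# Labels are kept as strings so that arbitrarily large IDs are preserved exactly.
edges = [("8278237827", "8237827382738273827382"), ("15830", "8278237827")]
small_edges, lookup = remap_node_ids(edges)
```

Inverting `lookup` afterwards recovers the original label for each embedding row.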
Hi,
Thanks for solving the previous issue #19. However, I now get a segmentation fault when running
from nodevectors import Node2Vec
g2v = Node2Vec()
g2v.fit(G)
Additionally, when I pip3 install CSRGraph and nodevectors, the installation completes, but when I import them I get a "No module found" error.
There seems to be an __init__.py missing in the evaluation folder, which causes an error on import.
Additionally, umap is missing from the requirements.
Also, a small suggestion - when I ran into this issue today I tried installing the last version that worked for me (0.1.12), which is also broken since you don't specify package versions in your requirements. In this case your other package CSRGraph created a compatibility issue, so maybe you only need to specify the CSRGraph version since you're frequently updating it.
I really appreciate the work you've put into this package, when I was looking for a node2vec implementation many months ago yours was by far the cleanest and fastest. Thanks!
Awesome work! Unfortunately, when I load my bin file, I get the following error message:
ValueError: invalid vector on line 0 (is this really the text format?)
Any suggestions? There are spaces in the node names (e.g., 'Leonardo da Vinci').
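The word2vec text format separates the token and each vector component with whitespace, so a name containing spaces may be what trips the parser here; that is a guess, not a confirmed diagnosis. If the file really is binary, gensim's `KeyedVectors.load_word2vec_format` also takes `binary=True`. A minimal sketch of sanitizing names before saving follows, with `sanitize_names` being a hypothetical helper.

```python
def sanitize_names(names, replacement="_"):
    """Map whitespace-free names back to the originals, refusing collisions."""
    mapping = {}
    for name in names:
        # Collapse every run of whitespace into the replacement character.
        safe = replacement.join(str(name).split())
        if safe in mapping and mapping[safe] != name:
            raise ValueError("two names collide after sanitizing: %r" % safe)
        mapping[safe] = name
    return mapping


safe_to_original = sanitize_names(["Leonardo da Vinci", "Raphael"])
```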
I have trained and saved the model with
import csrgraph as cg
import nodevectors
G = cg.read_edgelist("edges.txt", directed=False, sep=' ')
ggvec_model = nodevectors.GGVec()
embeddings = ggvec_model.fit_transform(G)
ggvec_model.save("embeddings.emb")
Now I want to load and iterate over the embeddings but I'm unable to find any method that returns the nodes list.
import nodevectors
ggvec_model = nodevectors.GGVec()
ggvec_model.load("embeddings.emb.zip")
Hi! Thanks for your library!
I'm using it to vectorize a network graph: a graph of IPs that communicated with each other. What do you think would be a good approach for dealing with new, previously unseen IPs (nodes)?
It seems like there is no option other than retraining the n2v model from scratch.
In my case skipping them is not an option, and I don't see how I can use tricks from NLP such as substituting a synonym for the unseen word.
I would be grateful for any thoughts or suggestions.
Cheers,
Alex
Hi, is there any way to monitor training progression? Even with verbose=True, nothing gets printed out after "Mapping Walk Names... Done" (and if the training can be expected to take several hours, it's a bit annoying to have no idea if anything is actually happening).
Hello! Thanks for the great work!
I encountered an issue while using nodevectors to train ProNE embeddings:
I ran
G = cg.read_edgelist("..", directed=True, sep=',')
g2v = ProNE()
g2v.fit(G)
ValueError Traceback (most recent call last)
Input In [34], in <cell line: 2>()
1 g2v = ProNE()
----> 2 g2v.fit(G)
File ~/miniforge3/envs/alphaA/lib/python3.8/site-packages/nodevectors/prone.py:82, in ProNE.fit(self, graph)
78 G = cg.csrgraph(graph)
79 features_matrix = self.pre_factorization(G.mat,
80 self.n_components,
81 self.exponent)
---> 82 vectors = ProNE.chebyshev_gaussian(
83 G.mat, features_matrix, self.n_components,
84 step=self.step, mu=self.mu, theta=self.theta)
85 self.model = dict(zip(G.nodes(), vectors))
File ~/miniforge3/envs/alphaA/lib/python3.8/site-packages/nodevectors/prone.py:154, in ProNE.chebyshev_gaussian(G, a, n_components, step, mu, theta)
151 return a
152 print(G.shape)
--> 154 A = sparse.eye(nnodes) + G
155 DA = preprocessing.normalize(A, norm='l1')
156 # L is graph laplacian
File ~/miniforge3/envs/alphaA/lib/python3.8/site-packages/scipy/sparse/base.py:414, in spmatrix.__add__(self, other)
412 elif isspmatrix(other):
413 if other.shape != self.shape:
--> 414 raise ValueError("inconsistent shapes")
415 return self._add_sparse(other)
416 elif isdense(other):
ValueError: inconsistent shapes
I checked further and found that G.mat is a non-square sparse matrix with shape (830421x830420).
Could you please give me any clue about this?
Currently, n_components is set to 32 in all available algorithms such as node2vec, GGVec, etc. How can I increase it to 128? I tried modifying the .py files of these algorithms to change 32 to 128, but it did not work. After I set n_components=128 in the .py files and imported the package again, running the algorithm still outputs vectors with 32 components.
I get:
raise ValueError("inconsistent shapes")
from:
./nodevectors/prone.py line 61, in fit_transform
./nodevectors/prone.py line 152, in chebyshev_gaussian
.../scipy/sparse/_base.py line 471, in __add__
The defaults work for small graphs (tens of thousands of nodes), but fail for a graph with 7M nodes and 50M edges.
It appears one of the argument names has changed in the newly released version of GenSim. This has also caused some pain in other libraries using this package for node2vec implementations (e.g., krishnanlab/PecanPy#16)
Traceback (most recent call last):
File "embed_nodevectors.py", line 150, in <module>
main()
File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "embed_nodevectors.py", line 137, in main
model.fit(graph)
File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/nodevectors/node2vec.py", line 130, in fit
self.model = gensim.models.Word2Vec(
TypeError: __init__() got an unexpected keyword argument 'size'
Dear author,
I read the source code of Node2Vec and found that the default values of return_weight and neighbor_weight are both 1. Isn't that DeepWalk?
However, if I change the values of return_weight and neighbor_weight, it becomes very slow. I want to customize the embedding toward BFS or DFS; how can I keep it fast?
import networkx as nx
from nodevectors import Node2Vec
# the edgelist file has 895608 lines
G = nx.read_weighted_edgelist('edgelist', create_using=nx.DiGraph)
g2v = Node2Vec(n_components=dimension,verbose=True)
g2v.fit(G)
Here is the error trace.
File "./lib/utils/twitter_data.py", line 410, in _learn_node2vec_nodevectors
g2v.fit(G)
File "./venv/lib64/python3.6/site-packages/nodevectors/node2vec.py", line 133, in fit
**self.w2vparams)
File "./venv/lib64/python3.6/site-packages/gensim/models/word2vec.py", line 591, in __init__
self.wv = Word2VecKeyedVectors(size)
File "./venv/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 380, in __init__
super(WordEmbeddingsKeyedVectors, self).__init__(vector_size=vector_size)
File "./venv/lib64/python3.6/site-packages/gensim/models/keyedvectors.py", line 218, in __init__
self.vectors = zeros((0, vector_size), dtype=REAL)
TypeError: 'str' object cannot be interpreted as an integer
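The traceback suggests that the value reaching `vector_size` was a string, which would happen if `dimension` came from a command line or config file without conversion. That is an inference from the trace, not something the report states. A small sketch of parsing it as an integer up front, with the `--dimension` flag being a hypothetical name:

```python
import argparse


def parse_args(argv):
    """Parse the embedding dimension as an int so it can be passed as n_components."""
    parser = argparse.ArgumentParser()
    # type=int makes argparse reject non-numeric input instead of passing a str along
    parser.add_argument("--dimension", type=int, default=32)
    return parser.parse_args(argv)


args = parse_args(["--dimension", "128"])
dimension = args.dimension
```

When the value arrives from elsewhere, an explicit `int(dimension)` before constructing the model has the same effect.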
I am loading about 7MM edges in a graph object using networkx and then running
import nodevectors
ggvec_model = nodevectors.GGVec()
embeddings = ggvec_model.fit_transform(G)
After running for a few minutes, the Jupyter notebook kernel dies. Is there any way forward in this scenario?
Recently numba removed jitclass from its module __init__. This breaks the import of jitclass.
numba/numba@4976953
Nodevectors imports jitclass so the dependency needs to be pinned.
Pinning to 0.51.2 fixes the import of nodevectors.
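An alternative to pinning is to try the new import location first and fall back to the old one; newer numba releases expose the decorator as `numba.experimental.jitclass`. The usual form is a plain `try`/`except ImportError` around the two imports. The sketch below wraps the same idea in a hypothetical helper, `import_first`, and demonstrates it with standard-library modules so it runs without numba installed.

```python
import importlib


def import_first(candidates):
    """Return the first attribute that can be imported from (module, attribute) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue  # this location is unavailable, try the next one
    raise ImportError("none of the candidates could be imported: %r" % (candidates,))


# Intended use (not executed here, since it needs numba installed):
#   jitclass = import_first([("numba.experimental", "jitclass"), ("numba", "jitclass")])

# Stand-in demo: Mapping lives in collections.abc; older Pythons also had collections.Mapping.
Mapping = import_first([("collections", "Mapping"), ("collections.abc", "Mapping")])
```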
The Node2Vec class constructor sets the default value of w2vparams["batch_words"] to 128. The default value in gensim's lib is 10000. According to their docs:
batch_words (int, optional) – Target size (in words) for batches of examples passed to worker threads (and thus cython routines). (Larger batches will be passed if individual texts are longer than 10000 words, but the standard cython code truncates to that maximum.)
I don't know what exactly it does behind the scenes, but using the current default value of 128 severely affects the training performance.
Line of code:
nodevectors/nodevectors/node2vec.py
Line 28 in 5acc519
As the title says.
Hello Mr. Matt Ranger,
I installed the nodevectors package on macOS Sierra, verified with 'pip list' that all the required Python packages are available, and then tried to run the short example from the README as a .py file. Here is the command-line trace:
% python networkx-test.py
Making walks... Done, T=2.94
Mapping Walk Names... Done, T=0.08
Training W2V... Done, T=0.85
Traceback (most recent call last):
File "networkx-test.py", line 19, in
g2v = Node2vec.load('node2vec.pckl') # it gets blocked at this point.
NameError: name 'Node2vec' is not defined
...any hint/feedback/re-testing would be appreciated.
Thank you, BR
H.
Is there a way to pass parameters (e.g., epoch=100) in the command line?
For example:
g2v = Node2Vec()
g2v.fit(GCC, walklen=30, epochs=100 )
Thanks!
Just want to know if ProNE is multithreaded? Is there a way to control the number of threads like the implemented Node2Vec?
Very nice project. Here is a suggestion: it would be great to be able to call n2v.walks and get a list of all generated random walks after running fit(). I think it should be an easy upgrade :)
Node2Vec accepts a neighbor_weight parameter, but the docstring refers to it as explore_weight. The doc probably needs to be updated.
Hello, could you share the code used to draw Wikipedia 6M.png and 3d graph.png?
I initially arrived at this code via your blog post https://www.singlelunch.com/2019/08/01/700x-faster-node2vec-models-fastest-random-walks-on-a-graph/#note-3-692 - and indeed the speedup with default parameters (q=1,p=1) is impressive.
But as you also mention in the readme, much of that is lost when using non-default parameters. I have a network of 100k nodes and 1M edges, and the "default" walk generation takes 14 seconds, while trying different parameters takes well over 10 hours. Is there anything that can be done to improve speed for different values of p and q? Much of the flexibility of Node2Vec comes from being able to capture local vs. global information by tuning the parameters, and even the Node2Vec paper shows that the best results are usually obtained with values of p and q that are different from 1.
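For context, the sketch below computes the unnormalized second-order transition weights defined in the node2vec paper for an unweighted graph. It is a plain-Python illustration, not nodevectors' implementation, and `transition_weights` and the toy graph are hypothetical. With p = q = 1 every weight equals 1 and the previous node can be ignored; otherwise each step has to test whether a candidate is also a neighbor of the previous node, which is plausibly where the extra cost comes from.

```python
def transition_weights(adjacency, prev, cur, p=1.0, q=1.0):
    """Unnormalized node2vec weights for stepping from `cur`, having arrived from `prev`."""
    prev_neighbors = set(adjacency[prev])
    weights = {}
    for nxt in adjacency[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p      # return to the previous node
        elif nxt in prev_neighbors:
            weights[nxt] = 1.0          # stay at distance 1 from the previous node
        else:
            weights[nxt] = 1.0 / q      # move outward, to distance 2
    return weights


toy = {
    "t": ["v", "x1"],
    "v": ["t", "x1", "x2"],
    "x1": ["t", "v"],
    "x2": ["v"],
}
biased = transition_weights(toy, prev="t", cur="v", p=2.0, q=0.5)
uniform = transition_weights(toy, prev="t", cur="v")
```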
Hello,
First, thanks for a great package. The performance boost compared to other implementations is pretty incredible.
One thing I don't see is support for using edge weights in the input graph. Is there a way to do this now, or are there plans to add this functionality?
All the best,
Chad
Hi VHRanger,
Thanks for your great work. I am trying to run node2vec on a graph with around 4 million nodes and more than 48 million edges, but I got the error below. Can you give me some advice on how to deal with a graph this big?
sys:1: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
File "node2vec/graph_builder/csr.py", line 26, in <module>
gr = CSRGraphNode2Vec()
File "node2vec/graph_builder/csr.py", line 9, in __init__
self.graph = cg.read_edgelist(file_path, directed=False, sep=',')
File "/data/quocpbc/anaconda3/lib/python3.8/site-packages/csrgraph/graph.py", line 523, in read_edgelist
G = methods._edgelist_to_graph(
File "/data/quocpbc/anaconda3/lib/python3.8/site-packages/csrgraph/methods.py", line 31, in _edgelist_to_graph
new_src[1:] = np.cumsum(np.bincount(src, minlength=nnodes))
ValueError: could not broadcast input array from shape (2147483649) into shape (4790294)
Hi,
Your package is great, but you should really put it on PyPI to make the installation easier.
Thanks for this great work.
I have a big graph of size 10 GB. I use CSRGraph to load the edgelist and compute the node embeddings using node2vec, but I ran into a problem while reading the graph. Here is the error I encountered:
import csrgraph as cg
G = cg.read_edgelist("karate.txt",sep = "\t")
TypeError: sort_values() got an unexpected keyword argument 'ignore_index'
Any suggestions to fix this?
Thanks in advance.
Hi.
It would be great if nodevectors could support the word2vec's corpus_file parameter that allows for file-based fast training.
What do the devs think about that?
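For reference, gensim's `corpus_file` expects a file in LineSentence format: one sentence per line, tokens separated by whitespace. A minimal sketch of writing random walks in that layout follows. The helper `write_walks` is hypothetical, and node names are assumed to contain no whitespace.

```python
import os
import tempfile


def write_walks(walks, path):
    """Write one walk per line with space-separated node names (LineSentence layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for walk in walks:
            f.write(" ".join(str(node) for node in walk) + "\n")
    return path


# Demo in a scratch directory; the walks themselves are made-up values.
with tempfile.TemporaryDirectory() as tmp:
    corpus_path = write_walks([[1, 2, 3], [3, 2, 1]], os.path.join(tmp, "walks.txt"))
    with open(corpus_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

# Intended use (not executed here, since it needs gensim installed):
#   gensim.models.Word2Vec(corpus_file=corpus_path, ...)
```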
Hi! ,
I am using node2vec to generate walks on graphs, which I then pass to a different gensim model modified by another tool (this is for alignment of temporal models).
Given the speed of carrying out walks with nodevectors, is it possible to separate the walks from the .fit method? That is, could there be an option to ONLY carry out the walks without fitting the model, so that I can just save the walks and take them on to the next tool?
thanks!
Hi!,
Related to #40
I was wondering if node2vec now uses skip-gram by default (I cannot see it anywhere in the source code, but I am sure I am missing it!).
If it doesn't, does the following code set sg=1?
n2v = Node2Vec(n_components=32, walklen=80, epochs=100, keep_walks=True, w2vparams={'sg':1})
n2v.fit(nx_graph)
I want to be sure this is correct, because when I set {'sg': 50} (just a very silly example to provoke an error), no error is thrown. So I wonder whether w2vparams={'sg':1} is actually selecting skip-gram instead of CBOW, or whether I am doing something incorrectly. Any advice (or the right way to do it) is appreciated :)
Secondly: instead of saving embeddings and then loading them as KeyedVectors with word2vec, is there a way of converting the fitted object (n2v above) directly to a gensim Word2Vec object?
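Since an out-of-range value such as 50 raises no error, one defensive option is to validate the dictionary before handing it to the model. The sketch below is hypothetical on two counts: the helper `merge_w2vparams` is not part of nodevectors, and the defaults shown are illustrative, with only `batch_words=128` taken from the discussion above. It also assumes that a user-supplied `w2vparams` replaces the defaults rather than being merged with them, which is why the merge is done explicitly.

```python
def merge_w2vparams(defaults, overrides):
    """Merge user overrides into the defaults and reject ambiguous `sg` values."""
    params = dict(defaults)
    params.update(overrides)
    sg = params.get("sg", 0)
    if sg not in (0, 1):
        # gensim documents sg as 1 for skip-gram and 0 for CBOW; anything else is ambiguous
        raise ValueError("sg must be 0 (CBOW) or 1 (skip-gram), got %r" % (sg,))
    return params


illustrative_defaults = {"window": 10, "batch_words": 128}  # illustrative values only
skipgram_params = merge_w2vparams(illustrative_defaults, {"sg": 1})

try:
    merge_w2vparams(illustrative_defaults, {"sg": 50})
    rejected = False
except ValueError:
    rejected = True
```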
Thank you!
Hi,
Can I update node embeddings given an already trained model? I want to fit a model once, then periodically update the network and its node embeddings without starting from zero.