muhanzhang / seal
SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction). "M. Zhang, Y. Chen, Link Prediction Based on Graph Neural Networks, NeurIPS 2018 spotlight".
Hello,
If I have new nodes in my graph, i.e., nodes with no observed links, and I want to predict the possible links for these new nodes, how should I proceed?
Hello,
When I run main.py I get an error like this:
pythonw.exe - Bad Image
E: SEAL-master pytorch DGCNN\lib build\dilibgnn.so is either not designed to run on Windows or it contains an error. Try installing the program again using the original installation media or contact your system administrator or the software vendor for support. Error status 0xc000012f.
What should I do?
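Hi Mr. Zhang, may I write in Chinese? My English is not very good.
Thank you very much for your code; it is very useful to me. After downloading it I tried to run the Python version, but I keep running into a problem with "from util import GNNGraph" in util_functions. I tried installing a third-party package, but without success. Did you implement the util file yourself? If so, is it possible that it was not included in the repository?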
Hi,
I had a couple questions about the methods:
1). The GNN training is for each subgraph extraction, yes? If so, how are the GNNs related to one another if the parameterization will be different for each GNN?
2). How exactly is the edge information encoded into the learning? To my understanding, you would be taking two nodes, producing the feature matrix for the neighbors in the enclosing subgraph, apply the GNN, and produce a probability score. If the enclosing subgraph could be different sizes depending on the topology, how is this addressed?
Thanks!
Running Main.m in MATLAB produces this error:
Traceback (most recent call last):
File "../../software/node2vec/src/main.py", line 14, in <module>
import networkx as nx
ImportError: No module named networkx
Error using dlmread (line 62)
The file 'data/embedding/USAir_1.emd' could not be opened because: No such file
or directory
Error in generate_embeddings (line 59)
node_embeddings = dlmread(['data/embedding/', data_name, '.emd']);
Error in graph2mat (line 62)
node_embeddings = generate_embeddings(A1, data_name_i, emd_method);
Error in SEAL (line 66)
[data, max_size] = graph2mat([train_pos; train_neg], [test_pos; test_neg],
A, h, ith_experiment, 0, data_name, include_embedding, include_attribute);
Error in Main (line 65)
parfor (ith_experiment = 1:numOfExperiment, workers)
I think the problem is that MATLAB does not detect the correct Python environment. Could you please help me solve this?
Hi, thanks for sharing your code and the related paper. I trained the model on my own data, saved it, and then tried to predict link probabilities using the command below. Could you shed some light on what might have caused the error that follows?
python Main.py --data-name DATA --train-name DATA_train.txt --test-name DATA_test.txt --hop 1 --use-attribute --max-nodes-per-hop 50 --max-train-num 50000 --only-predict
====== begin of gnn configuration ======
| msg_average = 0
====== end of gnn configuration ======
Namespace(all_unknown_as_negative=False, batch_size=50, cuda=False, data_name='DATA', hop='1', max_nodes_per_hop='50', max_train_num=50000, no_cuda=False, no_parallel=False, only_predict=True, save_model=False, seed=1, test_name='DATA_test.txt', test_ratio=0.1, train_name='DATA_train.txt', use_attribute=True, use_embedding=False)
sampling negative links for train and test
Enclosing subgraph extraction begins...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [02:18<00:00, 2.18s/it]
Time eplased for subgraph extraction: 147.22653317451477s
Initializing DGCNN
Traceback (most recent call last):
File "Main.py", line 190, in <module>
predictions.append(classifier(batch_graph)[0][:, 1].exp().cpu().detach())
File "/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Code/SEAL/Python/../../pytorch_DGCNN/main.py", line 119, in forward
feature_label = self.PrepareFeatureLabel(batch_graph)
File "/Code/SEAL/Python/../../pytorch_DGCNN/main.py", line 90, in PrepareFeatureLabel
node_tag.scatter_(1, concat_tag, 1)
RuntimeError: index 38 is out of bounds for dimension 1 with size 33
Hi Dr. Zhang!
First of all - thank you for your great work. Your paper helped me to understand many concepts and inspired me a lot.
Do you happen to have saved models and hyperparameters (.pkl and .pth files) for some of the included datasets, particularly for "Facebook"?
I am trying to train it myself, but it looks like it is going to take a very long time.
Have you conducted experiments without the DRNL algorithm? Did it make a difference to the results?
hi,
As your readme.md says, "The node attributes are assumed to be saved in the group of the .mat file" (in the Usage part). I looked into the file and it looks like the following:
(131, 0) 1.0
I would like to know what this means.
Thanks.
Hi Muhan,
I am using the datasets you provided in the repo, but I cannot find the Ecoli dataset in .txt format. Do you know where I could find it, or am I looking in the wrong place?
Thanks,
Jozef
I tried to run the node2vec code on Zachary's karate club network by executing !python3 src/main.py --input graph/karate.edgelist --output emb/karate.emd on Google Colab, but I get this error:
Walk iteration:
1 / 10
2 / 10
3 / 10
4 / 10
5 / 10
6 / 10
7 / 10
8 / 10
9 / 10
10 / 10
Traceback (most recent call last):
File "src/main.py", line 104, in <module>
main(args)
File "src/main.py", line 100, in main
learn_embeddings(walks)
File "src/main.py", line 87, in learn_embeddings
model = Word2Vec(walks, size=args.dimensions, window=args.window_size, min_count=0, sg=1, workers=args.workers, iter=args.iter)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 767, in __init__
fast_version=FAST_VERSION)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 759, in __init__
self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 936, in build_vocab
sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 1571, in scan_vocab
total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 1556, in _scan_vocab
total_words += len(sentence)
TypeError: object of type 'map' has no len()
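The traceback suggests that the walks handed to Word2Vec are `map` objects, which is what `map()` returns under Python 3 (under Python 2 it returned a list). A minimal sketch of the usual workaround, assuming the walks are lists of node ids, is to materialize each walk as a list of strings before training. `stringify_walks` is a hypothetical helper name and is not part of node2vec.

```python
def stringify_walks(walks):
    """Turn each walk into a concrete list of string tokens.

    Under Python 3, map(str, walk) is a lazy iterator with no len(),
    so each walk is materialized as a list instead.
    """
    return [[str(node) for node in walk] for walk in walks]


# Toy walks standing in for the output of the random-walk step.
walks = [[1, 2, 3], [3, 2, 1, 4]]
lazy_walks = [map(str, walk) for walk in walks]  # reproduces the failing shape
fixed_walks = stringify_walks(walks)

print(hasattr(lazy_walks[0], "__len__"))  # False: a map object has no len()
print([len(walk) for walk in fixed_walks])
```

Passing the materialized walks to Word2Vec should avoid this particular TypeError; other arguments may still need adjusting depending on the installed gensim version.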
I want to reproduce the experiments in the paper Link Prediction Based on Graph Neural Networks. However, I cannot reach the state-of-the-art results reported in the paper. Taking USAir as an example, I ran python Main.py --data-name USAir --hop 'auto' --test-ratio 0.1 and got a best accuracy of 0.92. Are there any special settings that I am missing? Thanks.
Hi muhan,
Thanks for sharing the code. I have run both python Main.py --data-name USAir --hop 'auto' --batch-size 1 and python Main.py --data-name USAir --hop 1 --batch-size 1. The subgraph sampling runs smoothly, but the program seems to get stuck after printing Initializing DGCNN.
The first command has been running for over 2 days without producing the output file auc_results.txt, and the second command has now run for over 4 hours without printing a log or writing a file.
I thought the cause was very large subgraphs due to --hop 'auto', but it still gets stuck after changing the hop argument to 1.
Is this expected? Did training take you this long before you got AUC results?
Could you explain the mathematical theory behind this function? Also, the website you refer to cannot be reached, so I cannot see the details.
Hello Dr. Zhang! Running SEAL with Python 3.5 gives the error NameError: name 'cmd_args' is not defined. I do not know how to solve it. Thank you!
Your paper says that some information is included in the supplementary material, but I could not find such a part in the paper. Where can I get it? Thank you.
A.eliminate_zeros() # make sure the links are masked when using the sparse matrix in scipy-1.3.x
AttributeError: 'numpy.ndarray' object has no attribute 'eliminate_zeros'
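Hello Dr. Zhang! I get this error when running SEAL with Python 3.8. I also downgraded the scipy package, but the error persists. I do not know how to solve it. Thank you!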
Hey there Muhan,
I was wondering if SEAL worked on graphs with multiple types of nodes (i.e. heterogeneous networks). Take the graph below, for example:
If I would like my GNN to predict only whether there is a link between green circles and blue triangles, would I just make sure that the central nodes x and y are a blue triangle and a green circle, or vice versa, during the subgraph extraction process?
Thanks!
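My English is not very good, so please allow me to write in Chinese.
Hello Dr. Zhang, my undergraduate thesis is on link prediction in social networks based on graph convolutional neural networks. Would your SEAL algorithm fit this topic? Also, can I complete the project without configuring embeddings and node attributes?
Hello, installing the main package in an Ubuntu environment fails.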
Running setup.py install for main ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-fJikpq/main/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-pB5uyz-record/install-record.txt --single-version-externally-managed --compile --user --prefix=:
running install
running build
running build_scripts
creating build
creating build/scripts-2.7
error: file '/tmp/pip-build-fJikpq/main/main' does not exist
Great job on the paper and code! They are really impressive. By the way, would you mind also sharing the code for the MF and SBM baselines in Table 2 (publicly or privately)?
Hi, Dr. Zhang
I have a question regarding the DGCNN hyperparameter "cmd_args.feat_dim". It seems that this parameter is set to 16. May I ask what it is used for? I was trying to run a test on my data and ran into an error. In main.py line 87, concat_tag contains values greater than 16, which leads to an invalid index error when "node_tag.scatter_(1, concat_tag, 1)" is called. Do you have any suggestions for solving this problem?
Thanks.
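Both this report and the earlier "index 38 is out of bounds for dimension 1 with size 33" traceback are consistent with a one-hot encoding whose width is smaller than the largest node tag. The pure-Python sketch below is not the DGCNN code; it only illustrates the constraint that the width must be at least the maximum tag plus one. `required_width` and `one_hot_tags` are hypothetical names.

```python
def required_width(tags):
    """Smallest one-hot width that can hold every tag (tags start at 0)."""
    return max(tags) + 1


def one_hot_tags(tags, width):
    """One-hot encode integer tags into rows of length `width`."""
    rows = []
    for tag in tags:
        if not 0 <= tag < width:
            # Same failure mode as a scatter with an out-of-range index.
            raise IndexError(f"tag {tag} is out of bounds for width {width}")
        row = [0] * width
        row[tag] = 1
        rows.append(row)
    return rows


tags = [0, 1, 5, 38]
width = required_width(tags)  # 39, large enough for tag 38
encoded = one_hot_tags(tags, width)
print(width, sum(encoded[3]))
```

If the width was fixed from one set of graphs and a later graph contains a larger tag, the encoding fails in the same way, so the width would presumably need to cover every graph that is fed to the model.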
When I try python Main.py --data-name NS --test-ratio 0.5 --hop 'auto' --use-embedding, I get the error below.
Here is my full error report (including the output of A, which I print myself):
====== begin of gnn configuration ======
| msg_average = 0
====== end of gnn configuration ======
Namespace(all_unknown_as_negative=False, batch_size=50, cuda=True, data_name='NS', hop='auto', max_nodes_per_hop=None, max_train_num=100000, no_cuda=False, no_parallel=False, only_predict=False, save_model=False, seed=1, test_name=None, test_ratio=0.5, train_name=None, use_attribute=False, use_embedding=True)
sampling negative links for train and test
Traceback (most recent call last):
File "Main.py", line 138, in <module>
embeddings = generate_node2vec_embeddings(A, 128, True, train_neg)
File "/home/czw/SEAL/Python/util_functions.py", line 216, in generate_node2vec_embeddings
G = node2vec.Graph(nx_G, is_directed=False, p=1, q=1)
AttributeError: module 'node2vec' has no attribute 'Graph'
(base) czw@sy-NF5280M5:~/SEAL/Python$ python Main.py --data-name NS --test-ratio 0.5 --hop 'auto' --use-embedding
====== begin of gnn configuration ======
| msg_average = 0
====== end of gnn configuration ======
Namespace(all_unknown_as_negative=False, batch_size=50, cuda=True, data_name='NS', hop='auto', max_nodes_per_hop=None, max_train_num=100000, no_cuda=False, no_parallel=False, only_predict=False, save_model=False, seed=1, test_name=None, test_ratio=0.5, train_name=None, use_attribute=False, use_embedding=True)
sampling negative links for train and test
(1588, 0) 1.0
(4, 1) 1.0
(5, 1) 1.0
(5, 3) 1.0
(1, 4) 1.0
(5, 4) 1.0
(1, 5) 1.0
(3, 5) 1.0
(4, 5) 1.0
(7, 6) 1.0
(8, 6) 1.0
(10, 6) 1.0
(6, 7) 1.0
(6, 8) 1.0
(9, 8) 1.0
(10, 8) 1.0
(1423, 8) 1.0
(1531, 8) 1.0
(8, 9) 1.0
(10, 9) 1.0
(6, 10) 1.0
(8, 10) 1.0
(9, 10) 1.0
(12, 11) 1.0
(1047, 11) 1.0
: :
(336, 1571) 1.0
(630, 1571) 1.0
(1569, 1571) 1.0
(1570, 1571) 1.0
(1572, 1571) 1.0
(336, 1572) 1.0
(630, 1572) 1.0
(1571, 1572) 1.0
(782, 1573) 1.0
(1575, 1574) 1.0
(1577, 1574) 1.0
(1574, 1575) 1.0
(1576, 1575) 1.0
(1575, 1576) 1.0
(1577, 1576) 1.0
(1574, 1577) 1.0
(1576, 1577) 1.0
(1583, 1582) 1.0
(1582, 1583) 1.0
(1586, 1584) 1.0
(1584, 1586) 1.0
(75, 1587) 1.0
(521, 1587) 1.0
(0, 1588) 1.0
(1083, 1588) 1.0
Traceback (most recent call last):
File "Main.py", line 138, in <module>
embeddings = generate_node2vec_embeddings(A, 128, True, train_neg)
File "/home/czw/SEAL/Python/util_functions.py", line 216, in generate_node2vec_embeddings
G = node2vec.Graph(nx_G, is_directed=False, p=1, q=1)
AttributeError: module 'node2vec' has no attribute 'Graph'
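By the way, the other commands listed in readme.md run without problems.
In fact, I tested python Main.py --train-name PB_train.txt --test-name PB_test.txt --hop 1 and it runs correctly; a dataset I defined myself also gives correct results. But as soon as I add --use-embedding, I get the same error as above. Finally I tested python Main.py --data-name NS --test-ratio 0.5 --hop 'auto' --use-embedding directly and got the result shown; the printed A has the same form. I do not know what went wrong.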
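One possible cause, stated here only as an assumption, is that Python imports a different module named node2vec (for example an installed package) instead of the node2vec.py file that defines Graph. A small sketch for checking which file a module is actually loaded from; `module_origin` is a hypothetical helper.

```python
import importlib


def module_origin(name):
    """Return the file a module is loaded from, or None if it has no file."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


# Demonstrated on a standard-library module; replace "json" with "node2vec"
# in the failing environment to see which file is actually imported.
print(module_origin("json"))
```

If the printed path points somewhere other than the expected source file, the import order (sys.path) would be the first thing to inspect.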
Hi, this is a such an interesting paper!
I want to train the Python version of SEAL on a specific dataset, e.g. the human interactome. I saw that the data is stored either in .mat form, composed of net and group (for explicit features), or in raw form as a plain .txt file. However, most interactome data lists protein-protein pairs using alphanumeric protein names. How did you do the mapping between node names and node numbers?
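One common way to handle alphanumeric node names is to assign each name a consecutive integer id and keep the reverse mapping for decoding predictions. The sketch below is a generic illustration, not the authors' preprocessing; the protein names are made up, and the starting index (`start`) is an assumption that should be matched to whatever indexing the input files expect.

```python
def build_node_index(edges, start=0):
    """Assign consecutive integer ids to node names in order of first appearance."""
    name_to_id = {}
    for u, v in edges:
        for name in (u, v):
            if name not in name_to_id:
                name_to_id[name] = start + len(name_to_id)
    return name_to_id


def encode_edges(edges, name_to_id):
    """Rewrite named edges as pairs of integer ids."""
    return [(name_to_id[u], name_to_id[v]) for u, v in edges]


# Hypothetical interactions with placeholder protein names.
edges = [("PROT_A", "PROT_B"), ("PROT_B", "PROT_C"), ("PROT_A", "PROT_D")]
name_to_id = build_node_index(edges)
id_to_name = {i: name for name, i in name_to_id.items()}
print(encode_edges(edges, name_to_id))
```

Keeping `id_to_name` alongside the encoded edge list makes it possible to translate predicted links back to the original names.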
Running on USAir_1...
/home/RAAVAN/torch/install/bin/luajit: cannot open <tempdata/USAir_1/USAir_1.dat> in mode r at /tmp/luarocks_torch-scm-1-5020/torch7/lib/TH/THDiskFile.c:673
stack traceback:
[C]: at 0x7f0fb4144160
[C]: in function 'DiskFile'
/home/RAAVAN/.luarocks/share/lua/5.1/torch/File.lua:405: in function 'load'
../DGCNN/main.lua:331: in function 'load_data'
../DGCNN/main.lua:747: in main chunk
[C]: in function 'dofile'
...AVAN/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004056e0
This error is coming from DGCNN.
Hi, I found the weighted WCN and WAA in your project, but I am confused about how the 'alpha' parameter is assigned. Does it depend on the case? Could you tell me what I should do if I want to experiment with WCN and WAA? Thanks a lot.
loss: 0.27501 acc: 0.87500: 100%|████████████████████| 9/9 [00:00<00:00, 12.10batch/s]
Where should I make changes?
I saw that in your PB_train.txt and PB_test.txt, there are observed links for training, as well as "future" links for testing, right?
Do they share the same node set? That is, if the test file contains a node id that does not appear in the train file, is that OK, or should all node ids be specified in the train file first? (I know that a future link whose two nodes both never appeared in the train file would not be predictable, since it is too far away from the community. I am only asking out of an engineering concern about the code. 😄 )
Another question: for the test file, do you mean that a link is TRUE if it is in the test file and FALSE otherwise when you compute the ACC? (Meaning the test file should not contain only a subset of the test links.)
Thank you very much!! 😸
Hi, great work. How can I access your predicted link results on those eight datasets? Is there any code for that? Currently I only have the AUC and ACC results stored in txt files.
Great work..
In util_functions.py the value of "h" is chosen between 1 and 2 on the basis of
if val_auc_AA >= val_auc_CN:
    h = 2
    print('\033[91mChoose h=2\033[0m')
else:
    h = 1
    print('\033[91mChoose h=1\033[0m')
From what I understood from the paper, the value of h can remain small because of the gamma decaying property. Which is why,
Thank you.
Hi, when I try to run, I receive the OSError:
OSError: dlopen(.../pytorch_DGCNN/lib/build/dll/libgnn.so, 6): no suitable image found.
I have searched online but could not solve this problem (even when using virtualenv). Do you know what is wrong?
Actually, whether I use python3 or python2.7, as soon as I run:
import ctypes
ctypes.CDLL(".../pytorch_DGCNN/lib/build/dll/libgnn.so")
I've got OSError:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ctypes/__init__.py", line 366, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(.../pytorch_DGCNN/lib/build/dll/libgnn.so, 6): no suitable image found. Did find:
.../pytorch_DGCNN/lib/build/dll/libgnn.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
.../pytorch_DGCNN/lib/build/dll/libgnn.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
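The first bytes quoted in the error, 0x7F 0x45 0x4C 0x46, are the ELF magic number, which indicates that this libgnn.so was built for Linux. The Python path in the traceback points to macOS, where dlopen expects a Mach-O binary, so the library would presumably need to be rebuilt on the Mac. A small sketch for identifying a binary's format from its leading bytes; `binary_format` is a hypothetical helper.

```python
def binary_format(first_bytes):
    """Guess an executable or library format from its leading magic bytes."""
    if first_bytes[:4] == b"\x7fELF":
        return "ELF (Linux)"
    if first_bytes[:4] in (b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe"):
        return "Mach-O (macOS)"
    if first_bytes[:2] == b"MZ":
        return "PE (Windows)"
    return "unknown"


# The eight bytes quoted in the dlopen error message.
reported = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])
print(binary_format(reported))  # ELF (Linux)
```

To check a file on disk, the first four bytes can be read with open(path, "rb").read(4) and passed to the same function.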
Hi Mr. Muhan!
I am a master's student in data science in Morocco, working on a project on link prediction in social networks. I have read and implemented several papers, and I am now reading yours. However, when I try to run it in Google Colab I get a problem, because the C++ code does not run in the Google Jupyter notebook. How can I run SEAL in a Jupyter notebook?
Thanks for your time and help!
Hi, I am trying to use SEAL for link prediction on multi-relational graphs, and I would like to ask two questions:
parser.add_argument('--use-attribute', action='store_true', default=False, help='whether to use node attributes')
Is the code above for adding node attributes in training process?
For example, my own dataset has 1000 nodes divided into 6 classes, and I also want to use the class information for training the model.
How can I build the file (attribute.txt/attribute.mat)? What is the file format for adding node attributes?
Your model can only predict whether two nodes have a link or not, am I right? Is there any way to extend it to multi-relational link prediction?
Thanks
Hi~
From your paper, we can know that the GNN takes (A, X) as input, I have a question from this:
In my opinion, the labels assigned to the enclosing subgraph by Double-Radius Node Labeling are node tags, equivalent to the node types in datasets such as mutag or proteins mentioned in the GNN, not node indices. So how do you determine the index of each node in the enclosing subgraph and construct the other input, A?
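For reference, the paper defines the Double-Radius Node Labeling of a node i with distances d_x and d_y to the two target nodes as f(i) = 1 + min(d_x, d_y) + (d/2)[(d/2) + (d%2) - 1], where d = d_x + d_y; the target nodes get label 1 and nodes that cannot reach one of the targets get label 0. The sketch below is a pure-Python illustration of that formula on an adjacency dictionary, not the repository's implementation. It assumes, following the paper, that y is temporarily removed when computing distances to x and vice versa. The labels depend only on distances, while node indices are simply positions in the extracted subgraph, so A can be built independently of the labels.

```python
from collections import deque


def bfs_distances(adj, source, removed):
    """Hop distances from `source`, ignoring the node `removed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == removed or v in dist:
                continue
            dist[v] = dist[u] + 1
            queue.append(v)
    return dist


def drnl_labels(adj, x, y):
    """Label every node of the subgraph relative to the target pair (x, y)."""
    dx = bfs_distances(adj, x, removed=y)
    dy = bfs_distances(adj, y, removed=x)
    labels = {}
    for node in adj:
        if node in (x, y):
            labels[node] = 1
        elif node not in dx or node not in dy:
            labels[node] = 0  # unreachable from at least one target
        else:
            d = dx[node] + dy[node]
            labels[node] = 1 + min(dx[node], dy[node]) + (d // 2) * (d // 2 + d % 2 - 1)
    return labels


# Toy subgraph: a links both targets, b hangs off a, c is isolated.
adj = {"x": ["a"], "y": ["a"], "a": ["x", "y", "b"], "b": ["a"], "c": []}
print(drnl_labels(adj, "x", "y"))
```

In this toy graph, node a has distances (1, 1) and receives label 2, and node b has distances (2, 2) and receives label 5, matching the examples given in the paper.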
Hello, running the main program gives the following errors:
'rm' is not recognized as an internal or external command, operable program or batch file.
Saving split-1 of 3...
Error using SEAL (line 90)
Cannot create 'split_1.mat' because 'tempdata\USAir_1' does not exist.
Error in Main (line 83)
parfor (ith_experiment = 1:numOfExperiment, workers)
How can I solve this? Thank you.
Hi there,
I just had a question about data leakage regarding SEAL. Say that there are two subgraphs where one is in the training set and the other is in the testing set. They happen to overlap a bit (i.e. share some of the same nodes and edges), but they are distinct subgraphs, and the link trying to be predicted is different between the two of them. If I am understanding correctly, this is not a data leakage issue because SEAL treats each subgraph as its own entity and does not care about the subgraphs position in the larger network, and DRNL is done on individual subgraphs. Thus, they happen to overlap in the larger network, but that has no implications on SEAL's performance, so it is okay for one of them to be in the testing set and the other to be in the training set. Sorry if this is a basic question; I just wanted to clarify my thinking.
Cheers!
Hello
Excuse me, when I run main.py with USAir, the code runs on my CPU.
How can I run SEAL on the GPU?
I have installed CUDA, torch and DGCNN as well.
thanks
Hi Muhan,
Thank you for sharing this great code.
I am trying it on larger datasets, with around 1800 nodes and 28000 links. However, the subgraph extraction process takes a lot of time (on USAir it is pretty fast). Is this expected? What is the complexity of this process? Or am I doing something wrong? Thank you!
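The cost of extracting one enclosing subgraph grows with the number of nodes reachable within h hops of the target pair, which on a dense graph can approach the whole graph after only a couple of hops. The --max-nodes-per-hop option seen in the commands above limits this growth. The sketch below only illustrates the idea of hop-by-hop expansion with a per-hop cap on a plain adjacency dictionary; it is not the repository's extraction code, and `enclosing_subgraph_nodes` is a hypothetical name.

```python
import random


def enclosing_subgraph_nodes(adj, x, y, hops, max_nodes_per_hop=None, seed=0):
    """Collect the nodes within `hops` of x or y, optionally capping each hop."""
    rng = random.Random(seed)
    visited = {x, y}
    fringe = {x, y}
    for _ in range(hops):
        next_fringe = set()
        for u in fringe:
            next_fringe.update(adj[u])
        next_fringe -= visited
        if max_nodes_per_hop is not None and len(next_fringe) > max_nodes_per_hop:
            # Randomly keep only a bounded number of new nodes for this hop.
            next_fringe = set(rng.sample(sorted(next_fringe), max_nodes_per_hop))
        if not next_fringe:
            break
        visited |= next_fringe
        fringe = next_fringe
    return visited


adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1, 5], 5: [4]}
print(len(enclosing_subgraph_nodes(adj, 0, 1, hops=1)))
print(len(enclosing_subgraph_nodes(adj, 0, 1, hops=2)))
print(len(enclosing_subgraph_nodes(adj, 0, 1, hops=1, max_nodes_per_hop=2)))
```

Without a cap, each extra hop can multiply the subgraph size by roughly the average degree, which is why extraction time and memory rise sharply on denser graphs.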
Hi, Dr. Zhang.
I have a question regarding the only_predict mode. With the command "python Main.py --train-name PB_train.txt --test-name PB_test.txt --hop 1 --only-predict", does the model first build a network from PB_train.txt and then predict whether the link pairs in PB_test.txt are valid links in the PB_train network?
Thanks
Dear Dr. Zhang,
I am running on 12-core server with 220G memory and 4 GPUs. The data has 2511 nodes, 37154 edges and 9073 attributes, and the average degree is 29.59.
In SEAL, max-train-num is set to 10000 and args.max_nodes_per_hop is set to 100.
The algorithm runs successfully when hop = 1. However, with hop = 2 it throws a memory error due to memory exhaustion.
Could you please give me some suggestions?
Thanks
Wei
In order to use this code, shall I download pytorch_DGCNN first? Also, since DGCNN is based on structure2vec, could you please simplify the dependencies? I believe DGCNN will become a benchmark in graph deep learning! Thanks a lot!
In the paper, the graph G is assumed to be undirected. If I want to apply SEAL to a directed network, what would be different?
Thank you!
Hello, there is no label data in the datasets in your repository. How did you obtain the labels?
I am testing your USAir dataset and have encountered several errors concerning CUDA. Are you working on a Linux or a Windows system? I am currently working on a Mac; could that be the cause of these errors?
In the comparison experiments, which edge feature did node2vec use? The original paper gives four options, but it does not clearly state the parameters used in the link prediction experiments. Which option did you use, and how were the parameters set? I would appreciate your guidance.
Hi, Dr. Zhang
The code you provided is very detailed so I can learn a lot, thank you! However, I still have some problems when reproducing the benchmarks. So, I hope you can give me some guidance.
The first problem: I reproduced node2vec on "facebook" without changing anything, but the AUC is only about 0.5 (in your paper it is 0.99). I have tried many times with different "p" and "q", but the result did not improve. Is there any detail I have not noticed? Meanwhile, the embedding method "SPC" produces the same result as in the paper.
The second problem: when I use the dataset "arxiv", the code "cmd = sprintf('python ../../software/node2vec/src/main.py --input %s.edgelist --output %s.emd --p %f --q %f --dimensions 128 --window-size 10', data_name, data_name, p, q); system(cmd);" runs successfully, but the result "data_name.emd" is not saved in tempdata. This problem does not happen with the other datasets, so what could be the reason?
By the way, I reproduced these experiments under windows10, using matlab2017a.
Thank you very much!
I have a new graph and I want to add some edges to it. I have already trained the network, but how can I use the trained network to predict links in my graph and see the result? I want to know how my graph changes. Thank you so much.
How can I add Precision as an additional metric to the final results in your code?
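Precision can be computed from the same true labels and predicted scores that are used for AUC and accuracy. The sketch below is a generic pure-Python illustration, not code from the repository; the 0.5 threshold and the names `labels` and `scores` are assumptions.

```python
def precision_at_threshold(labels, scores, threshold=0.5):
    """Precision = TP / (TP + FP) for links scored at or above `threshold`."""
    true_pos = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    pred_pos = sum(1 for s in scores if s >= threshold)
    return true_pos / pred_pos if pred_pos else 0.0


# Toy labels (1 = true link) and predicted link probabilities.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.1]
print(precision_at_threshold(labels, scores))
```

Three links are predicted positive here and two of them are true links, so the result is 2/3. If scikit-learn is available, sklearn.metrics.precision_score on thresholded predictions gives the same quantity.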