deepakn97 / relationprediction
ACL 2019: Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs
We used all of your original settings on the WN dataset, with 200 training epochs for conv as you set. Here are the results at epoch 200 (the results are nearly stable after about epoch 100):
hit10: 0.523
hit3: 0.403
hit1: 0.222
MRR: 0.332
mean rank: 2525
All of these results are far lower than the results you claimed; even the closest one, hit10, is only at the level of ConvKB or ConvE. Can you provide your training details so that the results on WN can be reproduced? The results are very similar to those obtained with ConvKB alone, which is the base model of your GAT, so I wonder whether GAT really works in your model.
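For readers comparing numbers like the ones above, here is a minimal sketch of how these metrics are usually derived from the 1-based rank of the correct entity for each test triple. The function name and the sample ranks are hypothetical and are not taken from the repository or from any run.

```python
from typing import Dict, Sequence


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Summarise 1-based ranks of the correct entity into MR, MRR and Hits@k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    metrics = {
        "mean_rank": sum(ranks) / n,
        "mrr": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        # Hits@k is the fraction of test triples ranked within the top k.
        metrics[f"hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Hypothetical ranks, for illustration only.
example = ranking_metrics([1, 2, 4, 10, 50])
```

Mean rank is dominated by a few badly ranked triples, while MRR and Hits@k are dominated by the well ranked ones, which is why the two can move in different directions.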
Hi, can you confirm the following claim in [1]?
KBAT incorporates ConvKB in the last layer of its model
architecture, which should be affected by different evaluation
protocols. But we find another bug on the leakage of test
triples during negative sampling in the reported model, which
results in more significant performance degradation.
[1] Z. Sun, S. Vashishth, S. Sanyal, P. Talukdar, Y. Yang, A Re-evaluation of Knowledge Graph Completion Methods, ArXiv:1911.03903 [Cs]. (2019). http://arxiv.org/abs/1911.03903 (accessed November 14, 2019).
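One protocol detail that can change reported ranks is how ties are handled when several candidates receive the same score as the correct triple. The sketch below illustrates that effect under stated assumptions: higher scores are better, and the protocol names "top", "bottom" and "mean" are labels chosen here for illustration. Whether this is exactly the protocol difference the quoted passage refers to should be checked against [1].

```python
from typing import Sequence


def rank_with_ties(scores: Sequence[float], correct_index: int, protocol: str = "mean") -> float:
    """Return the 1-based rank of scores[correct_index], assuming higher is better."""
    target = scores[correct_index]
    better = sum(1 for s in scores if s > target)
    ties = sum(1 for i, s in enumerate(scores) if s == target and i != correct_index)
    if protocol == "top":
        # Optimistic: the correct triple is placed ahead of all tied candidates.
        return better + 1
    if protocol == "bottom":
        # Pessimistic: the correct triple is placed behind all tied candidates.
        return better + ties + 1
    if protocol == "mean":
        # Expected rank if the tied candidates are ordered at random.
        return better + ties / 2 + 1
    raise ValueError(f"unknown protocol: {protocol}")


# A degenerate model that gives every candidate the same score.
flat_scores = [0.0, 0.0, 0.0, 0.0, 0.0]
```

With `flat_scores`, the optimistic protocol reports rank 1 even though the model has learned nothing, while the pessimistic one reports rank 5.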
Hi, I really appreciate your wonderful work on this problem, especially the huge improvement on FB15k-237. But when I followed the instructions in README.md to run the FB15k-237 model, I got a much lower MRR (about 0.43) than the one in the paper.
I wonder whether this is due to some problem with my GPUs, so could you please provide the best-performing model on FB15k-237? Many thanks.
Hi, I get this error when I run the program: CUDA out of memory.
Do you know how to solve this problem? My graphics card is an Nvidia 1080Ti. What type of graphics card are you using?
Thanks for sharing code.
While debugging the code, I have a question about 'graph_create'. Here is what I understand:
the Graph is a dict<key: head_id of a triple, value: dict<key: tail_id, value: rel_id>>, when traversing all_tiples
relationPrediction/create_batch.py
Line 233 in 785721b
relationPrediction/create_batch.py
Line 242 in 785721b
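The nested dictionary described in this question can be sketched as follows. This is an illustration of the structure as the poster describes it (head id, then tail id, then relation id), not a copy of the repository's function; the names `Triple` and `build_graph` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[int, int, int]  # (head_id, relation_id, tail_id), an assumed ordering


def build_graph(triples: Iterable[Triple]) -> Dict[int, Dict[int, int]]:
    """Build graph[head][tail] = relation from a list of triples."""
    graph = defaultdict(dict)
    for head, rel, tail in triples:
        # A later triple with the same (head, tail) pair overwrites the earlier relation.
        graph[head][tail] = rel
    return dict(graph)


example_graph = build_graph([(0, 5, 1), (0, 6, 2), (1, 5, 2)])
```

One consequence of this layout is that two entities connected by more than one relation keep only a single relation in the graph.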
Hi, thank you for your code! I am trying to run it on FB15k-237, but it gets stuck at "Graph created length of graph keys is 13781" for a long time. Is this normal or not? Also, how long does it take to run on a single dataset? Thanks in advance!
I am attempting to rerun your results per the blogpost. I am unable to run STransE with the data files in this repo. I noticed that the number of relations between the datasets does not match:
# Seg fault with identically formatted train.txt, valid.txt, test.txt, relation2id.txt, entity2id.txt
$ ./STransE -model 1 -data ../relationPrediction/data/WN18RR/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: ../relationPrediction/data/WN18RR/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 11
#entities = 40943
Segmentation fault (core dumped)
# Working
$ ./STransE -model 1 -data Datasets/WN18/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: Datasets/WN18/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 18
#entities = 40943
Optimize entity vectors, relation vectors and relation matrices:
# ...
Could you please share how you init the embeddings based on the data in this post? I would like to init embeddings for my own data and run this model on my own data.
Thank you!
Thank you very much for your work on this paper. I tried to run your code, but I keep running out of GPU memory. How much GPU memory does your code require? I look forward to your reply. Thank you
Hi,
I just wonder why your model has to go through the layer SpecialSpmmFunctionFinal, and what the intent of this layer is.
The layer's forward is:
class SpecialSpmmFunctionFinal(torch.autograd.Function):
    """Special function for only sparse region backpropataion layer."""
    @staticmethod
    def forward(ctx, edge, edge_w, N, E, out_features):
        # assert indices.requires_grad == False
        a = torch.sparse_coo_tensor(
            edge, edge_w, torch.Size([N, N, out_features]))
        b = torch.sparse.sum(a, dim=1)
        ctx.N = b.shape[0]
        ctx.outfeat = b.shape[1]
        ctx.E = E
        ctx.indices = a._indices()[0, :]
        return b.to_dense()
I am debugging with your default parameters on the WN18k dataset. In the first epoch, or the first batch of the dataset, I see these shapes:
Input:
N : 40943 : number of entities in the dataset
E : 294211 : number of concatenated (head, tail) and (2hop_head, 2hop_tail) pairs
edge : (2, 294211) : represents <head_id, tail_id> and <2_hop_head_id, 2_hop_tail_id>
edge_w : (294211, 1) : represents the edge weights trained in the GAT layer
Output:
e_rowsum : (40943, 1) : represents .... ??? .....
As far as I know, it just sums all the features of the training entities into an embedding vector, but I don't know why your model has to go through this layer. Can you explain the intent of the layer SpecialSpmmFunctionFinal?
Thanks @deepakn97
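Reading only the quoted forward pass, the layer appears to build a sparse N x N matrix from the edge list and sum it along dimension 1, so each output row is the sum of the weights of the edges whose first index is that row. A plain-Python analogue of that row sum, for scalar edge weights, might look like this. It is a sketch for intuition only: it does not reproduce the autograd behaviour, and `sparse_row_sum` is a hypothetical name.

```python
from typing import List, Sequence


def sparse_row_sum(rows: Sequence[int], weights: Sequence[float], num_nodes: int) -> List[float]:
    """Sum edge weights per row index, like summing a sparse matrix over its columns."""
    row_sum = [0.0] * num_nodes
    for row, weight in zip(rows, weights):
        row_sum[row] += weight
    return row_sum


# Three edges: (0 -> 2) with weight 3, (1 -> 0) with weight 4, (1 -> 2) with weight 5.
edge_rows = [0, 1, 1]
edge_weights = [3.0, 4.0, 5.0]
row_totals = sparse_row_sum(edge_rows, edge_weights, num_nodes=3)
```

Rows with no edges keep a sum of zero, which matters if the result is later used as a divisor.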
Hello, authors.
I have learnt a lot from your code, but I have a question about the bfs function, described as follows:
I know that lines#259-276 of create_batch.py
implement breadth-first search and that it works, but I doubt whether lines#271-273
have any effect, because line#266
q.put((target, graph[top[0]][target]))
has already put the child node into the queue. As a result, constructing the 2-hop neighbors takes a long time (about 45 minutes, as you mentioned in previous issues). So, in my opinion, we could move line#266
to the line after if distance[target] > 2: continue
to make it work.
Thanks.
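The idea behind this suggestion, not expanding nodes that are already at the depth limit, can be illustrated with a generic depth-limited breadth-first search. This is a sketch of the general technique, assuming a graph stored as graph[node][neighbor] = relation; it is not the repository's bfs function, and `neighbors_within` is a hypothetical name.

```python
from collections import deque
from typing import Dict


def neighbors_within(graph: Dict[int, Dict[int, int]], source: int, max_depth: int = 2) -> Dict[int, int]:
    """Return {node: depth} for nodes reachable within max_depth hops of source."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            # Nodes at the limit are recorded but never expanded further.
            continue
        for target in graph.get(node, {}):
            if target not in depth:
                depth[target] = depth[node] + 1
                queue.append(target)
    del depth[source]
    return depth


chain = {0: {1: 5}, 1: {2: 6}, 2: {3: 7}}
reachable = neighbors_within(chain, 0)
```

Because nodes beyond the limit are never enqueued, the work done per source is bounded by the size of its 2-hop neighborhood rather than by the whole graph.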
I want to test your model, say trained on WN, on a different dataset.
Is there a simple way to do this? I'm looking at your code but it seems like it requires a lot of edits.
Thank you!
Hi, can you release the code for drawing the heatmap of the attention matrix?
Thank you!
Can you explain the meaning of this auxiliary edge (preferably with an example)? I want to apply it, but I still can't understand what it specifically means.
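One reading, based on the expression relation_embed[edge_type_nhop[:, 0]] + relation_embed[edge_type_nhop[:, 1]] that appears in a traceback elsewhere on this page, is that a 2-hop path source -> middle -> target is treated as a single extra edge from source to target whose relation embedding is the sum of the embeddings of the two relations on the path. The sketch below illustrates that reading with plain lists; the function name and the embedding values are hypothetical, and this is an interpretation rather than a confirmed description of the authors' method.

```python
from typing import Dict, List, Sequence


def auxiliary_relation_embedding(relation_embed: Dict[int, List[float]],
                                 path_relations: Sequence[int]) -> List[float]:
    """Sum the embeddings of the relations along a multi-hop path."""
    dim = len(relation_embed[path_relations[0]])
    total = [0.0] * dim
    for rel in path_relations:
        for d, value in enumerate(relation_embed[rel]):
            total[d] += value
    return total


# Hypothetical 2-dimensional embeddings for two relations.
relation_embed = {0: [1.0, 0.0], 1: [0.5, 2.0]}
# Path: source -(relation 0)-> middle -(relation 1)-> target
aux = auxiliary_relation_embedding(relation_embed, (0, 1))
```

Under this reading, the auxiliary edge lets a node attend directly to a 2-hop neighbor in a single attention layer.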
The project does not support DataParallel and multiple GPUs yet. I tried to add the code myself but found several bugs while running the new version.
edge = torch.cat((edge[:, :], edge_list_nhop[:, :]), dim=1)
The shape of the parameter 'edge' is (2, n) in the original code. With DataParallel, the shape of 'edge' becomes (1, n) after the scatter function in DataParallel. This line then raises an error because edge and edge_list_nhop differ in their first dimension (1 vs 2). The fix is to transpose Corpus.train_adj_matrix[0] when initializing Corpus, and then transpose the parameter 'adj' again in the forward function of class SpKBGATModified.
edge_m = self.a.mm(edge_h)
However, I have no idea how to fix this error. I hope someone can provide a solution.
Is this code's decoder using ConvE? Does the code at https://github.com/svjan5/kg-reeval use ConvE or ConvKB? With this code I can get results, but with the code in kg-reeval the test results are all 0!
Thank you for making this work publicly available.
One request: could you please give the hyperparameters for TransE that you used (embedding vector size, negative ratio and margin at least), especially for NELL-995, as I believe your work is perhaps the only one that evaluates TransE on this dataset?
In the paper or code I could only find the best hyperparameters for your model, but I am working on a comparison-type study and this other information would be of extreme help, thank you in advance!
Thanks for sharing your wonderful work. In your paper, you said that you would give the optimal hyper-parameter set for each dataset in your supplementary section. But I didn't find them, and I need them to improve the experimental performance. I would appreciate it if you could provide them soon. Looking forward to your reply.
Thank you for your code! I would like to ask whether the negative samples are really filtered during the experiments. The negative sampling formula in the paper does not reflect filtering, but the code seems to implement filtering.
headTailSelector = {}
for i in range(len(relation2id)):
    headTailSelector[i] = 1000 * right_entity_avg[i] / (right_entity_avg[i] + left_entity_avg[i])
I hope you can reply as soon as possible. Thank you very much!
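The quoted headTailSelector has the shape of a per-relation statistic of the kind used to decide whether to corrupt the head or the tail of a triple when drawing a negative sample. A minimal sketch of such a statistic follows. The names `tails_per_head` and `heads_per_tail` are introduced here for illustration, and how they map onto right_entity_avg and left_entity_avg in the repository is an assumption that should be checked against the code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def head_tail_selector(triples: Iterable[Tuple[int, int, int]]) -> Dict[int, float]:
    """Per relation, return 1000 * tph / (tph + hpt) from (head, relation, tail) triples."""
    tails_of = defaultdict(set)  # (relation, head) -> distinct tails
    heads_of = defaultdict(set)  # (relation, tail) -> distinct heads
    for head, rel, tail in triples:
        tails_of[(rel, head)].add(tail)
        heads_of[(rel, tail)].add(head)

    tail_counts = defaultdict(list)
    head_counts = defaultdict(list)
    for (rel, _head), tails in tails_of.items():
        tail_counts[rel].append(len(tails))
    for (rel, _tail), heads in heads_of.items():
        head_counts[rel].append(len(heads))

    selector = {}
    for rel in tail_counts:
        tails_per_head = sum(tail_counts[rel]) / len(tail_counts[rel])
        heads_per_tail = sum(head_counts[rel]) / len(head_counts[rel])
        selector[rel] = 1000 * tails_per_head / (tails_per_head + heads_per_tail)
    return selector


# One head connected to three tails through relation 0.
one_to_many = head_tail_selector([(0, 0, 1), (0, 0, 2), (0, 0, 3)])
```

Note that a statistic like this only chooses which side of the triple to corrupt. Whether a sampled negative is then checked against known triples is a separate step.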
Did you use all 2-hop neighbors or only some of the 2-hop neighbors?
Hi,
I've noticed the following issue in this work: in the module create_batch.py,
negative samples are not allowed to be taken from the test data. valid_triples_dict is defined as:
self.valid_triples_dict = {j: i for i, j in enumerate(
    self.train_triples + self.validation_triples + self.test_triples)}
and later used in:
while (random_entities[current_index], self.batch_indices[last_index + current_index, 1],
       self.batch_indices[last_index + current_index, 2]) in self.valid_triples_dict.keys():
As a result, the model is taught during training that the test triples are valid. We are not allowed to use test data during training. This will make the performance go higher and higher with more epochs for no good reason.
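A leak-free alternative along the lines this issue suggests is to filter negatives only against triples that are legitimately visible at training time. The sketch below is a minimal illustration under that assumption, with triples written as (head, relation, tail); the function name and signature are hypothetical and do not come from the repository.

```python
import random
from typing import Set, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail), an assumed ordering


def corrupt_head(triple: Triple, num_entities: int,
                 known_triples: Set[Triple], rng: random.Random) -> Triple:
    """Replace the head with a random entity such that the result is not a known triple."""
    _head, rel, tail = triple
    candidates = [e for e in range(num_entities) if (e, rel, tail) not in known_triples]
    if not candidates:
        raise ValueError("every possible head forms a known triple")
    return (rng.choice(candidates), rel, tail)


# Only training triples are used for filtering; test triples are deliberately left out.
train_triples = {(0, 0, 1), (1, 0, 1)}
negative = corrupt_head((0, 0, 1), num_entities=3, known_triples=train_triples,
                        rng=random.Random(0))
```

With this setup a test triple can be drawn as a negative during training, which is the price of not letting the sampler see the test set.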
What is the meaning of the negative sign on leakyrelu and of e_rowsum[e_rowsum == 0.0] = 1e-12?
powers = -self.leakyrelu(self.a_2.mm(edge_m).squeeze())
e_rowsum = self.special_spmm_final(
    edge, edge_e, N, edge_e.shape[0], 1)
e_rowsum[e_rowsum == 0.0] = 1e-12
I just can't understand why there is a negative sign in front of leakyrelu and why 1e-12 is used for e_rowsum.
Looking forward to your reply. @deepakn97
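One way to read the 1e-12 line is as a guard against division by zero, assuming that edge_e is the exponential of powers and that e_rowsum is later used as the per-node normaliser; neither step is shown in the quoted lines, so both are assumptions. The sketch below reproduces that pattern in plain Python with hypothetical names. It does not explain why the authors chose a negative sign.

```python
import math
from typing import List, Sequence, Tuple


def leaky_relu(x: float, alpha: float = 0.2) -> float:
    return x if x > 0 else alpha * x


def normalized_attention(rows: Sequence[int], raw_scores: Sequence[float], num_nodes: int,
                         alpha: float = 0.2, eps: float = 1e-12) -> Tuple[List[float], List[float]]:
    """Softmax-style normalisation of edge scores per source row, with a zero guard."""
    # Unnormalised edge weight: exp(-leakyrelu(score)), always strictly positive.
    edge_e = [math.exp(-leaky_relu(s, alpha)) for s in raw_scores]
    row_sum = [0.0] * num_nodes
    for row, e in zip(rows, edge_e):
        row_sum[row] += e
    # A row sum is exactly zero only for nodes with no edges; replace it to avoid 0 / 0.
    row_sum = [s if s != 0.0 else eps for s in row_sum]
    attention = [e / row_sum[row] for row, e in zip(rows, edge_e)]
    return attention, row_sum


# Node 0 has two edges, node 1 has none, node 2 has one.
attention, row_sum = normalized_attention([0, 0, 2], [0.0, 0.0, 1.0], num_nodes=3)
```

In this sketch the guard only affects node 1, which has no edges and therefore no attention weights to normalise.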
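I want to train this model on a knowledge graph I built myself, but my dataset only has files like train.txt. Is relation2vec.txt generated by TransE? How should I generate my own relation2vec.txt file?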
Hi, thanks for your amazing work. I'm trying to retrieve node embeddings from the GAT and CONV checkpoints. What is the right way to do that?
I'm trying something like this:
def get_embedding(args, unique_entities):
    model_conv = SpKBGATConvOnly(entity_embeddings, relation_embeddings, args.entity_out_dim, args.entity_out_dim,
                                 args.drop_GAT, args.drop_conv, args.alpha, args.alpha_conv,
                                 args.nheads_GAT, args.out_channels)
    model_conv.load_state_dict(torch.load(
        '{0}conv/trained_{1}.pth'.format(args.output_folder, args.epochs_conv - 1)), strict=False)
    model_conv.cuda()
    model_conv.eval()
    with torch.no_grad():
        preds = model_conv(Corpus_, Corpus_.train_adj_matrix, unique_entities)
        print(preds.size())
But I'm not sure about preds = model_conv(Corpus_, Corpus_.train_adj_matrix, unique_entities).
Thanks in advance
Hi,
Why do we have to use node_neighbors_2hop and then take only the indices [source, 2hop_relation, 1hop_relation, 1hop_entity] instead of entity_2hop?
In your code, node_neighbors_2hop returns a dictionary like this:
neighbors[source][distance] = [(tuple(relations), tuple(entities[:-1]))]
and after that the code takes only the values {source, relation_1hop, entity_1hop, relation_2hop} instead of entity_2hop, as in the following line:
batch_source_triples.append([source, nhop_list[i][0][-1], nhop_list[i][0][0], nhop_list[i][1][0]])
The indices are shown below:
[source, 2hop_relation, 1hop_relation, 1hop_entity]
0, 1 , 2 , 3
edge_list_nhop = train_indices_nhop [3, 0] # entity : source_entity, 1hop_entity
edge_type_nhop = train_indices_nhop [1, 2] # relation : 2hop_relation, 1hop_relation
Why must we add the source to the n-hop list? As far as I understand from your paper, we want the n-hop neighborhood in order to gather more information about the neighborhood and aggregate it into a richer embedding.
mask_indices = torch.unique(batch_inputs[:, 2]).cuda()
mask = torch.zeros(self.entity_embeddings.shape[0]).cuda()
mask[mask_indices] = 1.0
I want to know what the role of "mask" is; it doesn't appear in the paper.
Thanks for sharing your code. I am really impressed by this paper. I find that this encoder-decoder model works like:
train_gat(args) # train the (multi-hop neighbor...) GAT with a TransE loss, and save the embedding
train_conv(args) # use the saved embedding as input to train ConvKB for the link prediction task
So there is no other interaction between the GAT model and the ConvKB model except for the trained embeddings, am I right?
If so, are the trained embeddings fixed in ConvKB or trainable?
Thank you, and I hope for your reply.
number of unique_entities -> 14505
number of unique_entities -> 9809
number of unique_entities -> 10348
Initialised relations and entities from TransE
Total triples count 310116, training triples 272115, validation_triples 17535, test_triples 20466
Opening node_neighbors pickle object
Initial entity dimensions torch.Size([14541, 100]) , relation dimensions torch.Size([237, 100])
Defining model
Model type -> GAT layer with 2 heads used, Initial Embeddings training
length of unique_entities 14505
Number of epochs 3000
epoch-> 0
Traceback (most recent call last):
File "/root/zc/GAT2019ACL/main.py", line 366, in
train_gat(args)
File "/root/zc/GAT2019ACL/main.py", line 235, in train_gat
Corpus_, Corpus_.train_adj_matrix, train_indices, current_batch_2hop_indices)
File "/root/embedding-venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/root/zc/GAT2019ACL/models.py", line 141, in forward
edge_list, edge_type, edge_embed, edge_list_nhop, edge_type_nhop)
File "/root/embedding-venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/root/zc/GAT2019ACL/models.py", line 52, in forward
edge_type_nhop[:, 0]] + relation_embed[edge_type_nhop[:, 1]]
RuntimeError: CUDA out of memory. Tried to allocate 4.77 GiB (GPU 0; 15.90 GiB total capacity; 10.44 GiB already allocated; 4.73 GiB free; 32.66 MiB cached)
Process finished with exit code 1
Initialised relations and entities from TransE
Traceback (most recent call last):
File "main.py", line 110, in
Corpus_, entity_embeddings, relation_embeddings = load_data(args)
File "main.py", line 105, in load_data
args.batch_size_gat, args.valid_invalid_ratio_gat, unique_entities_train, args.get_2hop)
File "/home/KBAT-master/create_batch.py", line 35, in init
self.graph = self.get_graph()
File "/home/KBAT-master/create_batch.py", line 234, in get_graph
source = data[1].data.item()
AttributeError: 'int' object has no attribute 'data'
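The error above says that data[1] is already a plain Python int at that point, so it has no .data attribute. Why that happens in this environment is not clear from the log; a difference in library versions is one possible cause, but that is an assumption. A small helper that accepts both plain numbers and tensor-like scalars could work around it. `as_int` is a hypothetical name, not part of the repository.

```python
def as_int(value):
    """Return a plain int from either a Python number or a tensor-like scalar."""
    if hasattr(value, "item"):
        # Tensor-like scalars expose .item() to extract the Python value.
        return int(value.item())
    return int(value)


class FakeScalar:
    """Stand-in for a tensor-like scalar, used only to exercise the helper."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value
```

With such a helper, a line like source = data[1].data.item() could be written as source = as_int(data[1]), under the assumption that the value is always a scalar.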
Iteration-> 674 , Iteration_time-> 0.1287 , Iteration_loss 0.0011
Iteration-> 675 , Iteration_time-> 0.1287 , Iteration_loss 0.0010
Iteration-> 676 , Iteration_time-> 0.1297 , Iteration_loss 0.0009
Iteration-> 677 , Iteration_time-> 0.1297 , Iteration_loss 0.0009
Iteration-> 678 , Iteration_time-> 0.0578 , Iteration_loss 0.0013
Epoch 199 , average loss 0.0010276363643238785 , epoch_time 88.21463370323181
Saving Model
Done saving Model
Sampled indices
test set length 3134
0
Traceback (most recent call last):
File "main.py", line 369, in
evaluate_conv(args, Corpus_.unique_entities_train)
File "main.py", line 363, in evaluate_conv
Corpus_.get_validation_pred(args, model_conv, unique_entities)
File "C:\Users\a\Documents\GitHub\relationPrediction\create_batch.py", line 423, in get_validation_pred
new_x_batch_head[:num_triples_each_shot, :]).cuda())
File "C:\Users\a\Documents\GitHub\relationPrediction\models.py", line 208, in batch_test
out_conv = self.convKB(conv_input)
File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\a\Documents\GitHub\relationPrediction\layers.py", line 35, in forward
self.non_linearity(self.conv_layer(conv_input)))
File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 3.81 GiB (GPU 0; 11.00 GiB total capacity; 3.87 GiB already allocated; 2.55 GiB free; 5.81 GiB reserved in total by PyTorch)
I used the command $ python3 main.py --get_2hop True
and it says CUDA out of memory after epoch 199. What parameters should I change to finish this training?
Thank you
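The traceback shows the failure during evaluation, when a slice of candidate triples is scored in a single forward pass. A general way to lower peak memory in that situation is to score fewer candidates per call and concatenate the results. The sketch below shows only that chunking pattern; `score_fn` is a hypothetical stand-in for the model's scoring call, and which setting in the repository controls the chunk size is not something this sketch assumes.

```python
from typing import Callable, List, Sequence


def score_in_chunks(candidates: Sequence, score_fn: Callable[[Sequence], List[float]],
                    chunk_size: int) -> List[float]:
    """Score candidates chunk by chunk so that only chunk_size items are processed at once."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    scores: List[float] = []
    for start in range(0, len(candidates), chunk_size):
        scores.extend(score_fn(candidates[start:start + chunk_size]))
    return scores


# Toy scoring function that records how many candidates it received per call.
seen_chunk_sizes = []


def toy_score(chunk):
    seen_chunk_sizes.append(len(chunk))
    return [float(sum(triple)) for triple in chunk]


toy_candidates = [(0, 0, 1), (1, 0, 1), (2, 0, 1), (3, 0, 1), (4, 0, 1)]
toy_scores = score_in_chunks(toy_candidates, toy_score, chunk_size=2)
```

The result is identical to scoring everything at once, as long as the scoring function treats each candidate independently.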
I think the forward of SpecialSpmmFunctionFinal is intended to compute the row sum of a sparse matrix, and the backward returns the gradient for the sparse matrix's values. But I find that torch.sparse might already handle the backward of the row-sum operation, for example:
`i = torch.LongTensor([[0, 1, 1],[2, 0, 2]]) #row, col
v = torch.FloatTensor([3, 4, 5]) #data
v.requires_grad=True
m=torch.sparse_coo_tensor(i, v, torch.Size([2,3])) #torch.Size()
m.retain_grad()
m1=torch.sparse.sum(m,dim=1)
m1.retain_grad()
m2=torch.sparse.sum(m1)
m2.backward()
print(v.grad)#v's gradient is tensor([1., 1., 1.])`
So why did you write the custom autograd function, or is there something I have misunderstood?
Waiting for your reply, thanks.
Reproducing the results with the hyperparameters mentioned in the repo, and even with other variations, produced a maximum MRR of 0.451, but the MRR claimed in the paper is 0.518.
Is this expected, or am I missing something?
` mask_indices = torch.unique(batch_inputs[:, 2]).cuda()
mask = torch.zeros(self.entity_embeddings.shape[0]).cuda()
mask[mask_indices] = 1.0
entities_upgraded = self.entity_embeddings.mm(self.W_entities)
out_entity_1 = entities_upgraded + \
mask.unsqueeze(-1).expand_as(out_entity_1) * out_entity_1
`
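Read literally, the snippet above adds the GAT output only for the entities whose ids appear in column 2 of the current batch; every other entity keeps just its linearly transformed initial embedding. The plain-Python sketch below mirrors that arithmetic with lists in place of tensors. The names are hypothetical, and what column 2 of batch_inputs holds is not stated in the snippet.

```python
from typing import Iterable, List


def apply_mask(entities_upgraded: List[List[float]], gat_output: List[List[float]],
               batch_entity_ids: Iterable[int]) -> List[List[float]]:
    """Return upgraded + mask * gat_output, where mask is 1 only for ids in the batch."""
    mask = [0.0] * len(entities_upgraded)
    for idx in set(batch_entity_ids):
        mask[idx] = 1.0
    return [
        [u + m * g for u, g in zip(upgraded_row, gat_row)]
        for upgraded_row, gat_row, m in zip(entities_upgraded, gat_output, mask)
    ]


# Three entities with 1-dimensional embeddings; only entity 2 is in the batch.
combined = apply_mask([[1.0], [2.0], [3.0]], [[10.0], [20.0], [30.0]], [2, 2])
```

Entities outside the batch are unaffected by the GAT output for that step, which is the visible effect of the mask in this reading.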
I ran the command: python main.py --get_2hop True
It returns:
number of unique_entities -> 40559
number of unique_entities -> 5173
number of unique_entities -> 5323
Initialised relations and entities from TransE
Graph created
length of graph keys is 39610
and it seems to be stuck.
I want to know how long it takes, and whether it prints a message to let me know when it is done.
Hi, thank you for sharing the code.
From the description in the paper, the model has more than one GAT layer. How many layers were used for the results reported in the paper? I can't find this in the supplementary section. In the code, it seems to be just a single layer.
I have the following problem: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
I changed the batch_size from 128 to 32 or less, but it still doesn't work. How should I solve it?
I would appreciate your reply.
Here are the details:
——————————————————-
epoch-> 2999
Iteration-> 0 , Iteration_time-> 1.6496 , Iteration_loss 0.0034
Epoch 2999 , average loss 0.0034224344417452812 , epoch_time 1.6496999263763428
Saving Model
Done saving Model
Defining model
Only Conv model trained
Number of epochs 200
epoch-> 0
Traceback (most recent call last):
File "main.py", line 368, in
train_conv(args)
File "main.py", line 334, in train_conv
loss.backward()
File "/home/menhaojie/anaconda3/envs/pytorch35/lib/python3.5/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/menhaojie/anaconda3/envs/pytorch35/lib/python3.5/site-packages/torch/autograd/init.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution (try_all at /opt/conda/conda-bld/pytorch_1591914868920/work/aten/src/ATen/native/cudnn/Conv.cpp:693)
Thanks for open-sourcing the code.
There is one thing I am not sure about in the evaluation part.
relationPrediction/create_batch.py
Line 382 in 785721b