relationprediction's People

Contributors

deepakn97, nom007

relationprediction's Issues

Reproduced results are much lower than what you reported

We used all of your original settings on the WN dataset, with 200 training epochs for conv as you set. Here are the results at epoch 200 (the results are nearly stable after about epoch 100):
hit10: 0.523
hit3: 0.403
hit1: 0.222
MRR: 0.332
mean rank: 2525
All of these results are far lower than the results you claimed; even the closest one, hit10, is only at the level of ConvKB or ConvE. Can you provide your training details so that we can reproduce the results on WN? The results are very similar to using only ConvKB, which is the base model of your GAT, so I wonder whether GAT really works in your model.
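
For anyone comparing numbers like the ones above, all five metrics can be derived from the rank assigned to each correct entity. A minimal sketch, assuming `ranks` holds 1-based ranks; the function name and the sample ranks are hypothetical and not taken from the repository:

```python
def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Compute Hits@k, MRR and mean rank from 1-based ranks."""
    n = len(ranks)
    # Hits@k: fraction of test cases whose correct entity is ranked in the top k.
    metrics = {f"hits@{k}": sum(r <= k for r in ranks) / n for k in ks}
    # MRR: mean of the reciprocal ranks.
    metrics["mrr"] = sum(1.0 / r for r in ranks) / n
    metrics["mean_rank"] = sum(ranks) / n
    return metrics


# Hypothetical ranks for four test triples.
example = link_prediction_metrics([1, 2, 5, 20])
print(example)
```

Mean rank is dominated by a few badly ranked triples, while MRR and Hits@k are not, so it helps to report all of them when comparing runs.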

During evaluation, why remove entities that were not seen?

In your code, I find that when you evaluate on the test set, entities that did not appear in the training set are removed.
Can you please explain this operation?
Also, when I reproduce ConvKB with your code, I get H@10 = 0.593 on FB15k-237. Why do you report 0.471?

Test data leakage issue

Hi, can you confirm the following claim in [1]?

KBAT incorporates ConvKB in the last layer of its model
architecture, which should be affected by different evaluation
protocols. But we find another bug on the leakage of test
triples during negative sampling in the reported model, which
results in more significant performance degradation.

[1] Z. Sun, S. Vashishth, S. Sanyal, P. Talukdar, Y. Yang, A Re-evaluation of Knowledge Graph Completion Methods, ArXiv:1911.03903 [Cs]. (2019). http://arxiv.org/abs/1911.03903 (accessed November 14, 2019).

Request for best performance model

Hi, I really appreciate your wonderful work on this problem, especially the huge improvement on FB15k-237. But when I followed the instructions in README.md to run the FB15k-237 model, I got a much lower MRR (about 0.43) than the one in the paper.

I wonder whether this is due to some problem with my GPUs, so could you please provide the best-performing model on FB15k-237? Many thanks.

Error: CUDA out of memory.

Hi, I get this error when I run the program: CUDA out of memory.
Do you know how to solve this problem? My graphics card is an Nvidia 1080Ti. What type of graphics card are you using?

A question about code

Thanks for sharing the code.
While debugging the code, I ran into a question about 'graph_create'. Here is what I understand:
the graph is a dict<key: head_id of a triple, value: dict<key: tail_id, value: rel_id>>. When traversing all_tiples with

for data in all_tiples:

each assignment

graph[source][target] = value

overwrites any rel_id stored earlier for the same pair, so the graph is incomplete. I am confused; can anyone give an explanation? ;)
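
The overwrite described above is easy to reproduce on toy data. A minimal sketch, assuming triples are (head, relation, tail) ids; the sample triples and the list-valued alternative are illustrative and not the repository's code:

```python
from collections import defaultdict

# Hypothetical triples as (head, relation, tail) ids; two relations link 0 -> 1.
triples = [(0, 10, 1), (0, 11, 1), (1, 12, 2)]

# Dict-of-dicts as described above: a later relation replaces an earlier one.
graph = {}
for head, rel, tail in triples:
    graph.setdefault(head, {})[tail] = rel

# Alternative: keep every relation per (head, tail) pair in a list.
multi_graph = defaultdict(lambda: defaultdict(list))
for head, rel, tail in triples:
    multi_graph[head][tail].append(rel)

print(graph[0][1])        # 11
print(multi_graph[0][1])  # [10, 11]
```

Whether the lost edges matter depends on how many entity pairs in the dataset are connected by more than one relation.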

How long does it take to run on a dataset

Hi, thank you for your code! I am trying to run it on FB15k-237, but it gets stuck at "Graph created length of graph keys is 13781" for a long time. Is this normal? Also, how long does it take to run on a single dataset? Thanks in advance!

STransE to init embeddings error; n relations do not match

I am attempting to rerun your results per the blogpost. I am unable to run STransE with the data files in this repo. I noticed that the number of relations between the datasets does not match:

datquocnguyen/STransE#2

# Seg fault with identically formatted train.txt, valid.txt, test.txt, relation2id.txt, entity2id.txt
$ ./STransE -model 1 -data ../relationPrediction/data/WN18RR/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: ../relationPrediction/data/WN18RR/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 11
#entities = 40943
Segmentation fault (core dumped)


# Working
$ ./STransE -model 1 -data Datasets/WN18/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: Datasets/WN18/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 18
#entities = 40943
Optimize entity vectors, relation vectors and relation matrices:
# ...

Could you please share how you init the embeddings based on the data in this post? I would like to init embeddings for my own data and run this model on my own data.

Thank you!

What is the intent of Layer SpecialSpmmFunctionFinal

Hi,
I wonder why your model has to go through the layer SpecialSpmmFunctionFinal, and what the intent of this layer is.
The layer's forward pass is:

class SpecialSpmmFunctionFinal(torch.autograd.Function):
    """Special function for only sparse region backpropataion layer."""
    @staticmethod
    def forward(ctx, edge, edge_w, N, E, out_features):
        # assert indices.requires_grad == False
        a = torch.sparse_coo_tensor(
            edge, edge_w, torch.Size([N, N, out_features]))
        b = torch.sparse.sum(a, dim=1)
        ctx.N = b.shape[0]
        ctx.outfeat = b.shape[1]
        ctx.E = E
        ctx.indices = a._indices()[0, :]

        return b.to_dense()

I am debugging with your default parameters on the WN18k dataset. In the first epoch, on the first batch, I see these shapes:

  • Input:
    N: 40943, the number of entities in the dataset
    E: 294211, the number of concatenated (head, tail) and 2-hop (head, tail) pairs
    edge: (2, 294211), holding <head_id, tail_id> and <2_hop_head_id, 2_hop_tail_id>
    edge_w: (294211, 1), the edge weights trained in the GAT layer

  • Output:
    e_rowsum: (40943, 1), representing: .... ??? .....

As far as I know, it just sums all features of the training entities into one embedding vector, but I don't know why your model has to go through this layer. Can you explain the intent of the layer SpecialSpmmFunctionFinal?
Thanks @deepakn97
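
Going by the shapes listed above, the forward pass appears to sum the edge weights that share the same first index, which would make `e_rowsum` one total per entity. A dependency-free sketch of that reading, with toy values; this is not the repository's implementation:

```python
def sparse_row_sum(rows, weights, num_nodes):
    """Sum the weight of every edge into the row of its first index."""
    row_sums = [0.0] * num_nodes
    for row, weight in zip(rows, weights):
        row_sums[row] += weight
    return row_sums


# Toy graph: 4 nodes, 5 edges; rows[i] is the first index of edge i.
rows = [0, 0, 1, 3, 3]
weights = [0.5, 1.5, 2.0, 1.0, 3.0]
row_sums = sparse_row_sum(rows, weights, num_nodes=4)
print(row_sums)  # [2.0, 2.0, 0.0, 4.0]
```

Node 2 has no outgoing edge here, so its sum is exactly zero.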

A question about BFS in the process of generating 2-hop neighbors.

Hello, authors.

I have learnt a lot from your code, but I have a question about the bfs function, described as follows:

I know that lines#259-276 of create_batch.py implement breadth-first search, and that it works. However, I suspect that lines#271-273 have no effect, because line#266, q.put((target, graph[top[0]][target])), has already put the child node on the queue. This may be why constructing the 2-hop neighbors takes so long (about 45 minutes, as you mentioned in previous issues). In my opinion, we could move line#266 to just after if distance[target] > 2: continue so that the check takes effect.

Thanks.
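
The idea in this issue, not expanding the search past the hop limit, can be sketched independently of the repository. The version below is a variant of the proposed change rather than the proposed patch itself; the function name and the toy graph are hypothetical:

```python
from collections import deque


def bfs_within(graph, source, max_hops=2):
    """Return {node: hop distance} for nodes within max_hops of source."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand nodes that are already at the hop limit
        for target in graph.get(node, {}):
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    return distance


# Toy chain 0 -> 1 -> 2 -> 3 stored as {head: {tail: relation}}.
graph = {0: {1: "r1"}, 1: {2: "r2"}, 2: {3: "r3"}, 3: {}}
reached = bfs_within(graph, 0)
print(reached)
```

Because nodes at the limit are never expanded, the queue only ever holds nodes within two hops of the source.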

Test on different data

I want to test your model, say trained on WN, on a different dataset.
Is there a simple way to do this? I'm looking at your code but it seems like it requires a lot of edits.

Thank you!

The specific meaning of auxiliary edges

Can you explain the meaning of the auxiliary edge (preferably with an example)? I want to apply it, but I still can't understand its specific meaning.

DataParallel is not supported

The project does not support DataParallel and multiple GPUs yet. I tried to add the code myself but hit several bugs while running the new version.

  1. Line 106, layers.py: edge = torch.cat((edge[:, :], edge_list_nhop[:, :]), dim=1). The shape of the parameter 'edge' is (2, n) in the original code. With DataParallel, the shape of 'edge' becomes (1, n) after DataParallel's scatter function. This line then raises an error because the shapes of edge and edge_list_nhop differ in the first dimension (1 vs 2). The fix is to transpose Corpus.train_adj_matrix[0] when initializing Corpus, and then transpose the parameter 'adj' again in the forward function of class SpKBGATModified.
  2. RuntimeError: Expected all tensors to be on the same device, but found at least two devices. This error is raised at line 114, layers.py: edge_m = self.a.mm(edge_h). However, I have no idea how to fix this error. I hope someone can provide a solution.
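
The shape change in item 1 follows from DataParallel splitting tensor inputs along dimension 0. A plain-Python illustration using nested lists in place of tensors; `split_dim0` is a hypothetical helper and the values are toy data:

```python
def split_dim0(rows, num_devices):
    """Split a list of rows into contiguous chunks, like scattering along dim 0."""
    size = -(-len(rows) // num_devices)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


# Shape (2, n): first row holds sources, second row holds targets.
edge = [[0, 1, 2, 3], [4, 5, 6, 7]]
per_device = split_dim0(edge, 2)  # each device receives shape (1, n)

# Transposed to shape (n, 2): each row is one (source, target) pair.
edge_t = [list(pair) for pair in zip(*edge)]
per_device_t = split_dim0(edge_t, 2)  # each device receives whole pairs

print(per_device)
print(per_device_t)
```

Note that after transposing, each replica sees only a subset of the edges, so any per-node normalization over neighbors would be computed on a partial graph.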

TransE hyperparameters

Thank you for making this work publicly available.

One request: could you please give the hyperparameters that you used for TransE (at least the embedding vector size, negative ratio and margin), especially for NELL-995, as I believe your work is perhaps the only one that evaluates TransE on this dataset?

In the paper or code I could only find the best hyperparameters for your model, but I am working on a comparison-type study and this other information would be of extreme help, thank you in advance!

Request for best performance parameters

Thanks for sharing your wonderful work. In your paper, you said that you would give the optimal hyper-parameter set for each dataset in the supplementary section, but I couldn't find it, and I need it to improve my experimental performance. I would appreciate it if you could provide it soon. Looking forward to your reply.

Raw or Filter

Thank you for your code! I would like to ask whether the negative samples are really filtered during the experiments. The negative sampling formula in the paper does not reflect filtering, but the code seems to implement filtering.

error in results due to negative sampling data leakage

Hi,
I've noticed the following issue in this work. In the module create_batch.py, negative samples are not allowed to be taken from the test data. valid_triples_dict is defined as:
self.valid_triples_dict = {j: i for i, j in enumerate(
            self.train_triples + self.validation_triples + self.test_triples)}

and later used in:

while (random_entities[current_index], self.batch_indices[last_index + current_index, 1],
                               self.batch_indices[last_index + current_index, 2]) in self.valid_triples_dict.keys():

The model is thus taught during training that the test triples are valid. We are not allowed to use test data during training. This makes performance climb higher and higher with more epochs for no good reason.
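
A sketch of negative sampling that avoids this leak, assuming negatives are rejected only when they collide with training or validation triples. The function name, the toy triples and the choice to include the validation set are assumptions for illustration, not the repository's code:

```python
import random


def sample_negative_head(triple, num_entities, known_triples, rng):
    """Replace the head with a random entity, rejecting known true triples."""
    _, rel, tail = triple
    while True:
        candidate = (rng.randrange(num_entities), rel, tail)
        if candidate not in known_triples:
            return candidate


train = [(0, 0, 1), (1, 0, 2)]
valid = [(2, 0, 1)]
test = [(3, 0, 1)]

# Filter against train + valid only, so test triples stay unseen in training.
known = set(train) | set(valid)
rng = random.Random(0)
negatives = [sample_negative_head(train[0], 5, known, rng) for _ in range(200)]
```

With this filter a test triple such as `(3, 0, 1)` can occasionally be drawn as a negative, which is the price of keeping the test set out of training.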

A question about attention code

What does it mean to set -leakyrelu and e_rowsum[e_rowsum == 0.0] = 1e-12?

powers = -self.leakyrelu(self.a_2.mm(edge_m).squeeze())
e_rowsum = self.special_spmm_final(
            edge, edge_e, N, edge_e.shape[0], 1)
e_rowsum[e_rowsum == 0.0] = 1e-12

I just can't understand why there is a negative sign before leakyrelu, and why e_rowsum is set to 1e-12.
Looking forward to your reply. @deepakn97
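
One possible reading, assuming the aggregated neighbor features are later divided by `e_rowsum` as in the usual sparse attention pattern (that division is not shown in the excerpt above): each edge gets the weight exp(-LeakyReLU(score)), and the 1e-12 keeps the division defined for nodes without edges. A plain-Python sketch with toy values and hypothetical names:

```python
import math


def leaky_relu(x, alpha=0.2):
    return x if x > 0 else alpha * x


def aggregate(rows, scores, values, num_nodes, eps=1e-12):
    """Attention-weighted average of edge values per row."""
    edge_e = [math.exp(-leaky_relu(s)) for s in scores]
    row_sum = [0.0] * num_nodes
    weighted = [0.0] * num_nodes
    for row, e, v in zip(rows, edge_e, values):
        row_sum[row] += e
        weighted[row] += e * v
    # Rows without edges sum to 0; eps turns 0 / 0 into 0 / eps = 0.
    row_sum = [s if s != 0.0 else eps for s in row_sum]
    return [w / s for w, s in zip(weighted, row_sum)]


# Node 0 has two edges, node 1 has none, node 2 has one.
out = aggregate(rows=[0, 0, 2], scores=[1.0, -1.0, 0.5],
                values=[1.0, 3.0, 5.0], num_nodes=3)
print(out)
```

Under this reading the negative sign means that a larger raw score yields a smaller weight: node 0 leans toward the value 3.0, whose edge has the lower score.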

Retrieve node embeddings from checkpoint

Hi, thanks for your amazing work. I'm trying to retrieve node embeddings from the GAT and CONV checkpoints. What is the right way to do that?
I'm trying something like this:

def get_embedding(args, unique_entities):
    model_conv = SpKBGATConvOnly(entity_embeddings, relation_embeddings, args.entity_out_dim, args.entity_out_dim,
                                 args.drop_GAT, args.drop_conv, args.alpha, args.alpha_conv,
                                 args.nheads_GAT, args.out_channels)
    model_conv.load_state_dict(torch.load(
        '{0}conv/trained_{1}.pth'.format(args.output_folder, args.epochs_conv - 1)), strict=False)

    model_conv.cuda()
    model_conv.eval()
    with torch.no_grad():
        preds = model_conv(Corpus_, Corpus_.train_adj_matrix, unique_entities)
        print(preds.size())

But I'm not sure about preds = model_conv(Corpus_, Corpus_.train_adj_matrix, unique_entities).
Thanks in advance

Why do we have to use node_neighbors_2hop and then take only the indices [source, 2hop_relation, 1hop_relation, 1hop_entity] instead of entity_2hop?

Hi,
Why do we have to use node_neighbors_2hop and then take only the indices [source, 2hop_relation, 1hop_relation, 1hop_entity] instead of entity_2hop?
In your code, node_neighbors_2hop returns a dictionary like this:

neighbors[source][distance] = [(tuple(relations), tuple(entities[:-1]))]

and after that the code takes only the values {source, relation_1hop, entity_1hop, relation_2hop} instead of entity_2hop, in the following code:

batch_source_triples.append([source, nhop_list[i][0][-1], nhop_list[i][0][0], nhop_list[i][1][0]])

The indices are as shown below:

[source, 2hop_relation, 1hop_relation, 1hop_entity]
 0       1              2              3

edge_list_nhop = train_indices_nhop  [3, 0] # entity : source_entity, 1hop_entity
edge_type_nhop = train_indices_nhop  [1, 2] # relation : 2hop_relation, 1hop_relation

Why must we add the source to the n-hop triples? As far as I understand from your paper, we want to take the n-hop neighborhood to gather more information about the neighborhood and aggregate it so that the embedding carries more information.
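
Which entity ends up at index 3 depends on the order in which the path was collected before `entities[:-1]` was taken. The sketch below applies the two quoted expressions to a hypothetical 2-hop path in both orders; which order the repository actually uses is not established here, and all names are illustrative:

```python
def path_record(path_entities, path_relations):
    """Mimic (tuple(relations), tuple(entities[:-1])) from the quoted line."""
    return (tuple(path_relations), tuple(path_entities[:-1]))


def batch_row(source, nhop):
    # Same indexing as the quoted batch_source_triples.append(...) line.
    return [source, nhop[0][-1], nhop[0][0], nhop[1][0]]


# Hypothetical 2-hop path: s --r1--> m --r2--> t
source, mid, target, r1, r2 = "s", "m", "t", "r1", "r2"

# Order A: path collected forward from the source.
forward = path_record([source, mid, target], [r1, r2])
# Order B: path collected backward from the target, e.g. via parent pointers.
backward = path_record([target, mid, source], [r2, r1])

print(batch_row(source, forward))   # ['s', 'r2', 'r1', 's']
print(batch_row(source, backward))  # ['s', 'r1', 'r2', 't']
```

If the path is collected backward, dropping the last element removes the source, and index 3 is the 2-hop entity rather than a 1-hop one.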

What is the role of "mask" in models.py

mask_indices = torch.unique(batch_inputs[:, 2]).cuda()
mask = torch.zeros(self.entity_embeddings.shape[0]).cuda()
mask[mask_indices] = 1.0

I want to know what the role of "mask" is; it doesn't appear in the paper.

Is the model equal to a (carefully special) pre-trained embedding + ConvKB?

Thanks for sharing your code. I am really impressed by this paper. I find that this encoder-decoder model works like:

train_gat(args)     # train the (multi-hop neighbor...) GAT with a TransE loss, and save the embedding

train_conv(args)    # use the saved embedding as input to train ConvKB for the link prediction task

So there is no other interaction between the GAT model and the ConvKB model except for the trained embeddings, am I right?

If so, are the trained embeddings fixed in ConvKB or trainable?

Thank you and hope for your reply.

Cannot run FB15k-237 dataset on 16G GPU using default parameters provided by the author

number of unique_entities -> 14505
number of unique_entities -> 9809
number of unique_entities -> 10348
Initialised relations and entities from TransE
Total triples count 310116, training triples 272115, validation_triples 17535, test_triples 20466
Opening node_neighbors pickle object
Initial entity dimensions torch.Size([14541, 100]) , relation dimensions torch.Size([237, 100])
Defining model

Model type -> GAT layer with 2 heads used, Initial Embeddings training
length of unique_entities 14505
Number of epochs 3000

epoch-> 0
Traceback (most recent call last):
  File "/root/zc/GAT2019ACL/main.py", line 366, in <module>
    train_gat(args)
  File "/root/zc/GAT2019ACL/main.py", line 235, in train_gat
    Corpus_, Corpus_.train_adj_matrix, train_indices, current_batch_2hop_indices)
  File "/root/embedding-venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/zc/GAT2019ACL/models.py", line 141, in forward
    edge_list, edge_type, edge_embed, edge_list_nhop, edge_type_nhop)
  File "/root/embedding-venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/zc/GAT2019ACL/models.py", line 52, in forward
    edge_type_nhop[:, 0]] + relation_embed[edge_type_nhop[:, 1]]
RuntimeError: CUDA out of memory. Tried to allocate 4.77 GiB (GPU 0; 15.90 GiB total capacity; 10.44 GiB already allocated; 4.73 GiB free; 32.66 MiB cached)

Process finished with exit code 1

An AttributeError: 'int' object has no attribute 'data'

Initialised relations and entities from TransE
Traceback (most recent call last):
  File "main.py", line 110, in <module>
    Corpus_, entity_embeddings, relation_embeddings = load_data(args)
  File "main.py", line 105, in load_data
    args.batch_size_gat, args.valid_invalid_ratio_gat, unique_entities_train, args.get_2hop)
  File "/home/KBAT-master/create_batch.py", line 35, in __init__
    self.graph = self.get_graph()
  File "/home/KBAT-master/create_batch.py", line 234, in get_graph
    source = data[1].data.item()
AttributeError: 'int' object has no attribute 'data'

RuntimeError after epoch 199

Iteration-> 674 , Iteration_time-> 0.1287 , Iteration_loss 0.0011
Iteration-> 675 , Iteration_time-> 0.1287 , Iteration_loss 0.0010
Iteration-> 676 , Iteration_time-> 0.1297 , Iteration_loss 0.0009
Iteration-> 677 , Iteration_time-> 0.1297 , Iteration_loss 0.0009
Iteration-> 678 , Iteration_time-> 0.0578 , Iteration_loss 0.0013
Epoch 199 , average loss 0.0010276363643238785 , epoch_time 88.21463370323181
Saving Model
Done saving Model
Sampled indices
test set length 3134
0
Traceback (most recent call last):
  File "main.py", line 369, in <module>
    evaluate_conv(args, Corpus_.unique_entities_train)
  File "main.py", line 363, in evaluate_conv
    Corpus_.get_validation_pred(args, model_conv, unique_entities)
  File "C:\Users\a\Documents\GitHub\relationPrediction\create_batch.py", line 423, in get_validation_pred
    new_x_batch_head[:num_triples_each_shot, :]).cuda())
  File "C:\Users\a\Documents\GitHub\relationPrediction\models.py", line 208, in batch_test
    out_conv = self.convKB(conv_input)
  File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\a\Documents\GitHub\relationPrediction\layers.py", line 35, in forward
    self.non_linearity(self.conv_layer(conv_input)))
  File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\conv.py", line 349, in forward
    return self._conv_forward(input, self.weight)
  File "D:\anaconda3\envs\python37-pytorch10.2\lib\site-packages\torch\nn\modules\conv.py", line 346, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 3.81 GiB (GPU 0; 11.00 GiB total capacity; 3.87 GiB already allocated; 2.55 GiB free; 5.81 GiB reserved in total by PyTorch)


I used this command: $ python3 main.py --get_2hop True
and it reports CUDA out of memory after epoch 199. What parameters should I change to finish this training?
Thank you
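
The traceback shows the error being raised during evaluation, inside get_validation_pred, where candidate triples are scored in slices. A general way to lower peak memory in such a loop is to score fewer candidates per call. A minimal sketch of that idea; `score_in_chunks` and `toy_score_fn` are hypothetical, and whether the repository exposes a setting for the slice size is not confirmed here:

```python
def score_in_chunks(candidates, score_fn, chunk_size):
    """Score candidates chunk by chunk and concatenate the results."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    scores = []
    for start in range(0, len(candidates), chunk_size):
        scores.extend(score_fn(candidates[start:start + chunk_size]))
    return scores


calls = []


def toy_score_fn(batch):
    # Stand-in for a model forward pass; records the batch size it was given.
    calls.append(len(batch))
    return [sum(triple) for triple in batch]


candidates = [(h, 0, 7) for h in range(10)]
scores = score_in_chunks(candidates, toy_score_fn, chunk_size=4)
print(calls)
```

Smaller chunks trade evaluation speed for memory, and the scores are unchanged as long as each candidate is scored independently.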

The intent of SpecialSpmmFunctionFinal, and whether it can be done by PyTorch itself

I think the forward section of SpecialSpmmFunctionFinal is intended to compute the row sum of a sparse matrix, and the backward section returns the gradient for the sparse matrix's values. However, I find that torch.sparse may already handle the backward pass of the row-sum operation, for example:

import torch

i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])  # row, col
v = torch.FloatTensor([3, 4, 5])  # data
v.requires_grad = True
m = torch.sparse_coo_tensor(i, v, torch.Size([2, 3]))
m.retain_grad()

m1 = torch.sparse.sum(m, dim=1)
m1.retain_grad()

m2 = torch.sparse.sum(m1)
m2.backward()
print(v.grad)  # v's gradient is tensor([1., 1., 1.])

So why did you write a custom autograd function? Or is my understanding wrong?
Waiting for your reply, thanks.

Why are only tail entities updated? Is there an error here?

mask_indices = torch.unique(batch_inputs[:, 2]).cuda()
mask = torch.zeros(self.entity_embeddings.shape[0]).cuda()
mask[mask_indices] = 1.0

entities_upgraded = self.entity_embeddings.mm(self.W_entities)
out_entity_1 = entities_upgraded + \
    mask.unsqueeze(-1).expand_as(out_entity_1) * out_entity_1
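
The effect of the quoted lines can be reproduced with plain lists: rows whose index appears in column 2 of the batch receive the extra term, and all other rows keep only the projected embedding. A sketch with toy values; `apply_tail_mask` is a hypothetical name and this is not the repository's code:

```python
def apply_tail_mask(projected, gat_out, batch_tails):
    """Add gat_out only to rows whose entity index appears as a tail."""
    mask = [0.0] * len(projected)
    for idx in set(batch_tails):
        mask[idx] = 1.0
    return [
        [p + m * g for p, g in zip(p_row, g_row)]
        for p_row, g_row, m in zip(projected, gat_out, mask)
    ]


projected = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # stands in for entities_upgraded
gat_out = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]    # stands in for out_entity_1
result = apply_tail_mask(projected, gat_out, batch_tails=[2, 2])
print(result)
```

In this toy batch only entity 2 occurs as a tail, so it is the only row that changes.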

How long does it take to get 2_hop?

I ran the command: python main.py --get_2hop True
It prints:
number of unique_entities -> 40559
number of unique_entities -> 5173
number of unique_entities -> 5323
Initialised relations and entities from TransE
Graph created
length of graph keys is 39610

and then it seems stuck.
I want to know how long it takes, and whether it prints a message to let me know when it is done.

number of GAT layers

Hi, thank you for sharing the code.
From the description in the paper, the model has more than one GAT layer. How many layers were used for the results reported in the paper? I couldn't find this in the supplementary section. In the code, it seems to be just a single layer.

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

I have the following problem: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
I changed the batch_size from 128 to 32 or less, but it still doesn't work. How should I solve it?
I would appreciate your reply.
Here are the details:
——————————————————-
epoch-> 2999
Iteration-> 0 , Iteration_time-> 1.6496 , Iteration_loss 0.0034
Epoch 2999 , average loss 0.0034224344417452812 , epoch_time 1.6496999263763428
Saving Model
Done saving Model
Defining model
Only Conv model trained
Number of epochs 200

epoch-> 0
Traceback (most recent call last):
  File "main.py", line 368, in <module>
    train_conv(args)
  File "main.py", line 334, in train_conv
    loss.backward()
  File "/home/menhaojie/anaconda3/envs/pytorch35/lib/python3.5/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/menhaojie/anaconda3/envs/pytorch35/lib/python3.5/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution (try_all at /opt/conda/conda-bld/pytorch_1591914868920/work/aten/src/ATen/native/cudnn/Conv.cpp:693)

filtering out unseen entities during scoring

Thanks for open-sourcing the code.
There is one thing I am not sure about in the evaluation part.

if(batch_indices[i, 0] not in unique_entities or batch_indices[i, 2] not in unique_entities):

It seems that triples with unseen entities are dropped. I think this is uncommon and quite different from others' evaluation. What is the idea behind this?
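
To see how much such a filter changes the evaluated set, the kept and dropped test triples can be counted. A minimal sketch that mirrors the quoted condition on toy data; the function name and the triples are hypothetical:

```python
def split_by_seen(test_triples, train_entities):
    """Separate test triples whose head and tail were both seen in training."""
    kept, dropped = [], []
    for head, rel, tail in test_triples:
        if head in train_entities and tail in train_entities:
            kept.append((head, rel, tail))
        else:
            dropped.append((head, rel, tail))
    return kept, dropped


train = [(0, 0, 1), (1, 1, 2)]
test = [(0, 1, 2), (2, 0, 3), (4, 1, 0)]
train_entities = {e for h, _, t in train for e in (h, t)}
kept, dropped = split_by_seen(test, train_entities)
print(len(kept), len(dropped))  # 1 2
```

Reporting the number of dropped triples alongside the metrics makes results easier to compare with evaluations that keep the full test set.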
