deepgraphlearning / knowledgegraphembedding

License: MIT License

knowledgegraphembedding's Introduction

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space

Introduction

This is the PyTorch implementation of the RotatE model for knowledge graph embedding (KGE). We provide a toolkit that achieves state-of-the-art performance for several popular KGE models. The toolkit is efficient: it can train a large KGE model within a few hours on a single GPU.

A faster multi-GPU implementation of RotatE and other KGE models is available in GraphVite.

Implemented features

Models:

Evaluation Metrics:

MRR, MR, HITS@1, HITS@3, HITS@10 (filtered)
AUC-PR (for Countries data sets)
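
The ranking metrics listed above are all functions of the rank that the correct entity receives among the candidates. A minimal sketch, assuming `ranks` holds 1-based filtered ranks (the function name `ranking_metrics` is hypothetical and not part of the toolkit):

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR, MR and HITS@k from 1-based ranks of the correct entities."""
    n = len(ranks)
    metrics = {
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
        "MR": sum(ranks) / n,                    # mean rank
    }
    for k in ks:
        # Fraction of test triples whose correct entity is ranked in the top k.
        metrics[f"HITS@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


example = ranking_metrics([1, 2, 4, 20])
```

MRR and HITS@k are higher-is-better, while MR is lower-is-better and is sensitive to a few very badly ranked triples.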

Loss Function:

Uniform Negative Sampling
Self-Adversarial Negative Sampling
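
The two sampling schemes differ only in how the negatives are weighted. The sketch below, in plain Python for a single positive triple, assumes scores that already include the margin (as in the TransE scoring function shown in this README) and averages the positive and negative parts; the function names are hypothetical, and a real implementation would treat the weights as constants during back-propagation.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def negative_sampling_loss(pos_score, neg_scores, alpha=None):
    """Loss for one positive triple and its negatives.

    alpha=None  -> uniform weights over the negatives
    alpha=float -> self-adversarial weights softmax(alpha * neg_scores)
    """
    if alpha is None:
        weights = [1.0 / len(neg_scores)] * len(neg_scores)
    else:
        m = max(alpha * s for s in neg_scores)  # subtract the max for stability
        exps = [math.exp(alpha * s - m) for s in neg_scores]
        total = sum(exps)
        weights = [e / total for e in exps]
    pos_loss = -log_sigmoid(pos_score)
    neg_loss = -sum(w * log_sigmoid(-s) for w, s in zip(weights, neg_scores))
    return (pos_loss + neg_loss) / 2


neg = [-3.0, -1.0, 2.0]  # made-up scores of three negative triples
uniform = negative_sampling_loss(4.0, neg)
adversarial = negative_sampling_loss(4.0, neg, alpha=1.0)
```

With self-adversarial weights the high-scoring (hard) negative dominates, so the loss here is larger than under uniform weighting.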

Usage

Knowledge Graph Data:

entities.dict: a dictionary that maps entities to unique ids
relations.dict: a dictionary that maps relations to unique ids
train.txt: the KGE model is trained to fit this data set
valid.txt: create a blank file if no validation data is available
test.txt: the KGE model is evaluated on this data set
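
A minimal sketch of loading such a data directory. The file layout is an assumption, not something stated above: each `.dict` line is taken to be `id<TAB>name` and each triple line `head<TAB>relation<TAB>tail`. The helper names `read_dict` and `read_triples` are hypothetical.

```python
import os
import tempfile


def read_dict(path):
    """Read a .dict file, assuming one 'id<TAB>name' pair per line."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            idx, name = line.rstrip("\n").split("\t")
            mapping[name] = int(idx)
    return mapping


def read_triples(path, entity2id, relation2id):
    """Read 'head<TAB>relation<TAB>tail' lines and map them to id triples."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            h, r, t = line.rstrip("\n").split("\t")
            triples.append((entity2id[h], relation2id[r], entity2id[t]))
    return triples


# Tiny self-contained demo with made-up data.
with tempfile.TemporaryDirectory() as d:
    files = {
        "entities.dict": "0\tparis\n1\tfrance\n",
        "relations.dict": "0\tcapital_of\n",
        "train.txt": "paris\tcapital_of\tfrance\n",
    }
    for name, text in files.items():
        with open(os.path.join(d, name), "w", encoding="utf-8") as f:
            f.write(text)
    entity2id = read_dict(os.path.join(d, "entities.dict"))
    relation2id = read_dict(os.path.join(d, "relations.dict"))
    train = read_triples(os.path.join(d, "train.txt"), entity2id, relation2id)
```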

Train

For example, this command trains a RotatE model on the FB15k dataset with GPU 0.

CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_train \
 --cuda \
 --do_valid \
 --do_test \
 --data_path data/FB15k \
 --model RotatE \
 -n 256 -b 1024 -d 1000 \
 -g 24.0 -a 1.0 -adv \
 -lr 0.0001 --max_steps 150000 \
 -save models/RotatE_FB15k_0 --test_batch_size 16 -de

Check argparse configuration at codes/run.py for more arguments and more details.

Test

CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_test --cuda -init $SAVE

Reproducing the best results

To reproduce the results in the ICLR 2019 paper RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, you can run the bash commands in best_config.sh to get the best performance of RotatE, TransE, and ComplEx on five widely used datasets (FB15k, FB15k-237, wn18, wn18rr, Countries).

The run.sh script provides an easy way to search hyper-parameters:

bash run.sh train RotatE FB15k 0 0 1024 256 1000 24.0 1.0 0.0001 200000 16 -de
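
The positional arguments of run.sh are not documented here. The sketch below shows one reading of them, inferred by matching the values against the explicit flags of the Train command (`-b 1024`, `-n 256`, `-d 1000`, and so on). The order, and the name `SAVE_ID` in particular, are assumptions; check run.sh before relying on them.

```python
import shlex

# Positional order inferred from the two commands; SAVE_ID is an assumed name.
NAMES = ["MODE", "MODEL", "DATASET", "GPU_DEVICE", "SAVE_ID",
         "BATCH_SIZE", "NEGATIVE_SAMPLE_SIZE", "HIDDEN_DIM", "GAMMA",
         "ALPHA", "LEARNING_RATE", "MAX_STEPS", "TEST_BATCH_SIZE"]


def parse_run_sh(command):
    """Split a 'bash run.sh ...' command into named settings and extra flags."""
    args = shlex.split(command)[2:]   # drop 'bash' and 'run.sh'
    named = dict(zip(NAMES, args))
    extra = args[len(NAMES):]         # trailing flags such as -de
    return named, extra


named, extra = parse_run_sh(
    "bash run.sh train RotatE FB15k 0 0 1024 256 1000 24.0 1.0 0.0001 200000 16 -de"
)
```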

Speed

The KGE models usually take about half an hour to run 10000 steps on a single GeForce GTX 1080 Ti GPU with default configuration. And these models need different max_steps to converge on different data sets:

Dataset	FB15k	FB15k-237	wn18	wn18rr	Countries S*
MAX_STEPS	150000	100000	80000	80000	40000
TIME	9 h	6 h	4 h	4 h	2 h

Results of the RotatE model

Dataset	FB15k	FB15k-237	wn18	wn18rr
MRR	.797 ± .001	.337 ± .001	.949 ± .000	.477 ± .001
MR	40	177	309	3340
HITS@1	.746	.241	.944	.428
HITS@3	.830	.375	.952	.492
HITS@10	.884	.533	.959	.571

Using the library

The Python library is organized around 3 objects:

TrainDataset (dataloader.py): prepare data stream for training
TestDataSet (dataloader.py): prepare data stream for evaluation
KGEModel (model.py): calculate triple score and provide train/test API

The run.py file contains the main function, which parses arguments, reads data, initializes the model and provides the training loop.

Add your own model to model.py like:

def TransE(self, head, relation, tail, mode):
    if mode == 'head-batch':
        score = head + (relation - tail)
    else:
        score = (head + relation) - tail

    score = self.gamma.item() - torch.norm(score, p=1, dim=2)
    return score

Citation

If you use the codes, please cite the following paper:

@inproceedings{
 sun2018rotate,
 title={RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space},
 author={Zhiqing Sun and Zhi-Hong Deng and Jian-Yun Nie and Jian Tang},
 booktitle={International Conference on Learning Representations},
 year={2019},
 url={https://openreview.net/forum?id=HkgEQnRqYQ},
}

knowledgegraphembedding's Issues

typo MRR in README.md

Dataset | FB15k | FB15k-237 | wn18 | wn18rr
MRR | .797 ± .001 | .949 ± .000 | .337 ± .001 | .477 ± .001

The MRR of datasets FB15k-237 and wn18 should be swapped with each other.

Did you just use the first batch to train the model? Can you help me solve my problem?

I have a question about model.py (RotatE). RotatE uses the next function to fetch the data; shouldn't the next call be inside a loop? As written, it seems to me that every step trains on the first batch only, because if next is not inside a loop it will return the first item of the list/dict. Can anyone help answer my question?

class BidirectionalOneShotIterator(object):

    def __init__(self, dataloader_head, dataloader_tail):
        self.iterator_head = self.one_shot_iterator(dataloader_head)
        # print("bb", next(self.iterator_head))  # one batch
        self.iterator_tail = self.one_shot_iterator(dataloader_tail)
        self.step = 0

    def __next__(self):
        self.step += 1

        if self.step % 2 == 0:
            data = next(self.iterator_head)
        else:
            data = next(self.iterator_tail)
        print("self.step", self.step)
        return data

    @staticmethod
    def one_shot_iterator(dataloader):
        '''
        Transform a PyTorch Dataloader into python iterator
        '''
        while True:
            for data in dataloader:
                yield data

def train_step(model, optimizer, train_iterator, args):
    '''
    A single train step. Apply back-propagation and return the loss
    '''
    model.train()
    optimizer.zero_grad()
    positive_sample, negative_sample, subsampling_weight, mode = next(train_iterator)
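
A generator keeps its position between calls to next, so each call resumes after the previous yield instead of restarting from the first batch. The sketch below mirrors the iterator in the question with plain lists standing in for the PyTorch dataloaders; `AlternatingIterator` is a hypothetical stand-in, not the repository class.

```python
def one_shot_iterator(batches):
    """Endless generator over a finite collection of batches."""
    while True:
        for data in batches:
            yield data


class AlternatingIterator:
    """Hypothetical stand-in for BidirectionalOneShotIterator."""

    def __init__(self, head_batches, tail_batches):
        # Each generator is created once, so its state persists across calls.
        self.iterator_head = one_shot_iterator(head_batches)
        self.iterator_tail = one_shot_iterator(tail_batches)
        self.step = 0

    def __next__(self):
        self.step += 1
        source = self.iterator_head if self.step % 2 == 0 else self.iterator_tail
        return next(source)


it = AlternatingIterator(["h0", "h1"], ["t0", "t1"])
seen = [next(it) for _ in range(6)]
```

Six consecutive calls walk through both streams in turn and wrap around when a stream is exhausted, which is what calling train_step repeatedly from the training loop relies on.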

How to use embedding to compute triplet score

I don't quite understand your training script about head-batch. Anyway, I have pretrained ent and rel embeddings without understanding it.

Now I want to figure out that given a triplet: h [x1, ..., x200], r=[r1, ..., r100] and t=[y1, ..., y200], how to compute its score?

A detailed description will be much appreciated.
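
A minimal sketch of the RotatE score for a single triple, in plain Python. It assumes the layout used by the model.py excerpt quoted in a later issue: the first half of an entity vector holds the real parts and the second half the imaginary parts, and the relation vector holds raw parameters that are rescaled to phases by dividing by embedding_range / pi. The function name and the example values are hypothetical; embedding_range = (gamma + 2) / hidden_dim follows the description in another issue on this page.

```python
import cmath
import math


def rotate_score(head, relation, tail, gamma, embedding_range):
    """RotatE score for one triple (higher means more plausible).

    head, tail: 2*d floats, first d = real parts, last d = imaginary parts.
    relation:   d raw parameters, rescaled to phases.
    """
    d = len(relation)
    distance = 0.0
    for i in range(d):
        h = complex(head[i], head[d + i])
        t = complex(tail[i], tail[d + i])
        phase = relation[i] / (embedding_range / math.pi)
        r = cmath.rect(1.0, phase)   # unit-modulus rotation e^{i * phase}
        distance += abs(h * r - t)   # modulus of the complex difference
    return gamma - distance


# d = 1: rotating h = 1 by 90 degrees gives t = i, so the distance is ~0.
gamma, emb_range = 24.0, 0.026
score = rotate_score([1.0, 0.0], [emb_range / 2], [0.0, 1.0], gamma, emb_range)
```

For the sizes in the question (h and t of length 200, r of length 100), the same function applies with d = 100.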

Some question about the test process.

When I tested the model by the command: CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_test --cuda -init models/RotatE_countries_S3_0, it showed that "UnboundLocalError: local variable 'current_learning_rate' referenced before assignment". I'm a novice. Could you tell me how to solve it? I would appreciate it if you can help me. Looking forward to your reply.

why complEx

Compute Head and Tail Prediction on FB15K dataset

I want to check the Prediction Head and Prediction Tail (Hits@10) on different types of relation like 1-to-1, 1-to-N, N-to-1 and N-to-N of TransD model like the results mentioned in the paper. How can I perform the Head and Tail Prediction on 1-to-1, 1-to-N, N-to-1 and N-to-N of relations of FB15k dataset?

why the ComplEx just get 0.74 Hits10 on FB15k which is lower than paper's 0.84 for 0.1

Reproduce DistMult results

Could this implementation also reproduce DistMult results in the paper?

How to get the embeddings file of all the entities and relations?

Hello, Zhiqing. I want to know how to get the embeddings file of all the entities and relations after training a RotatE model on some knowledge graph dataset? Like the glove word embeddings file.

About the seed setting.

Thanks for sharing the wonderful code :)
When I reproduce RotatE, I find that the results (loss, MR, MRR, etc.) differ slightly on each run. I found that your code doesn't contain any seed-setting code, like:
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
if is_cuda:
    torch.cuda.manual_seed_all(seed)

This differs from the tutorials I have followed.
Could you explain your thinking about that?

Looking forward to your reply, and sorry for asking such a basic question QAQ.

Hi, a problem about multi-GPU training

How can I change the code to do multi-GPU training?
I am using my own dataset and get an out-of-memory error. My GPU has 40 GB of memory, but the dataset is too large. How should I modify the code? Thanks.

Scoring functions and Adversarial Loss Parameters

I had 3 queries related to loss function and scoring functions:

a. Why is the margin a part of the scoring function for transE and rotatE. Doesn't it actually change the scores when you do prediction(Eg: if margin is 1 then (1-score) is different from score. Each approach would result in totally different ranks during prediction).

b. Margin is not directly part of the adversarial loss. It is a part of the scoring function as described above. However, this is not the case for complex and distmult in this implementation. Is it equivalent to saying that for these two models you are setting the hyper parameter margin=0 in the loss function? What does this mean, since you are using a margin based loss for optimisation?

c. Have you tried Rotate with other margin based losses? How does it perform compared to Complex/HolE?
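
On question (a): the TransE scoring function shown in the README computes gamma minus a distance, which is a constant shift of the negated distance, so the ordering of candidates is the same with or without gamma. A small check with made-up distances (the helper `rank_of` is hypothetical):

```python
def rank_of(scores, target):
    """1-based rank of scores[target] when sorting in descending order."""
    return 1 + sum(1 for s in scores if s > scores[target])


distances = [3.2, 0.7, 5.1, 1.9]  # made-up distances for 4 candidates
ranks_plain = [rank_of([-d for d in distances], i) for i in range(4)]
ranks_margin = [rank_of([24.0 - d for d in distances], i) for i in range(4)]
```

The margin therefore only matters during training, where the shifted score is passed through the sigmoid in the loss.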

Given two node embedding, how to predict the relation embedding between them?

Loss function of TransE and RotatE in the code

Thank you for your excellent research and code. However, I am confused about why you use the same loss function for TransE and RotatE. I think the loss functions of TransE and RotatE are different according to their definitions in the original papers. I hope you can explain it. Thank you.

invalid syntax error

Start Training......
File "codes/run.py", line 101
**save_variable_list,
^
SyntaxError: invalid syntax

Don't know what causing this error. Do you have any idea.
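
One possible cause, assuming line 101 sits inside a dict literal that unpacks save_variable_list with `**`: this syntax was added in Python 3.5 (PEP 448), and older interpreters reject it with exactly this kind of SyntaxError. The sketch below shows the construct and an equivalent form that older interpreters accept; the dictionary contents are made up.

```python
import sys

save_variable_list = {"step": 100, "current_learning_rate": 1e-4}  # made-up values

# Python 3.5+ (PEP 448): unpack one dict inside another dict literal.
merged_new = {**save_variable_list, "model_state_dict": "<weights>"}

# Equivalent form that also works on older interpreters.
merged_old = dict(save_variable_list)
merged_old["model_state_dict"] = "<weights>"

supports_unpacking = sys.version_info >= (3, 5)
```

Checking `python --version` on the machine that raised the error would confirm or rule out this explanation.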

Hi,I want to

RuntimeError: The size of tensor a (2000) must match the size of tensor b (1000) at non-singleton dimension 2

Training TransE with the following parameters reports an error. The error message is in the title.
--do_train
--cuda
--do_valid
--do_test
--data_path
D:\KnowledgeGraphEmbedding\data\wn18rr
--model
TransE
-n
256
-b
1
-d
1000
-g
24.0
-a
1.0
-adv
-lr
0.001
--max_steps
1500
-save
models/TransE_wn18rr_0
--test_batch_size
1
-de

The detailed error information is
Traceback (most recent call last):
File "D:/KnowledgeGraphEmbedding/codes/run.py", line 364, in <module>
main(parse_args())
File "D:/KnowledgeGraphEmbedding/codes/run.py", line 308, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "D:\KnowledgeGraphEmbedding\codes\model.py", line 267, in train_step
negative_score = model((positive_sample, negative_sample), mode=mode)
File "E:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\KnowledgeGraphEmbedding\codes\model.py", line 159, in forward
score = model_func[self.model_name](head, relation, tail, mode)
File "D:\KnowledgeGraphEmbedding\codes\model.py", line 169, in TransE
score = (head + relation) - tail
RuntimeError: The size of tensor a (2000) must match the size of tensor b (1000) at non-singleton dimension 2
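
The sizes in the message match the flags in the command: `-d 1000` with `-de` and no `-dr`. Logs in other issues on this page show entity embeddings of width 2000 next to relation embeddings of width 1000 under the same flags. The sketch below assumes that `-de` and `-dr` each double the corresponding width; the function names are hypothetical.

```python
def embedding_dims(hidden_dim, double_entity=False, double_relation=False):
    """Embedding widths, assuming -de / -dr each double the corresponding width."""
    entity_dim = hidden_dim * 2 if double_entity else hidden_dim
    relation_dim = hidden_dim * 2 if double_relation else hidden_dim
    return entity_dim, relation_dim


def transe_compatible(entity_dim, relation_dim):
    """head + relation - tail is elementwise, so the widths must match."""
    return entity_dim == relation_dim


with_de = embedding_dims(1000, double_entity=True)  # flags from the command above
without_de = embedding_dims(1000)
```

Under that assumption, dropping `-de` for TransE (or doubling both widths) would make the shapes agree.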

How does RotatE deal with anti-symmetric relations?

When I train RotatE on a larger dataset, it always shows "CUDA out of memory"?

Init embedding random OR by bert encode from entity description text

Hi, thanks a lot for your work on KGE, but I am still confused about embedding initialization. I tried to initialize the embeddings from BERT using entity text information, but during training the loss on negative triplets still seems to increase. I have changed the learning rate and other hyperparameters, but it did not help.
Will different initial embeddings lead to different results for the model?

N-1 and 1-N relations

Thanks for your work. I'm a beginner in KGE.
While reading this paper, I ran into a problem: for an N-1 relation r, we should have x1 · r = y and x2 · r = y. Since r corresponds to a counterclockwise rotation, x1 is close to x2, which is the same issue TransE has.
I'm not sure whether my understanding is correct, but it seems that RotatE can't model N-1 and 1-N relations properly.
Looking forward to your reply, thank you.
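
The geometry behind this question can be checked in one complex dimension. A rotation by a unit-modulus r is invertible and preserves distances, so exactly one head satisfies x · r = y, and any other head misses y by exactly its distance from that head. The values below are made up; the sketch only illustrates the exact case and says nothing about how the trained model behaves in practice.

```python
import cmath

r = cmath.rect(1.0, 0.8)      # unit-modulus relation, phase chosen arbitrarily
y = complex(0.3, -1.2)        # shared tail
x = y / r                     # the only head with x * r == y exactly
x2 = x + complex(0.05, 0.0)   # a slightly different head
residual = abs(x2 * r - y)    # rotation preserves distances: equals |x2 - x|
```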

how do you generate the embedding? Is it a random walk or something? I mean what is the main algorithm for embedding.

Question about relaxing RotatE

Hello,

I was wondering have you tried removing the modulus constraint (in RotatE, the modulus of each r_i is 1) and see if that helps?

Thank you.

Give pytorch version in README.md

Which version of PyTorch is this code based on? Could you please give the PyTorch version in README.md? Thank you.

Question about the embedding range (weight initialization and phase normalization)

Hello, first of all many thanks for providing the source code alongside the paper.

I was comparing the implementation of RotatE against the paper and I found something which seems quite important, the embedding range, which is defined as (gamma+2)/hidden_dim.

This raises two questions that could be related to each other:

1. The paper says "Both the real and imaginary parts of the entity embeddings are uniformly initialized, and the phases of the relation embeddings are uniformly initialized between 0 and 2π.", while in the code both the entities and relations are initialized with Uniform(-embedding_range, embedding_range).
2. The phase relation is divided by the embedding range in the metric implementation, while that does not seem explicitly mentioned in the paper from my reading.

Could you help me understand these two points, or maybe point out in the paper the explanation behind them?

Thank you very much.
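
The two observations fit together arithmetically: a value drawn from Uniform(-embedding_range, embedding_range) and then divided by embedding_range / pi is uniform on [-pi, pi]. Because a phase is only defined up to multiples of 2π, this covers the same rotations as a phase in [0, 2π]. A quick numerical check, using the embedding range formula quoted above with example values for gamma and hidden_dim:

```python
import math
import random

gamma, epsilon, hidden_dim = 24.0, 2.0, 1000
embedding_range = (gamma + epsilon) / hidden_dim

random.seed(0)
# Raw relation parameters as initialized in the code described above.
raw = [random.uniform(-embedding_range, embedding_range) for _ in range(10000)]
# Rescaling applied before taking cos / sin.
phases = [x / (embedding_range / math.pi) for x in raw]
```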

confused about implementation of DistMult

According to the model in distmult paper, $$score = h_s * M_r * h_o^T$$. However, in most papers about knowledge graph completion including the model in CompGCN, they are realized as a form of $$score = h_s * r * h_o^T$$. May I ask you about the insight behind your realization form?
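
The two forms agree whenever M_r is a diagonal matrix, which is the restriction DistMult places on the relation matrix: the vector r is then just the diagonal of M_r. A small check in plain Python with made-up vectors (function names hypothetical):

```python
def distmult_matrix_form(h, r_diag, t):
    """h^T M_r t with M_r = diag(r_diag), written out as a full matrix product."""
    n = len(h)
    M = [[r_diag[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    Mt = [sum(M[i][j] * t[j] for j in range(n)) for i in range(n)]
    return sum(h[i] * Mt[i] for i in range(n))


def distmult_vector_form(h, r, t):
    """Sum of the elementwise product h * r * t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


h, r, t = [1.0, 2.0, -1.0], [0.5, -1.0, 2.0], [3.0, 1.0, 4.0]
matrix_score = distmult_matrix_form(h, r, t)
vector_score = distmult_vector_form(h, r, t)
```

Storing only the diagonal needs d parameters per relation instead of d squared.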

Code early stop

A question about the data range of negative sampling

Hi, thanks for such a good job first!

I observe that during training you generate negative samples based on the train set, so triples that appear only in the valid or test set will be treated as negatives, and these "false negative" samples will influence the model performance at evaluation time. In my opinion, maybe the valid set should be taken into account for negative sampling in training?

Thanks for your interpretation.

Can't find log files

Hi!

I trained the model using the following command:

CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_train --cuda --do_valid --do_test --data_path data/FB15k-237 --model RotatE -n 256 -b 1024 -d 1000 -g 24.0 -a 1.0 -adv -lr 0.0001 --max_steps 150000 -save models/RotatE_FB15k-237 --test_batch_size 16 -de --cuda

and tested it using:

time CUDA_VISIBLE_DEVICES=1 python -u codes/run.py --do_test -init models/RotatE_FB15k-237/ --cuda

I am not able to find the log files in models/RotatE_FB15k-237/ folder. I am unsure what went wrong. Please help.

Can we initialize entity embeddings using GLoVE embeddings?

Thanks for making the work public!

It has been reported in the literature that, for common-sense knowledge graphs, initializing entity embeddings as averaged word embeddings leads to faster convergence and better results. Have you tried this, and do you provide functionality for it? I can implement it for my own use case, but I wanted to know whether your work already handles this.

Pretrained models

Hello all,

Thank you for great work. I was wondering whether you plan to make pretrained RotatE models publicly available. I reckon such pretrained models would be very helpful for everyone including the mother nature :) as training RotatE on suitable machine would require at least 19 hours.

Cheers

RuntimeError: CUDA out of memory.

Dear bro, I'm very glad to have read such a good paper with open-source code. When I run the program on the same server, it works with the FB15k dataset, but when I switch to wn18 I get RuntimeError: CUDA out of memory.

My command:
CUDA_VISIBLE_DEVICES=1 python -u codes/run.py --do_train --cuda --do_valid --do_test --data_path data/wn18 --model RotatE -n 256 -b 256 -d 1000 -g 24.0 -a 1.0 -adv -lr 0.0001 --max_steps 80000 -save models/RotatE_wn18_0 --test_batch_size 16 -de
I have tried -b values of 64, 128, 256 and 512, and I have also asked others for help.

regarding RELATION CATEGORY in FB15k

How do you perform the relation category in paper? Do you have a python code for it?

Question about filter_bias = -1 for positive triplets

In dataloader.py, filter_bias is set to -1 for positive triplets,
and score += filter_bias in model.py is used to filter the ranking.

It seems the score of a triplet may span a large range (at least much larger than 1).
I noticed that the small filter_bias really works (for both TransE and RotatE); the result is the same when I set filter_bias = -100.

But why does such a small value, filter_bias = -1, work?
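
One mechanism that would make the size of the bias irrelevant is sketched below. It assumes that, in the filtered setting, every candidate forming another known-true triple is swapped for the true entity itself before scoring, so its biased score is the true score plus the bias and can never outrank the true entity. This is an assumption about dataloader.py that should be checked against the code; the function and values are hypothetical.

```python
def filtered_rank(scores, true_idx, known_true, bias=-1.0):
    """1-based filtered rank of candidate true_idx.

    Assumption: a candidate that forms another known-true triple is swapped
    for the true entity itself and given `bias`; all others get bias 0.
    """
    biased = []
    for idx in range(len(scores)):
        if idx in known_true and idx != true_idx:
            biased.append(scores[true_idx] + bias)  # same score, shifted down
        else:
            biased.append(scores[idx])
    return 1 + sum(1 for s in biased if s > biased[true_idx])


scores = [250.0, -40.0, 900.0, 10.0]  # made-up, widely spread scores
rank_small = filtered_rank(scores, true_idx=3, known_true={2, 3}, bias=-1.0)
rank_large = filtered_rank(scores, true_idx=3, known_true={2, 3}, bias=-100.0)
```

Under this assumption any negative bias gives the same rank, which would be consistent with the observation that -1 and -100 produce identical results.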

GPU Out Of Memory

my gpu is 1080ti
Is it impossible to run with this gpu? (rotate)
I wonder how much memory I need.
Or is there another way?

RotatE: this implementation vs GraphVite

Hi, thanks for developing RotatE!
I was wondering, why are your results a bit different from the ones obtained by your other project https://graphvite.io/?

Regarding the direction of the rotation

In the paper, you mention that the direction must be counterclockwise. However, the image shows the direction from head to tail as clockwise. Is this a typo, or did I make a mistake? It would be great if you could give some insight on it. Thank you.

Issue of the evaluation results on WN18RR

Thank you for your excellent project, but I cannot obtain the evaluation results on WN18RR with the best configuration. I get about 4500 on MR and 0.4 on MRR with the source code. Can you please tell me why I get this result? Thank you.

Interpreting how does RotatE handle 1-N, N-N type relations...

Hi. Great work and repo.
I was curious to know what is the geometrical interpretation when RotatE handles 1-N and N-N type relations.
For example, in TransH we know that the embeddings are projected onto a relation specific hyperplane to bypass this issue.

How is this handled in the complex plane of RotatE?

the early stop when setting the max step

Sorry again for troubling you. I set max steps to 100000 for RotatE on FB15k-237. When I run your best_config.sh, my run stops early at step 4900. Can you help me solve this issue? Thanks!!

Memory consumption issue

I use the command:

bash run.sh train RotatE FB15k-237 0 0 1024 256 1000 9.0 1.0 0.00005 100000 16 -de

to train RotatE on a 11 GB GPU. I ensure it is completely free.
I still get the following error:

2022-03-31 19:32:37,370 INFO     negative_adversarial_sampling = False
2022-03-31 19:32:37,370 INFO     learning_rate = 0
2022-03-31 19:32:39,079 INFO     Training average positive_sample_loss at step 0: 5.635527
2022-03-31 19:32:39,079 INFO     Training average negative_sample_loss at step 0: 0.003591
2022-03-31 19:32:39,079 INFO     Training average loss at step 0: 2.819559
2022-03-31 19:32:39,079 INFO     Evaluating on Valid Dataset...
2022-03-31 19:32:39,552 INFO     Evaluating the model... (0/2192)
2022-03-31 19:33:38,650 INFO     Evaluating the model... (1000/2192)
2022-03-31 19:34:38,503 INFO     Evaluating the model... (2000/2192)
2022-03-31 19:34:49,981 INFO     Valid MRR at step 0: 0.005509
2022-03-31 19:34:49,982 INFO     Valid MR at step 0: 6894.798660
2022-03-31 19:34:49,982 INFO     Valid HITS@1 at step 0: 0.004733
2022-03-31 19:34:49,982 INFO     Valid HITS@3 at step 0: 0.005076
2022-03-31 19:34:49,982 INFO     Valid HITS@10 at step 0: 0.005646
Traceback (most recent call last):
  File "codes/run.py", line 371, in <module>
    main(parse_args())
  File "codes/run.py", line 315, in main
    log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
  File "/home/prachi/related_work/KnowledgeGraphEmbedding/codes/model.py", line 315, in train_step
    loss.backward()
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 7.41 GiB already allocated; 1.51 GiB free; 1.52 GiB cached)
run.sh: line 79: 
CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \
    --cuda \
    --do_valid \
    --do_test \
    --data_path $FULL_DATA_PATH \
    --model $MODEL \
    -n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \
    -g $GAMMA -a $ALPHA -adv \
    -lr $LEARNING_RATE --max_steps $MAX_STEPS \
    -save $SAVE --test_batch_size $TEST_BATCH_SIZE \
    ${14} ${15} ${16} ${17} ${18} ${19} ${20}

: No such file or directory

I get similar errors on trying to train FB15k using the command in best_config.sh file.
I reduced the batchsize to 500 and it worked but the performance is much less than the numbers reported in the paper.

I am not sure what is the issue.
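
The 1.95 GiB in the error message can be reproduced from the command-line values with simple arithmetic. The sketch assumes a float32 tensor of shape [2, batch, negatives, hidden_dim], for example the real and imaginary parts of the negative-sample scores stacked together; which tensor actually triggered the allocation is an assumption.

```python
def tensor_gib(*shape, bytes_per_element=4):
    """Size in GiB of a dense tensor of the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / 2**30


# Values from the command: batch 1024, 256 negatives, hidden dim 1000.
# The leading 2 (real and imaginary parts stacked) is an assumption.
full = tensor_gib(2, 1024, 256, 1000)
reduced = tensor_gib(2, 500, 256, 1000)
```

Several intermediate tensors of this size are kept alive for the backward pass, so memory use scales with the product of batch size, negative sample size and hidden dimension. Reducing any of the three lowers the peak.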

Why do you separate negative head samples and negative tail samples?

Thanks first for such a good job.

I observe in code that you implement two data iterators named train_dataloader_head and train_dataloader_tail, which respectively generate negative head samples and negative tail samples. And when training, these two iterators are alternatively fed into the model. If what I understand above is right, the model will train one positive sample twice, respectively for neg head and neg tail samples. I want to know why you do negative sampling this way, instead train the neg head and neg tail samples together and back propagate one positive sample once, which I think is a more intuitive way?

Thanks a lot for your reply.

what is L3 regularization

I am curious about the L3 regularization you use for complEx and DistMult? Can you give any references for it?

The dimension of entity vector and relation vector

KnowledgeGraphEmbedding/codes/model.py

Lines 203 to 211 in bf86876

    
re_head, im_head = torch.chunk(head, 2, dim=2)
re_tail, im_tail = torch.chunk(tail, 2, dim=2)

#Make phases of relations uniformly distributed in [-pi, pi]

phase_relation = relation/(self.embedding_range.item()/pi)

re_relation = torch.cos(phase_relation)
im_relation = torch.sin(phase_relation)

Here re_head and im_head are hidden_size vectors, but L208-L211 do not change the dimension of the relation vector, so re_relation and im_relation are both hidden_size*2 vectors. How can you operate on them when the dimensions differ?

confused about implementation of 1-1, 1-n, n-1, n-n

hi, thanks a lot for your work in knowledge graph completion, but I still am confused about the implementation of the table9 in your paper.

as for the relation category, Following Wang et al. (2014), for each relation r, we compute the average number of tails per head (tphr) and the average number of head per tail (hptr). If tphr < 1.5 and hptr < 1.5, r is treated as one-to-one; if tphr ≥ 1.5 and hptr ≥ 1.5, r is treated as a many-to-many; if tphr < 1.5 and hptr ≥ 1.5, r is treated as one-to-many. So should we take the valid dataset and test dataset into consideration in this process? Or should we only classify them in the training dataset?
take tail prediction in the 1-n relation category as an example, should we choose all 1-n relations prediction scores and take all the results into mean?
I'm confused to re-implement this part of the experiment. It would be best if you could take a script as an example. Thanks a lot in advance!
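
A minimal sketch of the categorization step, computing tails per head (tph) and heads per tail (hpt) for each relation over whatever triple list is passed in. Which split to count over is the open question above and is not decided here. The labelling convention is an assumption: a relation gets "N" on the head side when hpt ≥ 1.5 and "N" on the tail side when tph ≥ 1.5. The sentence quoted above words its third case differently, so check the convention against the paper before comparing numbers.

```python
from collections import defaultdict


def categorize_relations(triples, threshold=1.5):
    """Label each relation as 1-to-1, 1-to-N, N-to-1 or N-to-N."""
    tails = defaultdict(set)  # (relation, head) -> set of tails
    heads = defaultdict(set)  # (relation, tail) -> set of heads
    for h, r, t in triples:
        tails[(r, h)].add(t)
        heads[(r, t)].add(h)

    def average(groups, relation):
        sizes = [len(v) for (r, _), v in groups.items() if r == relation]
        return sum(sizes) / len(sizes)

    labels = {}
    for relation in {r for _, r, _ in triples}:
        tph = average(tails, relation)  # average tails per head
        hpt = average(heads, relation)  # average heads per tail
        head_side = "N" if hpt >= threshold else "1"
        tail_side = "N" if tph >= threshold else "1"
        labels[relation] = f"{head_side}-to-{tail_side}"
    return labels


triples = [("a", "born_in", "x"), ("b", "born_in", "x"), ("c", "born_in", "x"),
           ("p", "spouse", "q")]
labels = categorize_relations(triples)
```

Per-category results can then be obtained by grouping the test triples by the label of their relation and averaging the head-prediction and tail-prediction metrics within each group.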

How to split the train/valid/test

Hi, did you randomly split the knowledge graph into train/valid/test? How do you make sure the training set contains all entities?

Script for finding the best hyperparameters

Hi Zhiqing, thanks for making your code available for reproducibility. I am just wondering whether you could also share the script that you use for tuning the hyperparameters. This would make your approach even more reproducible. Thanks.

L3 regularization

When using L3 regularization for ComplEx and DistMult, you apply the norm function twice to the relation embedding, as follows: (model.relation_embedding.norm(p=3).norm(p=3) ** 3). Can you explain why you apply the norm function twice? Thank you.
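
Assuming the first norm(p=3) call reduces the whole embedding matrix to a single non-negative number, the second call takes the 3-norm of that one number, which is the number itself. The sketch below checks this in plain Python with made-up values; `p_norm` is a hypothetical stand-in for the tensor method.

```python
def p_norm(values, p=3):
    """(sum |v|^p)^(1/p) over a flat list of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


weights = [0.5, -1.5, 2.0, -0.25]       # made-up embedding entries
once = p_norm(weights) ** 3
twice = p_norm([p_norm(weights)]) ** 3  # second norm acts on a single number
```

Under that assumption the repeated call does not change the regularization value.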

train error

I used the training instructions you provided, but there are some problems, I don’t know how to solve them

File "codes/run.py", line 361, in <module>
main(parse_args())
File "codes/run.py", line 305, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "codes\model.py", line 267, in train_step
negative_score = model((positive_sample, negative_sample), mode=mode)
File "C:\Users\Kano_Hayashi.conda\envs\rota\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "codes\model.py", line 144, in forward
index=tail_part.view(-1)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index' in call to _th_index_select

CUDA out of memory (resolved) and method to make RotatE run faster

Thank you for developing great work, RotatE. I'm really interested in your research.

I ran your program as the following, but I found that there is a bug "RuntimeError: CUDA out of memory". How did you debug?
I changed the batch size from 1024 to 256 and the program could run successfully. But, I don't really want to change the batch size.

dl-box@DL-Box:~/Downloads/RotatE$ CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_train \
 --cuda \
 --do_valid \
 --do_test \
 --data_path data/FB15k \
 --model RotatE \
 -n 256 -b 1024 -d 1000 \
 -g 24.0 -a 1.0 -adv \
 -lr 0.0001 --max_steps 150000 \
 -save models/RotatE_FB15k_0 --test_batch_size 16 -de
2021-11-07 17:21:05,436 INFO Model: RotatE
2021-11-07 17:21:05,437 INFO Data Path: data/FB15k
2021-11-07 17:21:05,437 INFO #entity: 14951
2021-11-07 17:21:05,437 INFO #relation: 1345
2021-11-07 17:21:05,892 INFO #train: 483142
2021-11-07 17:21:05,941 INFO #valid: 50000
2021-11-07 17:21:06,000 INFO #test: 59071
2021-11-07 17:21:06,202 INFO Model Parameter Configuration:
2021-11-07 17:21:06,202 INFO Parameter gamma: torch.Size([1]), require_grad = False
2021-11-07 17:21:06,202 INFO Parameter embedding_range: torch.Size([1]), require_grad = False
2021-11-07 17:21:06,202 INFO Parameter entity_embedding: torch.Size([14951, 2000]), require_grad = True
2021-11-07 17:21:06,202 INFO Parameter relation_embedding: torch.Size([1345, 1000]), require_grad = True
2021-11-07 17:21:12,102 INFO Ramdomly Initializing RotatE Model...
2021-11-07 17:21:12,102 INFO Start Training...
2021-11-07 17:21:12,102 INFO init_step = 0
2021-11-07 17:21:12,102 INFO batch_size = 1024
2021-11-07 17:21:12,102 INFO negative_adversarial_sampling = 1
2021-11-07 17:21:12,102 INFO hidden_dim = 1000
2021-11-07 17:21:12,102 INFO gamma = 24.000000
2021-11-07 17:21:12,102 INFO negative_adversarial_sampling = True
2021-11-07 17:21:12,102 INFO adversarial_temperature = 1.000000
2021-11-07 17:21:12,102 INFO learning_rate = 0
Traceback (most recent call last):
File "codes/run.py", line 361, in <module>
main(parse_args())
File "codes/run.py", line 305, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 300, in train_step
loss.backward()
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 6.11 GiB already allocated; 866.06 MiB free; 7.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I ran the command line "bash run.sh train ComplEx FB15k 0 0 1024 256 1000 500.0 1.0 0.001 150000 16 -de -dr -r 0.000002" as the following (with dataset FB15k), and your program could run successfully on my Ubuntu server.

dl-box@DL-Box:~/Downloads/RotatE$ bash run.sh train ComplEx FB15k 0 0 1024 256 1000 500.0 1.0 0.001 150000 16 -de -dr -r 0.000002
1.10.0+cu102
Start Training......
Model: ComplEx
Data Path: data/FB15k
#entity: 14951
#relation: 1345
#train: 483142
#valid: 50000
#test: 59071
Model Parameter Configuration:
Parameter gamma: torch.Size([1]), require_grad = False
Parameter embedding_range: torch.Size([1]), require_grad = False
Parameter entity_embedding: torch.Size([14951, 2000]), require_grad = True
Parameter relation_embedding: torch.Size([1345, 2000]), require_grad = True
Ramdomly Initializing ComplEx Model...
Start Training...
init_step = 0
batch_size = 1024
negative_adversarial_sampling = 1
hidden_dim = 1000
gamma = 500.000000
negative_adversarial_sampling = True
adversarial_temperature = 1.000000
learning_rate = 0
Training average regularization at step 0: 2.061783
Training average positive_sample_loss at step 0: 0.959978
Training average negative_sample_loss at step 0: 2.498887
Training average loss at step 0: 3.791215
Evaluating on Valid Dataset...
Evaluating the model... (0/6250)
Evaluating the model... (1000/6250)
Evaluating the model... (2000/6250)
Evaluating the model... (3000/6250)
Evaluating the model... (4000/6250)
Evaluating the model... (5000/6250)
Evaluating the model... (6000/6250)
Valid MRR at step 0: 0.000718
Valid MR at step 0: 7412.979920
Valid HITS@1 at step 0: 0.000050
Valid HITS@3 at step 0: 0.000190
Valid HITS@10 at step 0: 0.000820
Training average regularization at step 100: 1.869630
Training average positive_sample_loss at step 100: 0.878554
Training average negative_sample_loss at step 100: 2.214018
Training average loss at step 100: 3.415917
Training average regularization at step 200: 1.649423
Training average positive_sample_loss at step 200: 0.795739
Training average negative_sample_loss at step 200: 1.878687
Training average loss at step 200: 2.986636
Training average regularization at step 300: 1.493370
Training average positive_sample_loss at step 300: 0.723991
Training average negative_sample_loss at step 300: 1.647611
Training average loss at step 300: 2.679172
Training average regularization at step 400: 1.364369
Training average positive_sample_loss at step 400: 0.668379
Training average negative_sample_loss at step 400: 1.480148
Training average loss at step 400: 2.438632
Training average regularization at step 500: 1.252640
Training average positive_sample_loss at step 500: 0.615634
Training average negative_sample_loss at step 500: 1.347466
Training average loss at step 500: 2.234190
Training average regularization at step 600: 1.153765
Training average positive_sample_loss at step 600: 0.570805
Training average negative_sample_loss at step 600: 1.245437
Training average loss at step 600: 2.061886
Training average regularization at step 700: 1.065076
Training average positive_sample_loss at step 700: 0.524925
Training average negative_sample_loss at step 700: 1.163066
Training average loss at step 700: 1.909072
Training average regularization at step 800: 0.984837
Training average positive_sample_loss at step 800: 0.489442
Training average negative_sample_loss at step 800: 1.097700
Training average loss at step 800: 1.778408
Training average regularization at step 900: 0.911781
Training average positive_sample_loss at step 900: 0.451165
Training average negative_sample_loss at step 900: 1.044625
Training average loss at step 900: 1.659676
Training average regularization at step 1000: 0.845027
Training average positive_sample_loss at step 1000: 0.363237
Training average negative_sample_loss at step 1000: 1.000880
Training average loss at step 1000: 1.527086
Training average regularization at step 1100: 0.783731
Training average positive_sample_loss at step 1100: 0.312674
Training average negative_sample_loss at step 1100: 0.966706
Training average loss at step 1100: 1.423422
Training average regularization at step 1200: 0.726847
Training average positive_sample_loss at step 1200: 0.310942
.......................................................................

However, before that, I ran the command "bash run.sh train RotatE wn18 0 0 512 1024 500 12.0 0.5 0.0001 80000 8 -de" (on the wn18 dataset, with PyTorch 1.10.0+cu102), and it still failed with "RuntimeError: CUDA out of memory". Could you please explain why your program sometimes fails with "RuntimeError: CUDA out of memory" but runs successfully after changing the dataset? How would you debug this problem?

dl-box@DL-Box:~/Downloads/RotatE$ bash run.sh train RotatE wn18 0 0 512 1024 500 12.0 0.5 0.0001 80000 8 -de
1.10.0+cu102
Start Training......
2021-11-08 04:46:15,756 INFO Model: RotatE
2021-11-08 04:46:15,756 INFO Data Path: data/wn18
2021-11-08 04:46:15,757 INFO #entity: 40943
2021-11-08 04:46:15,757 INFO #relation: 18
2021-11-08 04:46:15,886 INFO #train: 141442
2021-11-08 04:46:15,890 INFO #valid: 5000
2021-11-08 04:46:15,894 INFO #test: 5000
2021-11-08 04:46:16,147 INFO Model Parameter Configuration:
2021-11-08 04:46:16,147 INFO Parameter gamma: torch.Size([1]), require_grad = False
2021-11-08 04:46:16,147 INFO Parameter embedding_range: torch.Size([1]), require_grad = False
2021-11-08 04:46:16,147 INFO Parameter entity_embedding: torch.Size([40943, 1000]), require_grad = True
2021-11-08 04:46:16,147 INFO Parameter relation_embedding: torch.Size([18, 500]), require_grad = True
2021-11-08 04:46:19,692 INFO Ramdomly Initializing RotatE Model...
2021-11-08 04:46:19,692 INFO Start Training...
2021-11-08 04:46:19,692 INFO init_step = 0
2021-11-08 04:46:19,692 INFO batch_size = 512
2021-11-08 04:46:19,692 INFO negative_adversarial_sampling = 1
2021-11-08 04:46:19,692 INFO hidden_dim = 500
2021-11-08 04:46:19,692 INFO gamma = 12.000000
2021-11-08 04:46:19,692 INFO negative_adversarial_sampling = True
2021-11-08 04:46:19,692 INFO adversarial_temperature = 0.500000
2021-11-08 04:46:19,692 INFO learning_rate = 0
Traceback (most recent call last):
File "codes/run.py", line 361, in <module>
main(parse_args())
File "codes/run.py", line 305, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 267, in train_step
negative_score = model((positive_sample, negative_sample), mode=mode)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 159, in forward
score = model_func[self.model_name](head, relation, tail, mode)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 225, in RotatE
score = score.norm(dim = 0)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/_tensor.py", line 442, in norm
return torch.norm(self, p, dim, keepdim, dtype=dtype)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/functional.py", line 1442, in norm
return _VF.frobenius_norm(input, _dim, keepdim=keepdim)
RuntimeError: CUDA out of memory. Tried to allocate 1000.00 MiB (GPU 0; 10.92 GiB total capacity; 7.00 GiB already allocated; 22.62 MiB free; 7.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
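
The 1000.00 MiB figure in the error can be checked with a rough back-of-the-envelope estimate. The sketch below assumes float32 tensors and that each intermediate in the negative-sample scoring has shape batch_size × negative_sample_size × hidden_dim. batch_size and hidden_dim come from the logs above; reading 1024 (wn18) and 256 (FB15k) as the negative sample size is an assumption based on their position in the run.sh arguments. The helper name `tensor_mib` is hypothetical and not part of the repository.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings and scores are float32


def tensor_mib(*shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in MiB of a dense tensor with the given shape."""
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    return num_elements * bytes_per_element / 2**20


# RotatE on wn18: batch_size=512, hidden_dim=500 (from the log);
# negative_sample_size=1024 is assumed from the run.sh argument order.
wn18_rotate = tensor_mib(512, 1024, 500)

# ComplEx on FB15k: batch_size=1024, hidden_dim=1000 (from the log);
# negative_sample_size=256 is assumed from the run.sh argument order.
fb15k_complex = tensor_mib(1024, 256, 1000)

print(f"wn18 RotatE intermediate:    {wn18_rotate:.2f} MiB")
print(f"FB15k ComplEx intermediate:  {fb15k_complex:.2f} MiB")
```

Under these assumptions both runs need the same 1000 MiB per intermediate tensor, which matches the failed allocation. This suggests the dataset alone is not the deciding factor. The 7.00 GiB already allocated at the point of failure is consistent with several such intermediates being alive at once in the RotatE score function. Lowering the batch size or the negative sample size shrinks every one of these tensors proportionally, so that is a likely first thing to try.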

	re_head, im_head = torch.chunk(head, 2, dim=2)
	re_tail, im_tail = torch.chunk(tail, 2, dim=2)

	#Make phases of relations uniformly distributed in [-pi, pi]

	phase_relation = relation/(self.embedding_range.item()/pi)

	re_relation = torch.cos(phase_relation)
	im_relation = torch.sin(phase_relation)