fast-transx's Introduction

Fast-TransX

This repository is a subproject of THU-OpenSK, and all subprojects of THU-OpenSK are as follows.

An extremely fast implementation of TransE [1], TransH [2], TransR [3], TransD [4], and TranSparse [5] for knowledge representation learning (KRL), based on our previous package KB2E ("https://github.com/thunlp/KB2E"). The overall framework is similar to KB2E, with some changes to the underlying design for acceleration. This implementation also supports multi-threaded training to save time.

This code will be gradually integrated into the new framework [OpenKE].

Evaluation Results

Because the overall framework is similar, we list only the results of TransE (the previous implementation) and the newly implemented models on the datasets FB15K and WN18.

CPU : Intel Core i7-6700k 4.00GHz.

FB15K:

| Model | MeanRank (Raw) | MeanRank (Filter) | Hit@10 (Raw, %) | Hit@10 (Filter, %) | Time |
|---|---|---|---|---|---|
| TransE (n = 50, rounds = 1000) | 210 | 82 | 41.9 | 61.3 | 3587s |
| Fast-TransE (n = 50, threads = 8, rounds = 1000) | 205 | 69 | 43.8 | 63.5 | 42s |
| Fast-TransH (n = 50, threads = 8, rounds = 1000) | 202 | 67 | 43.7 | 63.0 | 178s |
| Fast-TransR (n = 50, threads = 8, rounds = 1000) | 196 | 73 | 48.8 | 69.8 | 1572s |
| Fast-TransD (n = 100, threads = 8, rounds = 1000) | 236 | 95 | 49.9 | 75.2 | 231s |

WN18:

| Model | MeanRank (Raw) | MeanRank (Filter) | Hit@10 (Raw, %) | Hit@10 (Filter, %) | Time |
|---|---|---|---|---|---|
| TransE (n = 50, rounds = 1000) | 251 | 239 | 78.9 | 89.8 | 1674s |
| Fast-TransE (n = 50, threads = 8, rounds = 1000) | 273 | 261 | 71.5 | 83.3 | 12s |
| Fast-TransH (n = 50, threads = 8, rounds = 1000) | 285 | 272 | 79.8 | 92.5 | 121s |
| Fast-TransR (n = 50, threads = 8, rounds = 1000) | 284 | 271 | 81.0 | 94.6 | 296s |
| Fast-TransD (n = 100, threads = 8, rounds = 1000) | 309 | 297 | 78.5 | 91.9 | 201s |

More results can be found in KB2E ("https://github.com/thunlp/KB2E").
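
The Raw and Filter columns differ in how the rank of the correct entity is counted. The following is a minimal C++ sketch of that distinction, assuming that a lower score means a more plausible triple and that the caller supplies the set of other entities known to form true triples. The names `Ranks`, `rankOf` and `otherTrueEntities` are illustrative and are not taken from the repository.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Rank of the correct entity among all candidates (1 = best), given one score
// per candidate entity where a lower score means a more plausible triple.
// Raw rank counts every better-scoring candidate; filtered rank skips
// candidates that already form another known true triple.
struct Ranks { int raw; int filtered; };

Ranks rankOf(const std::vector<float> &scores, int correct,
             const std::set<int> &otherTrueEntities) {
    Ranks r{1, 1};
    for (std::size_t e = 0; e < scores.size(); ++e) {
        if (static_cast<int>(e) == correct) continue;
        if (scores[e] < scores[correct]) {
            ++r.raw;
            if (otherTrueEntities.count(static_cast<int>(e)) == 0) ++r.filtered;
        }
    }
    return r;
}
```

Under these assumptions the filtered rank can never be worse than the raw rank, which is consistent with the Filter columns being lower than the Raw columns in the tables.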

Data

Datasets are required in the following format, containing three files:

entity2id.txt: all entities and corresponding ids, one per line. The first line is the number of entities.

relation2id.txt: all relations and corresponding ids, one per line. The first line is the number of relations.

train2id.txt: training file. The first line is the number of triples for training. The following lines are all in the format (e1, e2, rel). Note that train2id.txt contains ids from entity2id.txt and relation2id.txt instead of the names of the entities and relations.

We provide FB15K and WN18; more datasets can be found in KB2E ("https://github.com/thunlp/KB2E"). If you use your own datasets, please check the format of your training file. Files in the wrong format may cause a segmentation fault. Datasets from KB2E also need to be converted to this format before training.
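
Since a wrongly formatted training file may crash the program, a small pre-check can help. The sketch below assumes the format described above (a count on the first line, then one `e1 e2 rel` id triple per line) and assumes ids start at 0. The helper `checkTrainStream` is hypothetical and not part of Fast-TransX.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Fast-TransX): checks that a train2id-style
// stream has a triple count on the first line, followed by that many
// "e1 e2 rel" id triples whose ids are within range (ids assumed 0-based).
// Returns an empty string on success, otherwise a short error message.
std::string checkTrainStream(std::istream &in, long entityTotal, long relationTotal) {
    long tripleTotal = 0;
    if (!(in >> tripleTotal) || tripleTotal < 0)
        return "first line must be the number of triples";
    for (long i = 0; i < tripleTotal; ++i) {
        long e1, e2, rel;
        if (!(in >> e1 >> e2 >> rel))
            return "triple " + std::to_string(i) + " is missing or not numeric";
        if (e1 < 0 || e1 >= entityTotal || e2 < 0 || e2 >= entityTotal)
            return "triple " + std::to_string(i) + " has an entity id out of range";
        if (rel < 0 || rel >= relationTotal)
            return "triple " + std::to_string(i) + " has a relation id out of range";
    }
    return "";
}
```

A file that still contains entity and relation names (as in the KB2E format) fails the numeric check on its first triple, so the problem shows up as a message rather than a crash.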

Compile

g++ transX.cpp -o transX -pthread -O3 -march=native

g++ test_transX.cpp -o test_transX -pthread -O3 -march=native

Train

./transX [-size SIZE] [-sizeR SIZER]
         [-input INPUT] [-output OUTPUT] [-load LOAD]
         [-load-binary 0/1] [-out-binary 0/1]
         [-thread THREAD] [-epochs EPOCHS] [-nbatches NBATCHES]
         [-alpha ALPHA] [-margin MARGIN]
         [-note NOTE]

optional arguments:
-size SIZE           dimension of entity embeddings
-sizeR SIZER         dimension of relation embeddings
-input INPUT         folder of training data
-output OUTPUT       folder for output results
-load LOAD           folder of pretrained data
-load-binary [0/1]   [1] pretrained data to load is in binary form
-out-binary [0/1]    [1] results will be output in binary form
-thread THREAD       number of worker threads
-epochs EPOCHS       number of epochs
-nbatches NBATCHES   number of batches for each epoch
-alpha ALPHA         learning rate
-margin MARGIN       margin in max-margin loss for pairwise training
-note NOTE           information you want to add to the filename

Test

./test_transX [-size SIZE] [-sizeR SIZER]
         [-input INPUT] [-init INIT]
         [-binary 0/1] [-thread THREAD]
         [-note NOTE]

optional arguments:
-size SIZE           dimension of entity embeddings
-sizeR SIZER         dimension of relation embeddings
-input INPUT         folder of testing data
-init INIT           folder of embeddings
-binary [0/1]        [1] embeddings are in the binary form
-thread THREAD       number of worker threads
-note NOTE           information you want to add to the filename

Citation

If you use the code, please kindly cite the following paper and other papers listed in our reference:

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu. Learning Entity and Relation Embeddings for Knowledge Graph Completion. The 29th AAAI Conference on Artificial Intelligence (AAAI'15). [pdf]

Reference

[1] Bordes, Antoine, et al. Translating embeddings for modeling multi-relational data. Proceedings of NIPS, 2013.

[2] Zhen Wang, Jianwen Zhang, et al. Knowledge Graph Embedding by Translating on Hyperplanes. Proceedings of AAAI, 2014.

[3] Yankai Lin, Zhiyuan Liu, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion. Proceedings of AAAI, 2015.

[4] Guoliang Ji, Shizhu He, et al. Knowledge Graph Embedding via Dynamic Mapping Matrix. Proceedings of ACL, 2015.

[5] Guoliang Ji, Kang Liu, et al. Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. Proceedings of AAAI, 2016.

fast-transx's Issues

Missing documentation

Hello,

I have just managed to run it locally and I wanted to point out that some of the documentation is a bit misleading.

The description of the input file format should mention that triple2id.txt contains ids from entity2id.txt and relation2id.txt, not the names of the entities and relations. This differs from https://github.com/thunlp/KB2E. At first, I thought I could use the same input files.

It is unclear what the -nbatches parameter does. The default setting (=1) works best for me, but I don't understand why.

Otherwise, thank you for the great package!

Daniil
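
One plausible reading of -nbatches, given the option description "number of batches for each epoch", is sketched below. It assumes that an epoch visits roughly every training triple once, split into nbatches batches, and that each batch is shared evenly among the worker threads. This is an assumption for illustration, not a statement about the repository's code; `BatchPlan` and `planBatches` are hypothetical names.

```cpp
// Assumed relationship (a sketch, not taken from the Fast-TransX source):
// one epoch visits roughly every training triple once, split into nbatches
// batches, and each batch is shared evenly among the worker threads.
struct BatchPlan {
    long batchSize;        // triples per batch
    long perThread;        // triples each thread handles per batch
    long updatesPerEpoch;  // total triple updates in one epoch
};

BatchPlan planBatches(long tripleTotal, long nbatches, long threads) {
    BatchPlan p;
    p.batchSize = tripleTotal / nbatches;   // integer division, remainder dropped
    p.perThread = p.batchSize / threads;    // integer division again
    p.updatesPerEpoch = p.perThread * threads * nbatches;
    return p;
}
```

Under this reading, a larger nbatches does not change how much data is seen per epoch by much; it mainly changes how many triples are processed between synchronization points.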

SEGV signal found in TransD

A SEGV signal was found when running the program transD in the TransD directory:

=================================================================
==8745==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000000c0 (pc 0x7f46e86cd908 bp 0x7ffd676d5a30 sp 0x7ffd676d5350 T0)
    #0 0x7f46e86cd907 in _IO_vfscanf (/lib/x86_64-linux-gnu/libc.so.6+0x5b907)
    #1 0x7f46e954c5d0 in vfscanf (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x525d0)
    #2 0x7f46e954c749 in __interceptor_fscanf (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x52749)
    #3 0x4034fa in init() /home/mfc_fuzz/Fast-TransX/transD/transD.cpp:170
    #4 0x40e088 in main /home/mfc_fuzz/Fast-TransX/transD/transD.cpp:626
    #5 0x7f46e869282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #6 0x401cf8 in _start (/home/mfc_fuzz/Fast-TransX/transD/transD+0x401cf8)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 _IO_vfscanf
==8745==ABORTING
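
The trace shows the crash inside fscanf, called from init(). A common cause of this pattern is passing fscanf a null FILE* after fopen failed (for example, because the input folder is missing), although the trace alone does not confirm that. A defensive sketch follows; `readCount` is a hypothetical helper, not a function in the repository.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not in Fast-TransX): reads the count on the first line
// of a file such as entity2id.txt. Returns -1 instead of crashing when the
// file cannot be opened or the first line is not a number.
long readCount(const std::string &path) {
    FILE *fin = std::fopen(path.c_str(), "r");
    if (fin == nullptr) {
        std::fprintf(stderr, "cannot open %s\n", path.c_str());
        return -1;
    }
    long total = -1;
    if (std::fscanf(fin, "%ld", &total) != 1)
        total = -1;  // first line was empty or not numeric
    std::fclose(fin);
    return total;
}
```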

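Hello, some questions about TransE

1. About the epochs and batches parameters: does epochs refer to the number of rounds to run, and batches to the number of triples in each batch?
2. At what point can the training loss be considered converged?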

Compilation error

Hello, I ran the following:
cd transE
g++ transE.cpp -o transE -pthread -O3 -march=native
Compilation failed with:
transE.cpp:1: error: bad value (native) for -march= switch
transE.cpp:1: error: bad value (native) for -mtune= switch
Please take a look.

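A question about TransE experimental results

When running TransE on my own dataset with a low dimension, MeanRank is very high but Hits@10 is good (0.8). However, using these embeddings for other tasks such as recommendation gives poor results. What might cause this? Why are the two metrics inconsistent? My setup: epoch = 1000, dimension = 20, lrate = 0.1. The dataset has 60,000 entities, 400,000 triples, and 6 relation types, with about 6 edges per node. It is a dataset of music entities.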
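
The two metrics can disagree because mean rank is an average and is dominated by a few very badly ranked triples, while Hits@10 only counts how many triples land in the top 10. A small sketch with made-up ranks (1 = best) illustrates this; the function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Mean rank: the average rank of the correct entities (lower is better).
double meanRank(const std::vector<int> &ranks) {
    if (ranks.empty()) return 0.0;
    double sum = 0.0;
    for (int r : ranks) sum += r;
    return sum / ranks.size();
}

// Hits@10: the fraction of test triples whose correct entity ranks in the top 10.
double hitsAt10(const std::vector<int> &ranks) {
    if (ranks.empty()) return 0.0;
    std::size_t hits = 0;
    for (int r : ranks)
        if (r <= 10) ++hits;
    return static_cast<double>(hits) / ranks.size();
}
```

For the hypothetical ranks {1, 2, 3, 4, 5, 6, 7, 8, 30000, 30000}, Hits@10 is 0.8 while the mean rank is about 6003.6, so a good Hits@10 and a poor mean rank can coexist when a minority of triples is ranked very badly.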

Segmentation fault with TransR while running the test_transR

Hi,

I am using transR with the WN18 data for training, and training works fine. I then use the output .vec files and the test2id.txt provided in the WN18 directory to run inference with test_transR, and I get a segmentation fault (core dumped). Can someone tell me how to fix this issue?

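TransR experimental results problem

I ran Fast-TransR on WN11 without using embeddings pretrained by another program, training with alpha = 0.001, batch_size = 100, dimension = 50, margin = 1. The test results were very poor: MeanRank averaged several thousand and Hit@10 was only about 0.00x. This is clearly wrong. Could you tell me how the results in the README were produced?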

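Questions about the Fast-PTransE code

There are some places in ptranse.cpp that I really cannot understand. I would appreciate your help:

  1. In the randn function, what is the purpose of dScope=rand(0.0,normal(miu,miu,sigma));? What does the randn() function as a whole implement?
  2. In the gradient_path() function there is relationVec[r1* dimension + ii]+=belta*ptranseAlpha*x;. When I work out the gradient, I think it should be relationVec[r1 * dimension + ii]-=belta*ptranseAlpha*x;. Likewise, for relationVec[j * dimension + i]-=belta*ptranseAlpha*x; I think it should be relationVec[j * dimension + ii]+=belta*ptranseAlpha*x;. I am not sure whether my calculation is wrong. Please take a look, thanks.
  3. In the pransetrainMode() function, what does if (randd(id) % 1000 < pr) mean? Why is pr_path = 0.99*pr_path + 0.01; needed? In train_path(trainList[i]->r, j, relPath, 2*margin, pr*pr_path);, why is 2*margin used instead of margin?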

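Fast-TransE results on WN18 differ considerably from the reported ones

Hello! I ran the WN18 dataset with the current version of fast-transe and with the previous one or two versions.
The results are roughly as follows (parameters: size = 50, epoch = 1000, alpha = 0.001)
left 426.596802 0.798800
left(filter) 411.633392 0.943800
right 446.216400 0.808000
right(filter) 432.734406 0.941200
The results given on the website are:
mean_rank|mean_rank_filter|hits@10|hits@10_filter
273 | 261 | 71.5 | 83.3
The difference is large. Is something wrong somewhere, or has the data changed from the original WN18?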

How to create the files "set_num_l.txt" and "set_num_r" used in TranSparse

Could you please tell me how to create the files "set_num_l.txt" and "set_num_r" used in TranSparse?
I used the files created by the code provided with the paper,
but I find them different from the files in your code.
I also don't understand the meaning of "numLef", "numRig" and "last".
Would it be convenient to share your code for creating these two files?
Thank you very much.

Running TransE on WN18 gives results very different from the paper (resolved)

Hello, I used these parameters:

INT threads = 8;
INT bernFlag = 0;
INT loadBinaryFlag = 0;
INT outBinaryFlag = 0;
INT trainTimes = 1000;
INT nbatches = 1;
INT dimension = 50;
REAL alpha = 0.001;
REAL margin = 1.0;

I ran TransE on WN18, but the results are very poor. What might be wrong?

left 1226.356201 0.221600
left(filter) 1216.166016 0.232600
right 1402.401367 0.204600
right(filter) 1392.722656 0.235600
left 369.013214 0.400800
left(filter) 358.822998 0.419600
right 281.390411 0.339200
right(filter) 271.711609 0.377600
left 1095.500000 0.523810
left(filter) 1095.309570 0.523810
right 825.595215 0.547619
right(filter) 825.428589 0.547619
left 1188.984863 0.190579
left(filter) 1188.887939 0.191121
right 1205.759033 0.145642
right(filter) 1180.281006 0.209529
left 1651.449219 0.066633
left(filter) 1626.444214 0.078243
right 2118.939941 0.060071
right(filter) 2118.886475 0.060071
left 547.074341 0.532743
left(filter) 545.986755 0.560177
right 489.092926 0.541593
right(filter) 488.010620 0.574336

Thank you for your advice!

Results of test_transE

Hi, I tried to run your code, and when I ran test_transE I got the following results:

left 12155.270508 0.143083
left(filter) 11667.890625 0.186583
right 12955.742188 0.281667
right(filter) 12945.541016 0.325417
left 1880.772949 0.156500
left(filter) 1393.394775 0.204750
right 936.673889 0.299750
right(filter) 926.475159 0.343583
left 18177.582031 0.558659
left(filter) 18177.003906 0.573184
right 30702.458984 0.537430
right(filter) 30702.173828 0.546369
left 19323.531250 0.456829
left(filter) 19322.150391 0.483516
right 38445.351562 0.353218
right(filter) 38435.953125 0.395604
left 33830.359375 0.061051
left(filter) 33046.398438 0.111928
right 13426.873047 0.515545
right(filter) 13426.180664 0.521198
left 6602.977051 0.094034
left(filter) 6090.237305 0.140246
right 9167.539062 0.202552
right(filter) 9154.329102 0.257731

I think these results are printed at
https://github.com/thunlp/Fast-TransX/blob/master/transE/test_transE.cpp#L332

However, I don't understand what each for loop means, because there is no explanation of this code or its results.

Could you describe the meaning of these results?
In addition, I would like to know how to get the results (MeanRank, HITS@10) as written in the Markdown at
https://github.com/thunlp/Fast-TransX#evaluation-results

Thank you.

I get a segmentation fault when I run your code on Linux

I used gdb to debug.
In line 184: sum += fabs(entityVec[last2 + ii] - entityVec[last1 + ii] - relationVec[lastr + ii]);
when last2 is too big (for example: 1775403720),
I print entityVec[1775403720]
and gdb reports: Cannot access memory at address 0x80019e18db30

I would like to know how to fix this bug.
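
An offset as large as 1775403720 suggests that the entity id it was computed from does not come from entity2id.txt, for example because the training file was misread. That is a guess from the numbers above, not a confirmed diagnosis. A bounds check along the following lines can turn the crash into a detectable error; `entityOffset` is a hypothetical helper.

```cpp
// Hypothetical guard (not in Fast-TransX): returns the offset of entity `id`
// in a flat embedding array, or -1 when the id does not fit the array.
// The offset is computed in a 64-bit type so id * dimension cannot overflow int.
long long entityOffset(long long id, long long entityTotal, long long dimension) {
    if (id < 0 || id >= entityTotal) return -1;
    return id * dimension;
}
```

Calling such a check on every id while loading the training file reports the first bad triple instead of failing later inside the training loop.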

Confidence.txt file

Hello, sorry if I missed it, but can you please point me to where the files "confidence.txt" and "train_pra.txt" are, for the Fast-PTransE method? I don't see them in any of the download locations or zip files that are mentioned. Thanks.

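A question: parts of the TransE program are hard to understand

Because of my limited ability, there are some parts of transE that I really cannot understand. I would appreciate your guidance:

  1. In transE.cpp, roughly how do corrupt_head and corrupt_tail implement negative sample generation?
  2. In test_transE.cpp, what does the label field in the triple struct (which also appears in test2id.txt) represent?
    I hope to get an answer, thanks!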
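
For orientation, the general idea of negative sampling by corrupting the head can be sketched as follows. This is a simple, generic version using rejection sampling against a set of known triples. It is not the repository's corrupt_head, whose internals are not described here, and all names are illustrative.

```cpp
#include <random>
#include <set>
#include <tuple>

using Triple = std::tuple<int, int, int>;  // (head, tail, relation), as in train2id.txt

// Generic negative sampling sketch (not the repository's corrupt_head):
// replace the head with a random entity so that the corrupted triple is not
// a known training triple. Assumes at least one valid replacement exists.
int corruptHead(const Triple &pos, int entityTotal,
                const std::set<Triple> &known, std::mt19937 &rng) {
    std::uniform_int_distribution<int> pick(0, entityTotal - 1);
    int tail = std::get<1>(pos);
    int rel = std::get<2>(pos);
    while (true) {
        int h = pick(rng);
        if (known.count(Triple(h, tail, rel)) == 0)
            return h;  // (h, tail, rel) is treated as a negative example
    }
}
```

Corrupting the tail works the same way with the roles of head and tail swapped.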

Segmentation fault whilst training with transR

Hello,
when running transR for training on the provided data I receive a segmentation fault. Could this be due to my machine or my parameters? Maybe you can provide an example script.

These are my parameters (which are more or less random to get it running):

./transR \
    -size 100 \
    -sizeR 100 \
    -input ../data/FB15K \
    -load ../data/FB15K \
    -output ./output \
    -load-binary 0 \
    -out-binary 0 \
    -thread  6 \
    -epochs 150 \
    -nbatches 1 \
    -alpha 0.1 \
    -margin 0.4 \
    -note ""

Thanks in advance!

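Segmentation fault (core dumped)

(1) Hello, running the code at https://github.com/thunlp/IEAJKE fails with a segmentation fault (core dumped). I was directed here to check whether there is a similar problem.

(2) On my machine (Ubuntu 16), compilation returns 0 without problems, but as soon as I run the program it reports a segmentation fault (core dumped). According to what I found online, this is a memory or stack problem. Could you help take a look? Thanks.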

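The gradient function in TransH.cpp

At line 356 of TransH.cpp, entityVecDao (equivalent to the head entity) is computed only once. By the differentiation rule for a variable that appears several times, shouldn't it be computed twice? Note that ADao is computed four times in the gradient function.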

Validation file

I am confused about how to generate a validation file from given training and testing files. Is there any documentation or code for that? Is test_transR.cpp the code that can predict the missing entity in a triple?

A question about the calc_sum function in test_transE.cpp

The function is as follows:

float calc_sum(int e1, int e2, int rel) { 
    float res = 0;
    int last1 = e1 * relationTotal * dimensionR + rel * dimensionR;
    int last2 = e2 * relationTotal * dimensionR + rel * dimensionR;
    int lastr = rel * dimensionR;
    for (int i = 0; i < dimensionR; i++)
        res += fabs(entityVec[last1 + i] + relationVec[lastr + i] - entityVec[last2 + i]);
    return res;
}

I cannot understand the assignments to last1 and last2.

Why isn't it

int last1 = e1 * dimension; int last2 = e2 * dimension;

Thanks!
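
The indexing in the quoted function looks like a layout with one vector per (entity, relation) pair. For comparison, the plain TransE layout the question expects can be sketched as below, assuming entity and relation embeddings are stored row by row in flat arrays of width `dimension`. This is an illustration and is not copied from test_transE.cpp.

```cpp
#include <cmath>
#include <vector>

// L1 TransE score ||h + r - t||_1 with embeddings stored row by row in flat
// arrays, so entity e starts at e * dimension and relation rel at rel * dimension.
// Assumed layout for illustration; not copied from test_transE.cpp.
float transEScore(const std::vector<float> &entityVec,
                  const std::vector<float> &relationVec,
                  int e1, int e2, int rel, int dimension) {
    int last1 = e1 * dimension;
    int last2 = e2 * dimension;
    int lastr = rel * dimension;
    float res = 0;
    for (int i = 0; i < dimension; i++)
        res += std::fabs(entityVec[last1 + i] + relationVec[lastr + i] - entityVec[last2 + i]);
    return res;
}
```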

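A question about PCRA.py

While running ptranse, I found the format of train_pra.txt somewhat unclear, and only one line of data was actually read for training. Looking at PCRA.py, I found that the Fast version differs from Dr. Lin's version, as shown below:

Fast version, line 189: g.write(str(ent2id[e1])+" "+ent2id([e2])+' '+str(rel)+"\n")
Dr. Lin's version, line 180: g.write(str(e1)+" "+str(e2)+' '+str(rel)+"\n")

As a result, train_pra.txt is stored in a different format in the Fast version, and ptranse.cpp in the Fast version may have problems reading train_pra.txt. Please take a look, or perhaps I have misunderstood the program. I hope to get your answer, thanks.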

TransH in Fast-TransX

TransH in Fast-TransX performs better, but I find the gradient update computation in the C code somewhat hard to follow. (This may be my own mistake.)

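Unable to reproduce the results on the project page with transR

I tried running transR on FB15K, but the results are far from the numbers on the project page. What could be the reason?
For the parameters, I also set the vector length to 50 and the number of iterations to 100, except that I increased the number of threads to 48.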

transR segmentation fault (core dumped)

$ ls
data out transR.cpp
$ ls data/
entity2id.txt relation2id.txt triple2id.txt
$ g++ transR.cpp -o transR -pthread -O3 -march=native
$ ./transR
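Segmentation fault (core dumped)

I ran into this problem when trying to run transR. Is there something wrong with how I ran it, or is there a problem in the code?
Environment: Ubuntu 16.04.02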

About the input data

In train.txt, triples are written in the form (e1, e2, rel), for example
/m/027rn /m/06cx9 /location/country/form_of_government. I would like to know what /m/027rn and /m/06cx9 mean. How should I interpret these entities?

Thank you very much.

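-O3 optimization in TransE prevents the train function from terminating

System: Arch Linux, gcc version 8.2.1

Specifically, epoch < trainTimes always evaluates to true, so the loop keeps running after epoch exceeds the given number of iterations.

Without optimization the program runs normally; from -O1 upward the loop fails to terminate. The problem can also be reproduced by commenting out the actual iteration code in the loop and keeping only the printf.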
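
One known way to get this symptom with recent GCC versions is a function declared to return a value (such as a pthread-style `void*` entry point) that flows off its end without a return statement. That is undefined behavior in C++, and an optimizing compiler may then drop the loop's exit condition. Whether this is the cause here is an assumption; the sketch below only shows the well-defined form, with hypothetical names.

```cpp
#include <cstdio>

// Sketch with hypothetical names: a function declared to return void* (as
// pthread entry points are) must end with a return statement.
void *trainLoop(void *arg) {
    int trainTimes = *static_cast<int *>(arg);
    int epoch = 0;
    for (epoch = 0; epoch < trainTimes; epoch++)
        std::printf("epoch %d\n", epoch);
    *static_cast<int *>(arg) = epoch;  // report how many epochs ran
    return nullptr;  // without this line the behavior is undefined
}
```

Compiling with -Wall (which enables -Wreturn-type) makes GCC warn about non-void functions that lack a return.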

What is in the file 'transRdata/entity2vec'?

I saw this: FILE* f1 = fopen((inPath + "transRdata/entity2vec.bern").c_str(),"r");
FILE* f2 = fopen((inPath + "transRdata/relation2vec.bern").c_str(),"r");
What are these two files?
Thank you very much!
