
RGTN-NIE

Dataset and code for Representation Learning on Knowledge Graphs for Node Importance Estimation.

NIE Dataset

  • FB15k: a subset from FreeBase.

  • TMDB5k: original files are from Kaggle.

  • IMDB: original files are from IMDb Datasets. We provide the node text description files on Google Drive, and the graph construction files on Google Drive.

  • Processed features: Google Drive. Download the feature files and put them in the `datasets` directory.

Dependencies

  • PyTorch 1.6.0
  • DGL 0.5.3

Training Examples

  • Run `sh train_geni.sh` for GENI on FB15k (full-batch training)
  • Run `sh train_geni_batch.sh` for GENI on IMDB (mini-batch training)
  • Run `sh train_two.sh` for RGTN on FB15k (full-batch training)
  • Run `sh train_two_batch.sh` for RGTN on IMDB (mini-batch training)

Note that the hyperparameters may require a grid search on small datasets.
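As a rough illustration, such a grid search could be scripted as below. The `--lr` and `--weight-decay` flag names are assumptions (check the argparse setup in geni_train.py for the actual argument names), so this is a sketch rather than a supported workflow.

```python
import itertools
import subprocess

# Hypothetical grid search over learning rate and weight decay.
# The --lr and --weight-decay flags are assumed names; adjust them to
# match the actual arguments defined in geni_train.py.
for lr, wd in itertools.product([1e-2, 1e-3, 1e-4], [0.0, 1e-4, 5e-4]):
    subprocess.run(
        ["python", "geni_train.py", "--lr", str(lr), "--weight-decay", str(wd)],
        check=True,
    )
```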

Citation

If you find our work useful for your research, please consider citing this paper:

@inproceedings{Huang21RGTN-NIE,
  author    = {Han Huang and Leilei Sun and Bowen Du and Chuanren Liu and Weifeng Lv and Hui Xiong},
  title     = {Representation Learning on Knowledge Graphs for Node Importance Estimation},
  booktitle = {{KDD} '21: The 27th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021},
  pages     = {646--655},
  publisher = {{ACM}},
  year      = {2021}
}


rgtn-nie's Issues

Question about using Transformer-XL to get text embeddings

Dear author,

Thank you very much for the sample code of RGTN. I am trying to use the Hugging Face Transformer-XL model to obtain semantic embeddings of the node text.
Here is the official example code from the Hugging Face Transformer-XL documentation:

```python
from transformers import TransfoXLTokenizer, TransfoXLModel
import torch

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLModel.from_pretrained("transfo-xl-wt103")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
```

However, `last_hidden_states` contains one embedding vector per token rather than a single embedding for the whole text. Could you please explain how to obtain the text embedding, or release that part of the sample code?
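A common workaround (not necessarily the method used in this repository) is to pool the per-token hidden states into a single vector, for example by mean pooling. A minimal sketch, assuming the same Hugging Face model as above:

```python
import torch
from transformers import TransfoXLTokenizer, TransfoXLModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLModel.from_pretrained("transfo-xl-wt103")
model.eval()

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, seq_len, hidden_dim);
# averaging over the sequence dimension gives one vector per text.
text_embedding = outputs.last_hidden_state.mean(dim=1)
print(text_embedding.shape)  # (1, hidden_dim)
```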

About dataset

Hi. First of all, I really appreciate your wonderful work and code.

I am opening this issue to ask a question:

How did you define the node features for each dataset?

Sincerely,
Wooseong Cho.

Question about the loss function

Hello, I read your paper "Representation Learning on Knowledge Graphs for Node Importance Estimation" carefully and saw that it mentions applying multiple RMSE losses for training. However, in your code I can only find a single MSE loss in the geni_train.py file. Could you point out where the loss functions mentioned in the paper are set up, so that I can understand the paper better?
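For reference, an RMSE loss can be built directly on top of PyTorch's MSE loss. The sketch below is only illustrative; the two-channel combination and its equal weighting are assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def rmse_loss(pred, target):
    """Root-mean-square error: the square root of the mean squared error."""
    return torch.sqrt(F.mse_loss(pred, target))

# If importance is estimated from two channels (e.g. a structure view and
# a semantic view), the per-channel RMSE losses could simply be summed;
# this equal weighting is an assumption, not necessarily the paper's scheme.
def combined_loss(pred_structure, pred_semantic, target):
    return rmse_loss(pred_structure, target) + rmse_loss(pred_semantic, target)
```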

More details about the processed TMDB5K dataset

Thank you for your excellent work.
When looking through the processed TMDB5K dataset you provided, I noticed that there seems to be no information about the mapping from "node name/term" to "node_id". For example, the file "node_info.tsv" has only four columns: "node_id", "score", "valid" and "description", so when I try to figure out which node a given node_id refers to, I cannot map the node_id back to its name/term.
I believe you preprocessed TMDB5K from the Kaggle movie dataset. Could you please provide the intermediate data files or the data processing script so that I can recover the "node name/term" for a given "node_id"?
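If the movie node ids happened to be assigned in the row order of the original Kaggle file, a rough mapping could be rebuilt as sketched below. Both the file paths and the row-order assumption are guesses and may not match the actual preprocessing, especially since the graph likely also contains non-movie nodes.

```python
import pandas as pd

# Hypothetical reconstruction: assumes movie node ids follow the row order
# of the Kaggle TMDB 5000 movie file, which may not hold for this dataset.
movies = pd.read_csv("tmdb_5000_movies.csv")        # original Kaggle file
node_info = pd.read_csv("node_info.tsv", sep="\t")  # processed node file

id_to_name = dict(enumerate(movies["title"]))             # guessed id -> title map
node_info["name"] = node_info["node_id"].map(id_to_name)  # NaN where the guess fails
print(node_info.head())
```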

Paper says that code will be available here, but...

The paper says that the code will be available here, and the paper has been published for more than a month, but no code has been uploaded yet.
Besides, the baseline model GENI is a closed-source project. Could you please give a more detailed description of how GENI was implemented? Thanks.

Request for the word embedding code

Thanks for open-sourcing the EST task; it helps me a lot.
I am trying to use this model on my own datasets, but I ran into some problems with the embedding process...
May I ask for the code that produces the graph and text embeddings in this paper?
Thank you!

Question about input data format

Hi, I followed the tutorial and ran `sh train_geni.sh` in my terminal, but it returned `FileNotFoundError: [Errno 2] No such file or directory: 'datasets/fb15k_rel.pk'`. Can you help me?

About Hyperparameters

Hi, thank you for your wonderful work!

In your paper GENI, you wrote "The dimension of predicate embedding was set to 10 for all KGs" in Appendix B.3,
but I failed to find the specific code related to this.
Would you let me know where it is or how to do it?
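For illustration, a predicate (relation) embedding of dimension 10 can be expressed as a standard embedding table. This is a minimal sketch, not the repository's actual code, and may correspond to the pred-dim argument mentioned below.

```python
import torch
import torch.nn as nn

class PredicateEmbedding(nn.Module):
    """Illustrative predicate (edge-type) embedding table with dimension 10."""

    def __init__(self, num_relations, pred_dim=10):
        super().__init__()
        self.embed = nn.Embedding(num_relations, pred_dim)

    def forward(self, rel_ids):
        # rel_ids: LongTensor of relation/predicate ids, one per edge
        return self.embed(rel_ids)

# FB15k has 1,345 relation types; the edge ids below are just examples.
pred_emb = PredicateEmbedding(num_relations=1345)
edge_types = torch.tensor([0, 3, 7])
print(pred_emb(edge_types).shape)  # torch.Size([3, 10])
```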

Sorry, I was confused.

My issues are:

  1. Is it fair to use the same learning rate and weight decay for all KGs?
    FB15k has about 15k nodes, and IMDB_S has about 1.1M nodes,
    so it might be unfair to use the same learning rate and weight decay.

  2. There are so many hyperparameters.
    I can find a lot of hyperparameters that are never used: residual, attn-drop.
    And these are only used with their default values: num-heads, num-out-heads, num-layers, num-hidden, in-drop, negative-slope, pred-dim.
    Why did you expose them through argparse?

Sincerely,
Syzseisus

LTR Loss issue

Hi, I am very interested in your paper and read it carefully, but I don't understand the LTR loss very well. I think that the closer the predicted value is to the ground truth, the smaller the loss should become, isn't that so? But consider a node v whose ground-truth importance is very small while its predicted value is very large, so the two values are very different. After the softmax and the LTR loss function, I find that the loss is a positive number approaching 0, i.e. very small, whereas in the opposite case the loss value becomes large. I cannot understand why the loss is so small when the two values differ so much, so I want to ask you this question.
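To make the scenario concrete, here is a small sketch of a ListNet-style listwise loss (cross entropy between the softmaxed ground-truth scores and the softmaxed predicted scores). This formulation is only an illustration and may differ from the exact LTR loss used in the paper.

```python
import torch
import torch.nn.functional as F

def listwise_loss(pred, target):
    """ListNet-style loss: cross entropy between softmax score distributions."""
    return -(F.softmax(target, dim=0) * F.log_softmax(pred, dim=0)).sum()

target    = torch.tensor([0.1, 5.0, 3.0])  # ground-truth importance; node v is the first entry
pred_bad  = torch.tensor([9.0, 5.0, 3.0])  # node v is heavily over-predicted
pred_good = torch.tensor([0.1, 5.0, 3.0])  # predictions match the ground truth

print(listwise_loss(pred_bad, target))   # ~4.23: larger loss for the mismatched node
print(listwise_loss(pred_good, target))  # ~0.40: smaller loss (entropy of the target distribution)
```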
