
OAG-taxo

Introduction

This project implements several methods for taxonomy expansion: Bilinear, TaxoExpan, and TaxoEnrich. It also supports inference on the AI taxonomy using models pre-trained on the Computer Science taxonomy.

The code is based mainly on TaxoEnrich, which we chose because it already provides many models and trainers.

Environment

You need an environment with CUDA 10 and DGL 0.4.0. This combination works only on graphics cards older than the 30 series.

Install the dependencies with pip install -r requirements.txt (tested with Python 3.7).

Run the following command before running any methods.

export PYTHONPATH="`pwd`:$PYTHONPATH"
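
Before training, it can help to confirm which versions the interpreter actually sees. The following is a minimal sketch and not part of the repository: the helper name check_environment is hypothetical, and it assumes the libraries are importable under the names torch and dgl. It only reports what it finds and enforces nothing.

```python
import importlib
import sys


def check_environment(packages=("torch", "dgl")):
    """Return a dict of detected versions; None means the package could not be imported."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken install can raise more than ImportError (e.g. missing CUDA libraries).
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version if version is not None else 'not importable'}")
```

Compare the reported values against the versions listed above (Python 3.7, DGL 0.4.0) before starting a long run.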

Data Preparation

To try the MAG-CS [Aliyun], MAG-full, and OAG-AI [Aliyun] datasets on these models, use the copies we have prepared on Google Drive. Put the MAG-CS, MAG-full, or OAG-AI folder in the data directory under the project root.

To try other datasets, follow the procedure described in TaxoEnrich. In short, prepare the x.terms and x.taxo files, then run embedding_generation.py followed by generate_dataset_binary.py to obtain the x.bin file used for training.

For example,

python data_creation/embedding_generation.py --dataset oag-ai
python data_creation/generate_dataset_binary.py -d data/OAG_AI -t "Artificial Intelligence" -p 0
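
As a rough illustration of the two input files, the sketch below writes a toy .terms and .taxo pair and checks that every edge refers to a known term. The layout is an assumption borrowed from the TaxoExpan convention (one tab-separated term ID and name per line in .terms, one tab-separated parent ID and child ID per line in .taxo); confirm it against the TaxoEnrich instructions before relying on it. The function names and the toy terms are hypothetical.

```python
import os
import tempfile


def write_toy_taxonomy(directory, name="toy"):
    """Write <name>.terms and <name>.taxo in the assumed tab-separated layout."""
    terms = {"0": "artificial intelligence", "1": "machine learning", "2": "deep learning"}
    edges = [("0", "1"), ("1", "2")]  # (parent_id, child_id)
    terms_path = os.path.join(directory, name + ".terms")
    taxo_path = os.path.join(directory, name + ".taxo")
    with open(terms_path, "w", encoding="utf-8") as f:
        for term_id, surface in terms.items():
            f.write(f"{term_id}\t{surface}\n")
    with open(taxo_path, "w", encoding="utf-8") as f:
        for parent, child in edges:
            f.write(f"{parent}\t{child}\n")
    return terms_path, taxo_path


def find_unknown_ids(terms_path, taxo_path):
    """Return the IDs used in the .taxo file that are missing from the .terms file."""
    with open(terms_path, encoding="utf-8") as f:
        known = {line.rstrip("\n").split("\t", 1)[0] for line in f if line.strip()}
    unknown = set()
    with open(taxo_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                parent, child = line.rstrip("\n").split("\t")
                unknown.update({parent, child} - known)
    return unknown


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        terms_file, taxo_file = write_toy_taxonomy(tmp)
        print("unknown ids:", sorted(find_unknown_ids(terms_file, taxo_file)))
```

A check like this catches dangling edges early, before the embedding and binary generation scripts are run.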

Train the model

Run python train.py --config config-file, where config-file is one of the files prepared in the config_files folder. Use config.test.enrich.json for the TaxoEnrich model, config.test.baseline.json for the Bilinear model, and config.test.baselineextmn.json for the TaxoExpan model. For example,

python train.py -c config_files/MAG-CS/config.test.enrich.json
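
The mapping from method to config file can be captured in a small helper. This is a sketch rather than repository code: build_train_command and CONFIG_BY_METHOD are hypothetical names, and the config_files/<dataset>/ layout is assumed from the example command above.

```python
# Config file names as listed in this README; the dictionary itself is illustrative.
CONFIG_BY_METHOD = {
    "taxoenrich": "config.test.enrich.json",
    "bilinear": "config.test.baseline.json",
    "taxoexpan": "config.test.baselineextmn.json",
}


def build_train_command(method, dataset="MAG-CS"):
    """Return the argument list for train.py for the given method and dataset folder."""
    try:
        config_name = CONFIG_BY_METHOD[method.lower()]
    except KeyError:
        choices = ", ".join(sorted(CONFIG_BY_METHOD))
        raise ValueError(f"unknown method {method!r}; choose from: {choices}") from None
    return ["python", "train.py", "--config", f"config_files/{dataset}/{config_name}"]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is launched by accident.
    print(" ".join(build_train_command("TaxoEnrich")))
```

The returned list can be passed to subprocess.run if you prefer to launch training from a script.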

Infer the model

We provide inference on the Artificial Intelligence dataset using models pre-trained on the Computer Science taxonomy.

python inner_infer.py --resume your_model_path_here --config config_files/MAG-CS/config.test.enrich.json

Our pre-trained models are available here.

Config File Prepared

Taking ./config_files/MAG-CS as an example, each file is used as follows:

  • config.test.enrich.json: TaxoEnrich method on the completion task

  • config.test.baseline.json: baseline Bilinear method on the completion task

  • config.test.baselineex.json: TaxoExpan method on the expansion task

  • config.test.baselineextmn.json: TaxoExpan method on the completion task

config.valid.X.json is the inference config file that corresponds to config.test.X.json for method X.

If you do not have enough GPU memory for training, decrease the batch size and the number of negative samples.
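
One way to apply this without editing the original file is to write a scaled-down copy of the config. The sketch below is illustrative only: the key names batch_size and negative_size are guesses at how the JSON config names these settings, and shrink_config and write_shrunk_copy are hypothetical helpers. Check the actual keys in your config file and pass them through the keys argument.

```python
import json


def shrink_config(config, keys=("batch_size", "negative_size"), factor=2):
    """Return a copy of a nested config with every integer under `keys` divided by `factor`."""
    if isinstance(config, dict):
        shrunk = {}
        for key, value in config.items():
            if key in keys and isinstance(value, int) and not isinstance(value, bool):
                shrunk[key] = max(1, value // factor)  # never drop below 1
            else:
                shrunk[key] = shrink_config(value, keys, factor)
        return shrunk
    if isinstance(config, list):
        return [shrink_config(item, keys, factor) for item in config]
    return config


def write_shrunk_copy(src_path, dst_path, **kwargs):
    """Load a JSON config, shrink it, and save the result to a new file."""
    with open(src_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(shrink_config(config, **kwargs), f, indent=2)


if __name__ == "__main__":
    # Toy config in an assumed layout; the real files may be organized differently.
    example = {"train_data_loader": {"args": {"batch_size": 32, "negative_size": 31}}}
    print(shrink_config(example))
```

The new file can then be passed to train.py through --config in place of the original.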

References

[1] Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang and Jiawei Han, “TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020.

[2] Minhao Jiang, Xiangchen Song, Jieyu Zhang and Jiawei Han, “TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations”, in Proc. The ACM Web Conf. 2022 (WWW’22), Apr. 2022.


oag-taxo's Issues

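Training environment

Hello: which Python version is used for training? With 3.7 I get an error saying there is a problem with the pickle version.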
