This project forked from declare-lab/hyperred

This repository implements our EMNLP 2022 research paper A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach.

A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach

This repository implements our EMNLP 2022 research paper.

HyperRED is a dataset for the new task of hyper-relational extraction, which extracts relation triplets together with qualifier information such as time, quantity or location. For example, the relation triplet (Leonard Parker, Educated At, Harvard University) can be factually enriched by including the qualifier (End Time, 1967). HyperRED contains 44k sentences with 62 relation types and 44 qualifier types. Inspired by table-filling approaches for relation extraction, we propose CubeRE, a cube-filling model which explicitly considers the interaction between relation triplets and qualifiers.
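To make the cube-filling idea more concrete, here is a minimal sketch of how one hyper-relational fact could be laid out as a sparse table and cube. It is an assumption-laden reading of the description above, not the CubeRE implementation in this repository: the function `fill_cube`, the dictionary layout, and the tokenized sentence are all hypothetical and exist only for illustration. Table entries are indexed by (head token, tail token) and hold the relation label; cube entries add a third index for each qualifier value token and hold the qualifier label.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (inclusive start, exclusive end) token indices


def fill_cube(head: Span, tail: Span, relation: str,
              qualifiers: List[Tuple[Span, str]]):
    """Return sparse (table, cube) label maps for one hyper-relational fact.

    Hypothetical helper: an illustration of the idea, not the repository's code.
    """
    table: Dict[Tuple[int, int], str] = {}
    cube: Dict[Tuple[int, int, int], str] = {}
    for i in range(*head):
        for j in range(*tail):
            # 2D entry: relation label between head token i and tail token j.
            table[(i, j)] = relation
            for value_span, label in qualifiers:
                for k in range(*value_span):
                    # 3D entry: qualifier label for value token k of this triplet.
                    cube[(i, j, k)] = label
    return table, cube


# Made-up tokenization of the example fact, for illustration only.
tokens = ["Leonard", "Parker", "graduated", "from",
          "Harvard", "University", "in", "1967", "."]
table, cube = fill_cube(
    head=(0, 2), tail=(4, 6), relation="Educated At",
    qualifiers=[((7, 8), "End Time")],
)
for (i, j, k), label in sorted(cube.items()):
    print(tokens[i], tokens[j], tokens[k], label)
```

Storing only the non-empty cells keeps the sketch small; a trained model would instead predict a label (or "no label") for every cell.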

Setup

Install Python Environment

conda create -n cube python=3.7 -y
conda activate cube
pip install torch==1.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt

Download HyperRED Dataset (Available on Hugging Face Datasets)

python data_process.py download_data data/hyperred/
python data_process.py process_many data/hyperred/ data/processed/

Data Exploration

from data_process import Data

path = "data/hyperred/train.json"
data = Data.load(path)

for s in data.sents[:3]:
    print()
    print(s.tokens)
    for r in s.relations:
        print(r.head, r.label, r.tail)
        for q in r.qualifiers:
            print(q.label, q.span)
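A natural next step in exploration is to count how often each relation and qualifier label occurs. The sketch below is a rough illustration under the assumption that each sentence is available as a plain dictionary with `relations`, `label` and `qualifiers` keys; the helper name `count_labels` and the tiny in-memory sample are hypothetical and are not part of `data_process.py`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_labels(records: Iterable[Dict]) -> Tuple[Counter, Counter]:
    """Count relation labels and qualifier labels over sentence records."""
    relation_counts: Counter = Counter()
    qualifier_counts: Counter = Counter()
    for record in records:
        for relation in record["relations"]:
            relation_counts[relation["label"]] += 1
            for qualifier in relation["qualifiers"]:
                qualifier_counts[qualifier["label"]] += 1
    return relation_counts, qualifier_counts


# Tiny made-up sample; in practice the records would come from the dataset files.
sample = [
    {"relations": [{"label": "headquarters location",
                    "qualifiers": [{"label": "country"}]}]},
    {"relations": [{"label": "educated at",
                    "qualifiers": [{"label": "end time"}]},
                   {"label": "headquarters location", "qualifiers": []}]},
]
relation_counts, qualifier_counts = count_labels(sample)
print(relation_counts.most_common())
print(qualifier_counts.most_common())
```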

Data Fields

  • tokens: The tokens of the sentence text.
  • entities: List of entity spans. Span indices refer to token positions in the space-separated text (inclusive start index, exclusive end index).
  • relations: List of relations, each with a label and the head and tail entity spans. Each relation also contains a list of qualifiers, where each qualifier has a label and the span of its value entity.

Data Example

An example instance of the dataset is shown below:

{
  "tokens": ["Acadia", "University", "is", "a", "predominantly", "undergraduate", "university", "located", "in", "Wolfville", ",", "Nova", "Scotia", ",", "Canada", "with", "some", "graduate", "programs", "at", "the", "master", "'", "s", "level", "and", "one", "at", "the", "doctoral", "level", "."],
  "entities": [
    {"span": [0, 2], "label": "Entity"},
    {"span": [9, 13], "label": "Entity"},
    {"span": [14, 15], "label": "Entity"}
  ],
  "relations": [
    {
      "head": [0, 2],
      "tail": [9, 13],
      "label": "headquarters location",
      "qualifiers": [
        {"span": [14, 15], "label": "country"}
      ]
    }
  ]
}
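The spans in the instance above can be turned back into text by slicing the token list with the inclusive-start, exclusive-end convention. The following is a minimal sketch that works on a plain dictionary; the helper names `span_text` and `describe` are hypothetical and not part of the repository, and the token list is truncated to the first 15 tokens, which is enough to cover every span in the example.

```python
from typing import Dict, List, Sequence


def span_text(tokens: List[str], span: Sequence[int]) -> str:
    """Join the tokens covered by an (inclusive start, exclusive end) span."""
    start, end = span
    return " ".join(tokens[start:end])


def describe(instance: Dict) -> List[str]:
    """Render each relation and its qualifiers as a readable string."""
    tokens = instance["tokens"]
    lines = []
    for rel in instance["relations"]:
        fact = "({}, {}, {})".format(
            span_text(tokens, rel["head"]),
            rel["label"],
            span_text(tokens, rel["tail"]),
        )
        for qualifier in rel["qualifiers"]:
            fact += " + ({}, {})".format(
                qualifier["label"], span_text(tokens, qualifier["span"])
            )
        lines.append(fact)
    return lines


# First 15 tokens of the example sentence; later tokens are not needed here.
instance = {
    "tokens": ["Acadia", "University", "is", "a", "predominantly",
               "undergraduate", "university", "located", "in", "Wolfville",
               ",", "Nova", "Scotia", ",", "Canada"],
    "relations": [
        {
            "head": [0, 2],
            "tail": [9, 13],
            "label": "headquarters location",
            "qualifiers": [{"span": [14, 15], "label": "country"}],
        }
    ],
}
for line in describe(instance):
    print(line)
# (Acadia University, headquarters location, Wolfville , Nova Scotia) + (country, Canada)
```

Because tokens are joined with single spaces, punctuation stays separated ("Wolfville , Nova Scotia"), mirroring the space-separated text the span indices refer to.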

Model Training

python training.py \
--save_dir ckpt/cube_prune_20_seed_0 \
--seed 0 \
--data_dir data/processed \
--prune_topk 20 \
--config_file config.yml

Research Citation

If the code is useful for your research project, we would appreciate it if you cite the following paper:

@inproceedings{chia-etal-2022-hyperred,
    title = "A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach",
    author = "Chia, Yew Ken and Bing, Lidong and Aljunied, Sharifah Mahani and Si, Luo and Poria, Soujanya",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022",
    url = "https://arxiv.org/abs/2211.10018",
}
