MultPAX: Keyphrase Extraction using Language Models and Knowledge Graphs

This repository contains the source code of our paper: "MultPAX: Keyphrase Extraction using Language Models and Knowledge Graphs". The paper has been accepted at the ISWC 2022 conference.

Fig. 1: The architecture of the MultPAX framework

Summary:

Keyphrase extraction is the process of extracting a small set of phrases that best describe an input corpus.
The automatic generation of keyphrases has become essential for many natural language applications such as text categorization, indexing, and summarization.
In this paper, we propose MultPAX, a multitask framework for extracting present and absent keyphrases using pretrained language models and knowledge graphs. In particular, our framework contains three components:
1. MultPAX identifies present keyphrases from the input corpus.
2. MultPAX then links the input corpus with external knowledge graphs to get more relevant phrases.
3. MultPAX ranks the extracted phrases based on their semantic relatedness to the input corpus.
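
The ranking step (component 3) can be illustrated with a minimal sketch. It is not the MultPAX implementation: the function names `bag_of_words`, `cosine`, and `rank_phrases` are hypothetical, and a simple bag-of-words vector stands in for the pretrained language-model embeddings the framework relies on. Only the overall idea is shown, namely scoring each candidate phrase by its similarity to the input text and sorting by that score.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for an embedding (assumption): lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_phrases(document, candidates, top_k=5):
    """Return up to top_k (phrase, score) pairs, most similar to the document first."""
    doc_vec = bag_of_words(document)
    scored = [(phrase, cosine(bag_of_words(phrase), doc_vec)) for phrase in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    text = ("Keyphrase extraction selects phrases that describe a document. "
            "Knowledge graphs add related phrases.")
    for phrase, score in rank_phrases(text, ["keyphrase extraction",
                                             "knowledge graphs",
                                             "football stadium"]):
        print(f"{score:.3f}  {phrase}")
```

Replacing `bag_of_words` with a sentence-embedding model would turn the lexical score into a semantic one, which is what lets phrases that never occur in the text (absent keyphrases) still receive a meaningful score.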

Our Contributions:

1) We propose an *unsupervised* multitask framework that not only extracts present keyphrases, but also generates absent ones.
    
2) To the best of our knowledge, our approach is the first to leverage existing knowledge graphs for keyphrase extraction without the need to create keyphrase vocabularies or phrase banks.
    
3) We introduce an embedding-based F1 score that considers the semantic similarity between generated and ground-truth keyphrases rather than relying on exact matching.
    
4) We carried out several experiments on four benchmark datasets. The evaluation results show that our approach is more accurate than state-of-the-art baselines.
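
To make the idea behind the embedding-based F1 score (contribution 3) concrete, here is a hedged sketch of a soft-matching F1. It uses greedy best-match scoring in the style of BERTScore, which may differ from the exact definition in the paper. The names `token_jaccard` and `soft_f1` are hypothetical, and token-level Jaccard overlap is only a stand-in for the cosine similarity between phrase embeddings.

```python
def token_jaccard(a, b):
    """Stand-in similarity (assumption): Jaccard overlap of lowercase tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def soft_f1(predicted, gold, similarity=token_jaccard):
    """F1 in which each phrase is credited with its best similarity to the other set."""
    if not predicted or not gold:
        return 0.0
    # Soft precision: how well each predicted phrase is covered by some gold phrase.
    precision = sum(max(similarity(p, g) for g in gold) for p in predicted) / len(predicted)
    # Soft recall: how well each gold phrase is covered by some predicted phrase.
    recall = sum(max(similarity(g, p) for p in predicted) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Exact matching would score this pair as a complete miss.
    print(soft_f1(["neural network"], ["neural networks"]))
```

With exact matching, near-duplicates such as "neural network" and "neural networks" count as errors; a similarity-based score gives them partial credit, and an embedding-based similarity would additionally credit paraphrases that share no tokens.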

Repository Structure:

.
├── Baselines
│   ├── EmbedRank-Baseline.ipynb
│   ├── EmbedRank(Wordwise)- Baseline.ipynb
│   ├── TextRank-Baseline.ipynb
│   └── YAKE-Baseline.ipynb
├── Inspec experiment
│   └── MltPAX-Inspec.ipynb
├── Krapivin2009 experiment
│   └── MltPAX-Krapivin2009.ipynb
├── NUS experiment
│   └── MltPAX-NUS.ipynb
├── SemEval2010 experiment
│   └── MltPAX-SemEval2010.ipynb
└── .DS_Store

How to run:

We conduct several experiments on four benchmark datasets, namely: Inspec, SemEval2010, NUS and Krapivin2009. The datasets are available at the Dropbox Folder.

To set up the experiments, install the following libraries via pip install -r requirments.txt, or install them manually:

Python 3.7
keybert
sentence-transformers 2.2.0
SPARQLWrapper 2.0.0
SciPy 1.8.0
NumPy 1.21.5
Pandas 1.4.2
NLTK 3.6.6 
requests 2.27.1
py-babelnet

We provide our experiments as Jupyter notebooks (see Experiments Folder) and source files (see src Folder). We recommend using the Jupyter notebooks for an interactive execution of our experiments. Furthermore, we provide a Jupyter notebook for each experiment.

Baselines:

We obtain the implementations of the TextRank and YAKE baselines from the open-source library PKE. The source code for these baselines is available at:

Furthermore, we implemented EmbedRank using the BERT pretrained model from the spaCycake library. Our implementation can be found at:

EmbedRank

For the AutoGen baseline, we obtain the implementation from its official GitHub repository.

For the CopyRNN baseline, the implementation can be obtained from its GitHub repository.

Evaluation

The following notebooks contain the implementation of the evaluation metrics used in our experiments:

Citation

@INPROCEEDINGS{zahera2022multpax,
  author = "Hamada M. Zahera and Daniel Vollmers and Mohamed Ahmed Sherif and Axel-Cyrille Ngonga Ngomo",
  title = "MultPAX: Keyphrase Extraction using Language Models and Knowledge Graphs",
  booktitle = "The 21st International Semantic Web Conference (ISWC) 2022",
  year = "2022",
  series = "Springer"
}
