A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition

This repository contains the code for the following paper:

@inproceedings{li2021sodner,
 title={A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition},
 author={Li, Fei and Lin, Zhichao and Zhang, Meishan and Ji, Donghong},
 booktitle={Proceedings of the ACL},
 year={2021}
}

Setup

Use "conda" or "virtualenv" to create a Python 3 virtual environment. For example, with "conda", run:

conda create -n sodner python=3.6

Activate the environment.

conda activate sodner

Run the following command to install necessary packages.

pip install -r requirements.txt

Download the PyTorch AllenNLP version of SciBERT and put it into the current directory.
Put the preprocessed data into the "data" directory. The "sample" directory there shows the expected format and can serve as a reference when you preprocess the original datasets.

Training & Evaluation

The command below runs the experiments on the sample dataset. The argument is the CUDA device: -1 means CPU. To use a GPU, change -1 to the GPU device ID (0 or a larger integer).

nohup ./train_sample.sh -1 > sample_0001.log 2>&1 &
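
If you prefer to start training from Python, the shell command above can be approximated with the sketch below. The helper names train_command and launch_training are hypothetical and not part of this repository. The sketch assumes a POSIX system and that "train_sample.sh" takes the CUDA device as its only argument, as in the command above. Using start_new_session=True only approximates "nohup".

```python
import subprocess


def train_command(cuda_device=-1):
    """Build the argument list for the sample training script.

    cuda_device: -1 for CPU, or a GPU device ID (0 or larger).
    """
    return ["./train_sample.sh", str(cuda_device)]


def launch_training(cuda_device=-1, log_path="sample_0001.log"):
    """Start training in the background and send all output to log_path.

    Hypothetical helper: start_new_session=True detaches the child from the
    terminal's session, which roughly mirrors what nohup is used for here.
    """
    log = open(log_path, "w")
    return subprocess.Popen(
        train_command(cuda_device),
        stdout=log,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        start_new_session=True,
    )
```

Calling launch_training(0) would start the sample run on GPU 0 and return the process handle.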

Inference

Run the following command.

cuda_device=-1 allennlp predict models/sample_0001/model.tar.gz data/sample/sample.json --include-package sodner --predictor my_predictor --output-file prediction.txt
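
To inspect the predictions afterwards, a minimal Python sketch like the one below may help. It assumes that "prediction.txt" holds one JSON object per line, which is the default output format of "allennlp predict". The custom "my_predictor" may write something different, so check the file first. The helper name read_predictions is hypothetical and not part of this repository.

```python
import json


def read_predictions(path="prediction.txt"):
    """Read a JSON-lines file and return a list of parsed objects.

    Assumption: one JSON object per non-empty line (AllenNLP's default).
    """
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

The keys of each record depend on the predictor, so print one record before relying on a particular field.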

Debug

Change the settings in "sample_working_example.jsonnet" as follows.

debug: true,
shuffle: false,

Add the following environment variables to the run configuration of your IDE (for example, PyCharm).

ie_test_data_path=./data/sample/sample.json;
ie_dev_data_path=./data/sample/sample.json;
ie_train_data_path=./data/sample/sample.json;
cuda_device=-1;

Run "debug_sample.py" in debug mode.
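
If you are not using an IDE, the same environment can be set from a small Python launcher. The sketch below is only an illustration: the names DEBUG_ENV, apply_debug_env and run_debug are hypothetical, and the values are copied from the variables listed above. It assumes "debug_sample.py" reads these variables from the process environment.

```python
import os
import runpy

# Values copied from the environment variables listed in this section.
DEBUG_ENV = {
    "ie_train_data_path": "./data/sample/sample.json",
    "ie_dev_data_path": "./data/sample/sample.json",
    "ie_test_data_path": "./data/sample/sample.json",
    "cuda_device": "-1",
}


def apply_debug_env(env=os.environ):
    """Copy the debug settings into the given environment mapping."""
    env.update(DEBUG_ENV)
    return env


def run_debug(script="debug_sample.py"):
    """Set the variables, then run the debug script in this process."""
    apply_debug_env()
    runpy.run_path(script, run_name="__main__")
```

Calling run_debug() from the repository root, for example under "python -m pdb", should then behave like the IDE configuration described above.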

Data Preprocessing

The following example shows how to preprocess the CADEC data. First, download the code of Dai et al. 2020 and follow their instructions to preprocess the CADEC data. This produces three output files: "train.txt", "dev.txt" and "test.txt".
Next, download Stanford CoreNLP. We use "stanford-corenlp-full-2018-10-05".
Finally, modify the directory paths at the beginning of "preprocess_cadec.py" to match your environment. Create a "1.sh" file like

#!/bin/bash
sudo /xxx/envs/python37/bin/python "$@"

and run "1.sh preprocess_cadec.py".

Acknowledgement

We thank everyone who shared the code that helped us complete this project. This project is mainly built on the code published by Wadden et al. 2019.
