BERT-DTI

This repo provides the experiment code for the KD-DTI benchmark, which aims to extract Drug-Target Interaction knowledge from biomedical literature. Our code is based on BERT-NMT.

The public version of the dataset is available here

Get started:

Prepare environment

Run ./utils/prepare_environment.sh to install the required packages and install bert-nmt to the default path /tmp/bert-nmt/

Preprocess the raw data:

Run ./data_scripts/build_seq2seq_data.sh, a script that preprocesses the raw files. It takes two params:

input_dir: path to the dir containing the raw json data
output_dir: path to save the processed seq2seq data

Tips: see the example params in the scripts

In this step, we process the raw input into train.x, train.y, valid.x, valid.y, test.x, test.y

For the *.x files, each line is a document.

For the *.y files, each line is made up of drug_1 relation_1 target_1 drug_2 relation_2 target_2, and so on.
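Since each line of a *.x file is a document and each line of the matching *.y file holds its triples, the two files of a split should presumably have the same number of lines. A minimal sketch of that sanity check follows. The helper names count_lines and check_parallel are hypothetical and not part of this repo, and the equal-line-count rule is an assumption drawn from the file description above.

```python
from pathlib import Path


def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(data_dir, splits=("train", "valid", "test")):
    """Map each split to (x_lines, y_lines); raise if the two counts differ.

    Assumes line i of {split}.x pairs with line i of {split}.y.
    """
    counts = {}
    for split in splits:
        n_x = count_lines(Path(data_dir) / f"{split}.x")
        n_y = count_lines(Path(data_dir) / f"{split}.y")
        if n_x != n_y:
            raise ValueError(f"{split}: {n_x} documents but {n_y} target lines")
        counts[split] = (n_x, n_y)
    return counts
```

Calling check_parallel on the output_dir of the preprocessing step would flag a truncated or misaligned split before tokenization.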

Notice!! Before processing the data, you should first register a DrugBank account, download the xml data set, and replace the entity ids with the corresponding entity names from DrugBank.
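The id-to-name replacement could be sketched roughly as below. This is not the repo's own script: the function build_id_to_name is hypothetical, and the namespace and the element names (drug, drugbank-id with a primary attribute, name) are assumptions about the DrugBank xml layout that should be checked against the file you download. The sketch covers drug entries only; target ids would need analogous handling.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the DrugBank xml; verify it on the root element of your download.
DRUGBANK_NS = {"db": "http://www.drugbank.ca"}


def build_id_to_name(xml_source):
    """Map each primary DrugBank drug id to the drug name.

    xml_source is a file path or a binary file object.
    """
    mapping = {}
    root = ET.parse(xml_source).getroot()
    # Only direct children of the root are visited, so nested references are skipped.
    for drug in root.findall("db:drug", DRUGBANK_NS):
        name = drug.findtext("db:name", namespaces=DRUGBANK_NS)
        for id_node in drug.findall("db:drugbank-id", DRUGBANK_NS):
            if id_node.get("primary") == "true" and name:
                mapping[id_node.text] = name
    return mapping
```

ET.parse loads the whole file into memory, so for the full xml data set a streaming parser such as xml.etree.ElementTree.iterparse may be the more practical choice.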

Tokenize and Binarize data:

Run ./data_scripts/move_and_bin_data.sh, a script that tokenizes and binarizes the preprocessed files. It takes two params:

input_dir: path to the seq2seq raw data
script_dir: code dir for BERT-DTI

Tips: see the example params in the scripts

In this step, we first use build_bpe_data.sh to get the BPE data.

Then we get the bin data for the different settings:

For the conventional model, use bin.sh
For the BERT model, use bin-bert.sh
If you would like to use PubMedBERT, please use bin-pubmedbert.sh.

Training and Inference

All training and inference scripts can be found in ./train_and_test_scripts/

For training, run ./train_and_test_scripts/train_seq2seq{pretrained_model_name}.sh, which takes four params:

dr: dropout rate
las: label smoothing rate
lr: learning rate
data_path: path to the processed data-bin dir, e.g. ./data/seq2seq/data-bin-BERT

For inference, run ./train_and_test_scripts/predict_seq2seq{pretrained_model_name}.sh, which takes three params:

model: path to checkpoint pt file
data_path: path to dir of bin data
output_file: path to result file

Evaluation

Run ./evaluation_scripts/hard_match_evaluation.py to get the results. An example of usage is provided in ./evaluation_scripts/run_hard_eval.sh
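For intuition, a hard-match metric can be sketched as below, assuming that "hard match" means a predicted (drug, relation, target) triple counts as correct only when all three fields equal a gold triple exactly. The function hard_match_scores is hypothetical; the averaging scheme and any text normalization used by hard_match_evaluation.py may differ, so treat this as an illustration rather than a replacement for the script.

```python
def hard_match_scores(gold, pred):
    """Micro-averaged precision, recall and F1 over exact-match triples.

    gold and pred are lists with one entry per document; each entry is an
    iterable of (drug, relation, target) tuples.
    """
    tp = n_pred = n_gold = 0
    for gold_triples, pred_triples in zip(gold, pred):
        # Sets drop duplicate triples within a document.
        gold_set, pred_set = set(gold_triples), set(pred_triples)
        tp += len(gold_set & pred_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

With two gold triples and two predicted triples of which one matches exactly, this yields a precision, recall and F1 of 0.5 each.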
