BERT-DTI

This repo provides the experiment code for the KD-DTI benchmark, which aims to extract Drug-Target Interaction knowledge from biomedical literature. Our code is based on BERT-NMT.

The public version of the dataset is available here.

Get started:

Prepare environment

Run ./utils/prepare_environment.sh to install the required packages and to install bert-nmt to the default path /tmp/bert-nmt/.

Preprocess the raw data:

Run ./data_scripts/build_seq2seq_data.sh, a script that preprocesses the raw files. It takes two params:

  • input_dir: path to the dir containing the raw json data
  • output_dir: path where the processed seq2seq data is saved

Tips: see the example params in the script.

This step turns the raw input into six files: train.x, train.y, valid.x, valid.y, test.x and test.y.

For the *.x files, each line is a document.

For the *.y files, each line is made up of drug_1 relation_1 target_1 drug_2 relation_2 target_2, etc.
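
A quick sanity check after preprocessing is to confirm that every *.x file has the same number of lines as its *.y counterpart. The sketch below assumes that line i of a *.y file holds the triples for the document on line i of the matching *.x file, which is the usual seq2seq layout but is not stated explicitly here. The helper names count_lines and check_parallel are hypothetical and are not part of this repo.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count the lines of a text file without loading it all into memory."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(data_dir: str, splits=("train", "valid", "test")) -> dict:
    """Return {split: line_count}; raise ValueError if a .x/.y pair differs.

    Assumption: the two files of a split are line-aligned, one document
    per line in *.x and its triples on the same line of *.y.
    """
    counts = {}
    for split in splits:
        n_x = count_lines(Path(data_dir) / f"{split}.x")
        n_y = count_lines(Path(data_dir) / f"{split}.y")
        if n_x != n_y:
            raise ValueError(f"{split}: {n_x} documents but {n_y} target lines")
        counts[split] = n_x
    return counts
```

Calling check_parallel on the output_dir used above either returns the per-split document counts or points at the split whose files have drifted apart.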

Note: before processing the data, you must first register a DrugBank account, download the XML data set, and replace each entity id with the corresponding entity name from DrugBank.
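
As a rough illustration of the id-to-name lookup, the sketch below builds a mapping from the DrugBank XML using only the Python standard library. It assumes that each top-level drug element has drugbank-id children and a name child; check this against the file you download. It covers drug ids only, so target ids would need a similar pass over the target elements. The function names are hypothetical and are not part of this repo.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix, e.g. '{http://...}drug' -> 'drug'."""
    return tag.rsplit("}", 1)[-1]


def build_id_to_name(xml_source) -> dict:
    """Map every drugbank-id of each top-level drug element to the drug name.

    Assumed layout (verify against your download):
      <drugbank><drug><drugbank-id>..</drugbank-id><name>..</name>..</drug></drugbank>
    xml_source may be a file path or an open binary file object.
    """
    root = ET.parse(xml_source).getroot()  # loads the whole file into memory
    mapping = {}
    for drug in root:
        if local_name(drug.tag) != "drug":
            continue
        name = None
        ids = []
        # Only direct children are inspected, so names of nested
        # elements (for example targets) are not picked up here.
        for child in drug:
            tag = local_name(child.tag)
            if tag == "name" and child.text:
                name = child.text.strip()
            elif tag == "drugbank-id" and child.text:
                ids.append(child.text.strip())
        if name:
            for drug_id in ids:
                mapping[drug_id] = name
    return mapping
```

Because ET.parse reads the complete document, a streaming parser such as ET.iterparse may be preferable if the XML file is too large for memory.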

Tokenize and Binarize data:

Run ./data_scripts/move_and_bin_data.sh, a script that tokenizes and binarizes the preprocessed files. It takes two params:

  • input_dir: path to the seq2seq raw data
  • script_dir: code dir for BERT-DTI

Tips: see the example params in the script.

In this step, we first use build_bpe_data.sh to get the BPE data.

We then produce the binarized data for the chosen setting:

  • For a conventional model, use bin.sh
  • For a BERT model, use bin-bert.sh
  • If you would like to use PubMedBERT, use bin-pubmedbert.sh

Training and Inference

All training and inference scripts can be found in ./train_and_test_scripts/

For training, run ./train_and_test_scripts/train_seq2seq{pretrained_model_name}.sh. It takes four params:

  • dr: dropout rate
  • las: label smoothing rate
  • lr: learning rate
  • data_path: path to the processed data-bin directory, e.g. ./data/seq2seq/data-bin-BERT

For inference, run ./train_and_test_scripts/predict_seq2seq{pretrained_model_name}.sh. It takes three params:

  • model: path to checkpoint pt file
  • data_path: path to dir of bin data
  • output_file: path to result file

Evaluation

Run ./evaluation_scripts/hard_match_evaluation.py to get the results. An example of usage is provided in ./evaluation_scripts/run_hard_eval.sh
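
For intuition, an exact-match ("hard match") evaluation of extracted triples is commonly scored with micro-averaged precision, recall and F1. The sketch below is an illustration under that assumption, not the logic of hard_match_evaluation.py, which may differ in details such as text normalization or averaging. The function name and the sample triples are hypothetical.

```python
def hard_match_scores(gold, predicted):
    """Micro-averaged precision, recall and F1 over exactly matching triples.

    gold and predicted are parallel lists with one entry per document;
    each entry is a collection of (drug, relation, target) tuples.
    A predicted triple counts only if all three fields match exactly.
    """
    true_positives = n_predicted = n_gold = 0
    for gold_triples, predicted_triples in zip(gold, predicted):
        gold_set, predicted_set = set(gold_triples), set(predicted_triples)
        true_positives += len(gold_set & predicted_set)
        n_predicted += len(predicted_set)
        n_gold += len(gold_set)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up triples, for illustration only.
    gold = [[("drug_a", "inhibitor", "target_a")], [("drug_b", "agonist", "target_b")]]
    predicted = [[("drug_a", "inhibitor", "target_a")], [("drug_b", "antagonist", "target_b")]]
    print(hard_match_scores(gold, predicted))
```

Using sets means duplicate triples within one document are counted once, which is one possible design choice; a script that keeps duplicates would report different numbers.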
