BioBERT-MCNN

This repository provides the code for fine-tuning BioBERT-MCNN on Named Entity Recognition (NER) tasks. For more details, please refer to our paper "BioBERT-MCNN: A semantic integration approach for biomedical named entity recognition".

Pre-trained weights

Pre-training was based on the BERT code released by Google. The pre-trained weights are those of BioBERT Base v1.0 (+ PubMed 200K + PMC 270K), released by the BioBERT authors.

Datasets

We used 8 BioNER datasets (BC4CHEMD, BioNLP09, BioNLP11D, NCBI, Linnaeus, BC2GM, BC5CDR-disease and BC5CDR-chem) for the experiments in our paper. All datasets are provided in the ner_data directory. If the data directory contains no TFRecord file, the program automatically preprocesses the original data.
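The automatic preprocessing step amounts to an existence check before conversion. The sketch below is a hedged illustration only: it assumes the converted files use a `.tfrecord` extension and that a hypothetical `convert_fn` callback performs the conversion. The repository's actual file names and functions may differ.

```python
import glob
import os


def needs_preprocessing(data_dir):
    """Return True when data_dir contains no *.tfrecord file (extension assumed)."""
    return not glob.glob(os.path.join(data_dir, "*.tfrecord"))


def prepare_data(data_dir, convert_fn):
    """Run the hypothetical convert_fn only if no TFRecord file exists yet.

    Returns True if the raw data was converted, False if existing
    TFRecord files were reused.
    """
    if needs_preprocessing(data_dir):
        convert_fn(data_dir)  # assumed to write one or more *.tfrecord files
        return True
    return False
```

Calling `prepare_data` twice on the same directory converts the raw data only once, provided `convert_fn` actually writes at least one `.tfrecord` file. Deleting those files forces a fresh conversion on the next run.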

Installation

The sections below describe how to install and fine-tune our model, which is based on TensorFlow 1.52 (Python 3.6). To fine-tune BioBERT-MCNN, first download the pre-trained BioBERT weights. Then install the dependencies listed in requirements.txt as follows:

$ cd BioBERT-MCNN; pip install -r requirements.txt

Fine-tune

For fine-tuning on NER tasks, use the example bash script and run it as follows:

$ cd BioBERT-MCNN
$ ./fine-tune

The parameters in the script are as follows:

  • model_config_path: set the path of the pre-trained model configuration file.
  • init_checkpoint: set the path of the pre-trained weights.
  • model_dir: set the path where the fine-tuned weights are saved.
  • vocab_file: set the path of the vocabulary file.
  • train_batch_size: set the training batch size.
  • eval_batch_size: set the evaluation batch size.
  • task: set the task name.
  • do_train: whether to run training; default True.
  • do_eval: whether to run evaluation; default True.
  • do_predict: whether to run prediction; default True.
  • data_dir: set the path of the input data.
  • learning_rate: set the learning rate.
  • trainable_layer: set the trainable layers of BioBERT; default 12.
  • label_mode: set the label mode for fine-tuning; the value must be BL or WPL.
  • train_epoch: set the number of training epochs.
  • result_dir: set the path where evaluation results are saved.
  • no_cnn: whether to disable MCNN during fine-tuning; default True, meaning MCNN is not used.
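To make the defaults and constraints in the parameter list concrete, here is a hedged Python sketch of how these settings could be collected and checked before training. The class `FineTuneConfig`, its `validate` method and the `uses_mcnn` property are hypothetical and not part of this repository. Only the parameter names, the defaults for `do_train`, `do_eval`, `do_predict`, `trainable_layer` and `no_cnn`, and the allowed `label_mode` values (`BL`, `WPL`) come from the list. The positivity checks and the 0 to 12 range for `trainable_layer` (BioBERT Base has 12 layers) are assumptions.

```python
from typing import NamedTuple


class FineTuneConfig(NamedTuple):
    """Hypothetical container for the documented fine-tuning parameters."""
    model_config_path: str
    init_checkpoint: str
    model_dir: str
    vocab_file: str
    data_dir: str
    result_dir: str
    task: str
    train_batch_size: int
    eval_batch_size: int
    learning_rate: float
    train_epoch: int
    label_mode: str
    do_train: bool = True      # documented default
    do_eval: bool = True       # documented default
    do_predict: bool = True    # documented default
    trainable_layer: int = 12  # documented default
    no_cnn: bool = True        # documented default: True means MCNN is NOT used

    @property
    def uses_mcnn(self):
        """True when MCNN is enabled, i.e. no_cnn is False."""
        return not self.no_cnn

    def validate(self):
        """Raise ValueError on invalid settings; return self otherwise."""
        if self.label_mode not in ("BL", "WPL"):  # documented constraint
            raise ValueError("label_mode must be 'BL' or 'WPL', got %r" % self.label_mode)
        # The checks below are assumptions, not documented constraints.
        if self.train_batch_size <= 0 or self.eval_batch_size <= 0:
            raise ValueError("batch sizes must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.train_epoch <= 0:
            raise ValueError("train_epoch must be positive")
        if not 0 <= self.trainable_layer <= 12:
            raise ValueError("trainable_layer must be between 0 and 12")
        return self
```

With this sketch, a configuration using `label_mode="IOB"` is rejected by `validate()`, while leaving `no_cnn` at its default gives `uses_mcnn == False`.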
