
biomedical-translation

University of Science and Technology-Microsoft Research Asia Innovation Practice Project: Medical Literature Translation Group

Data

Data we collected ourselves:

OneDrive link

Google Drive link (backup)

Requirements

Requires Python >= 3.6 and torch >= 1.6.0. The main dependencies are:

  • transformers
  • datasets
  • sentencepiece
  • nltk
  • sacrebleu
  • pdfplumber
  • PyPDF2

To set up the environment, run the following on the command line:

pip install -r requirements.txt
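Before training, a quick check that the interpreter and installed packages meet the stated minimums can save a failed run. The sketch below is not part of this repository: `version_at_least` and `missing_packages` are hypothetical helpers that use only the standard library, and the version comparison looks only at the leading numeric parts, so a local tag such as `+cu117` is ignored.

```python
import importlib.util
import re
import sys

# Import names of the dependencies listed above, plus torch for the minimum-version check.
REQUIRED_MODULES = ["torch", "transformers", "datasets", "sentencepiece",
                    "nltk", "sacrebleu", "pdfplumber", "PyPDF2"]


def version_at_least(version, minimum):
    """Compare dotted version strings numerically, ignoring suffixes like '+cu117' or 'rc1'."""
    def numeric_parts(v):
        parts = []
        for piece in v.split("+")[0].split("."):
            match = re.match(r"\d+", piece)
            if match is None:
                break
            parts.append(int(match.group()))
        return parts

    have, need = numeric_parts(version), numeric_parts(minimum)
    # Pad with zeros so that "1.6" compares equal to "1.6.0".
    length = max(len(have), len(need))
    have += [0] * (length - len(have))
    need += [0] * (length - len(need))
    return have >= need


def missing_packages(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found on the current import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.6:", sys.version_info >= (3, 6))
    missing = missing_packages()
    print("Missing packages:", missing or "none")
    if "torch" not in missing:
        import torch
        print("torch >= 1.6.0:", version_at_least(torch.__version__, "1.6.0"))
```

Anything reported as missing can then be installed with the `pip install -r requirements.txt` command above.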

Models

1. Standard Transformer model implemented in PyTorch (no pretrained model)

Training:

First, train the tokenizer on the dataset:

cd typical_transformer
python mytokenize.py
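How mytokenize.py trains its tokenizer is not described here; since sentencepiece is a dependency it plausibly trains a SentencePiece model, but that is an assumption. To show what "training a tokenizer" means, the following is a toy byte-pair-encoding (BPE) learner, a swapped-in illustration rather than the repository's code: it repeatedly merges the most frequent adjacent symbol pair. It splits words on whitespace for simplicity, which Chinese text lacks; SentencePiece instead works directly on raw text.

```python
from collections import Counter


def learn_bpe(corpus, num_merges):
    """Learn BPE merge rules from a list of sentences (toy, whitespace-split version)."""
    # Each word becomes a tuple of characters plus an end-of-word marker, weighted by frequency.
    vocab = Counter()
    for sentence in corpus:
        for word in sentence.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)

        # Rewrite every word, fusing each occurrence of the best pair into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


merges = learn_bpe(["virus virus viral", "host virus"], num_merges=3)
print(merges)
print(apply_bpe("viral", merges))
```

Frequent character sequences ("vir") become single subword units, while rarer endings stay split into smaller pieces.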

After the tokenizer has been trained, train the full model:

python main.py

Model hyperparameters can be adjusted in config.py as needed.
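The contents of config.py are not listed in this README. As an assumption-labelled sketch, a frozen dataclass is one tidy way to group Transformer hyperparameters and validate them. Every field name below is hypothetical; the defaults for d_model, n_heads, n_layers, d_ff and dropout are the "base" settings from the original Transformer paper, not values taken from this project. Note that dataclasses require Python 3.7 or newer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical hyperparameter set; field names are illustrative, not taken from config.py."""
    d_model: int = 512        # embedding / hidden size
    n_heads: int = 8          # attention heads per layer
    n_layers: int = 6         # encoder and decoder layers
    d_ff: int = 2048          # inner size of the position-wise feed-forward layer
    dropout: float = 0.1
    vocab_size: int = 32000   # hypothetical; must match the trained tokenizer
    max_len: int = 256        # hypothetical maximum sequence length
    beam_size: int = 4        # hypothetical beam width for decoding

    def __post_init__(self):
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")

    @property
    def head_dim(self):
        return self.d_model // self.n_heads


base = TransformerConfig()
# replace() builds a new validated config, leaving the frozen original untouched.
small = replace(base, d_model=256, n_heads=4, d_ff=1024)
print(base.head_dim, small.head_dim)
```

Freezing the dataclass keeps a run's configuration from being mutated halfway through training, and the validation fails fast on an inconsistent head count.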

2. Transformer model built on the transformers library (with a pretrained model)

Training:

cd pretrained_transformers
python train.py
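train.py presumably tokenizes the parallel corpus and batches it for a sequence-to-sequence model; the details are not documented. A common convention in the transformers library (for example `DataCollatorForSeq2Seq`) is to pad labels with -100, the value PyTorch's cross-entropy loss ignores by default, so padded target positions do not contribute to the loss. The stdlib sketch below shows that padding step; the function name `pad_seq2seq_batch` and the pad id 0 are assumptions, and the real pad id comes from the tokenizer.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def pad_seq2seq_batch(examples, pad_id=0):
    """Pad source ids with pad_id, build attention masks, and pad labels with IGNORE_INDEX.

    Each example is a dict with "input_ids" and "labels" (lists of token ids).
    """
    max_src = max(len(ex["input_ids"]) for ex in examples)
    max_tgt = max(len(ex["labels"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        src, tgt = ex["input_ids"], ex["labels"]
        src_pad = max_src - len(src)
        batch["input_ids"].append(src + [pad_id] * src_pad)
        # 1 marks real tokens, 0 marks padding the encoder should not attend to.
        batch["attention_mask"].append([1] * len(src) + [0] * src_pad)
        batch["labels"].append(tgt + [IGNORE_INDEX] * (max_tgt - len(tgt)))
    return batch


batch = pad_seq2seq_batch([
    {"input_ids": [5, 6, 7], "labels": [8, 9]},
    {"input_ids": [5], "labels": [8, 9, 10, 2]},
])
print(batch)
```

The resulting lists can be turned into tensors with `torch.tensor` before being passed to the model.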

To translate a paper with this model:

python main.py --pdf --field abstract --filename FILENAME_WITH_PATH
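How main.py locates the section named by --field is not documented. One plausible pipeline is to extract the page text first (pdfplumber, a listed dependency, provides `pdfplumber.open(path)` and `page.extract_text()`) and then cut out the requested section. The sketch below covers only that second step with a simple heading heuristic; `extract_field`, `is_stop_heading` and `STOP_HEADINGS` are hypothetical names, and real PDFs often need sturdier handling (for example, when "Abstract" and its text share one line).

```python
import re

# Hypothetical headings assumed to end the abstract in a typical paper.
STOP_HEADINGS = ("keywords", "key words", "introduction")


def is_stop_heading(line):
    """True if the line looks like a heading that ends the current field."""
    cleaned = re.sub(r"^\d+\.?\s*", "", line.strip().lower())  # drop "1." style numbering
    return any(cleaned == h or cleaned.startswith(h + ":") for h in STOP_HEADINGS)


def extract_field(text, field="abstract"):
    """Return the text between a heading line named `field` and the next stop heading.

    Heuristic only: assumes the field heading sits on its own line.
    """
    lines = text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if line.strip().lower().rstrip(":.") == field.lower():
            start = i + 1
            break
    if start is None:
        return ""
    body = []
    for line in lines[start:]:
        if is_stop_heading(line):
            break
        body.append(line.strip())
    # Re-join lines the PDF layout broke apart, skipping blank ones.
    return " ".join(part for part in body if part)


sample = ("A Study of Influenza\nAbstract\nInfluenza viruses evolve\n"
          "within hosts.\n\nKeywords: influenza\n1. Introduction\nBody text")
print(extract_field(sample))
```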

3. mBART model built on the transformers library (with a pretrained model)

Training:

cd pretrained_transformers
python train_mbart.py

To translate a paper with this model:

python main_mbart.py --pdf --field abstract --filename FILENAME_WITH_PATH
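Pretrained encoder-decoder models accept a limited input length, so a long field such as an abstract is usually split into sentences and translated in small batches; whether main.py and main_mbart.py do exactly this is an assumption. nltk is a listed dependency and provides `nltk.sent_tokenize`; the regex splitter below is a stdlib stand-in, and `split_sentences`, `make_batches` and the batch size are hypothetical. Unlike a trained splitter, it will wrongly break after abbreviations such as "Fig." when a capital letter follows.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when whitespace and an uppercase letter or digit follow."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", text.strip())
    return [p for p in parts if p]


def make_batches(sentences, batch_size=8):
    """Group sentences into lists of at most batch_size for batched generation."""
    return [sentences[i:i + batch_size] for i in range(0, len(sentences), batch_size)]


abstract = ("Influenza viruses evolve within hosts. Most studies used single "
            "timepoint data. We sampled patients repeatedly!")
sentences = split_sentences(abstract)
batches = make_batches(sentences, batch_size=2)
print(batches)
```

After each batch is translated, the Chinese outputs can be concatenated with `"".join(...)`, since Chinese does not put spaces between sentences.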

Results

| | Transformer (no pretraining) | Pretrained Transformer | Pretrained mBART |
|---|---|---|---|
| BLEU | 20.93 | 25.33 | 28.96 |
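The scores above are BLEU, which metric.py computes. Since sacrebleu is a dependency it is probably used there, but the tokenization setting is not stated (sacrebleu offers a `zh` tokenizer that separates Chinese characters). To make the metric concrete, here is a stdlib sketch of corpus-level BLEU: clipped n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty. Treating each Chinese character as one token is an assumption, and the numbers are not guaranteed to match sacrebleu, which applies its own tokenization and smoothing options.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU on a 0-100 scale, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


def char_tokens(text):
    """Assumed tokenization for Chinese: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


hyp = char_tokens("流感病毒在宿主体内")
ref = char_tokens("流感病毒在宿主体内的演变")
print(round(corpus_bleu([hyp], [ref]), 2))
```

Here every n-gram of the hypothesis occurs in the reference, so the score falls below 100 only because of the brevity penalty.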

Single-sentence translation example:

| Model | Output |
|---|---|
| Source text | However, the within-host evolutionary dynamics of influenza viruses remain incompletely understood, in part because most studies have focused on within-host virus diversity of infections in otherwise healthy adults based on single timepoint data. |
| Transformer (no pretraining) | 流感嗜血杆菌在变迁1区, 在病毒感染过程中起重要作用, 主要从单面资料提炼, 在病毒感染成人基础上,由基因四季度图象数据控制。 |
| Pretrained Transformer | 但是, 对流感病毒宿主内演化动力仍不完全认识, 部分是由于多数研究都以单时点数据为基础, 着重讨论了原代健康成人感染宿主内病毒多样性。 |
| Pretrained mBART | 流感病毒在宿主体内的演变动力学至今尚未完全阐明, 主要是基于单个时间点资料对健康成人体内流感病毒感染者感染的多样性进行研究。 |

Code directory structure

get_data

Scripts for obtaining the data.

typical_transformer

The basic Transformer architecture, implemented in PyTorch.

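  • beam_decoder.py implements beam search decoding
  • config.py model hyperparameter configuration
  • data_loader.py loads the data
  • model.py implementation of the core model architecture
  • main.py main entry point
  • mytokenize.py trains the tokenizer
  • train.py trains the model
  • utils.py assorted helper functions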
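beam_decoder.py implements beam search decoding. The sketch below illustrates the general idea and is independent of the repository's code: keep the `beam_size` highest-scoring partial hypotheses by summed log-probability, expand each with every possible next token, and move hypotheses that emit the end token to a finished list. `toy_step` and `TOY_TABLE` are a hypothetical stand-in for the decoder; real implementations batch this on the GPU and usually add a length penalty, which is omitted here.

```python
import math

# Hypothetical toy "model": next-token probabilities conditioned on the last token only.
TOY_TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"</s>": 0.2, "c": 0.4, "d": 0.4},
    "b": {"</s>": 0.9, "c": 0.1},
    "c": {"</s>": 1.0},
    "d": {"</s>": 1.0},
}


def toy_step(prefix):
    """Return {token: log_prob} for the next token given the decoded prefix."""
    return {tok: math.log(p) for tok, p in TOY_TABLE[prefix[-1]].items()}


def beam_search(step_log_probs, bos, eos, beam_size=3, max_len=10):
    """Return (tokens after bos, total log-probability) of the best hypothesis found."""
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, logp in step_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        # Keep the beam_size best expansions; the sort is stable, so ties keep insertion order.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    pool = finished or beams  # fall back to unfinished beams if nothing reached eos
    if not pool:
        return [], float("-inf")
    best_seq, best_score = max(pool, key=lambda item: item[1])
    return best_seq[1:], best_score


print(beam_search(toy_step, "<s>", "</s>", beam_size=2))  # b </s>, probability 0.36
print(beam_search(toy_step, "<s>", "</s>", beam_size=1))  # a c </s>, probability 0.24
```

With beam_size=1 the search reduces to greedy decoding: it commits to the locally likelier first token "a" and misses the higher-probability sequence that a beam of width 2 finds.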

pretrained_transformers

Training based on the transformers library and the pretrained models it provides.

  • main_mbart.py main entry point for the mBART model
  • main.py main entry point for the Transformer model
  • metric.py computes the model's BLEU score
  • train_mbart.py trains the mBART model
  • train.py trains the Transformer model
