

CHIP2021-Task3: Clinical Terminology Standardization

Evaluation website: http://cips-chip.org.cn/2021/eval3

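All of the code is built on our open-source ark-nlp library. The CHIP2021 clinical terminology standardization task had no A leaderboard (public preliminary leaderboard), so all debugging was done on the clinical terminology standardization task of CBLUE, the Chinese medical information processing dataset hosted on Tianchi.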

ark-nlp: https://github.com/xiangking/ark-nlp

CBLUE (Chinese medical information processing dataset): https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414

Hardware

CUDA: 11.0
GPU: GeForce RTX 3060 * 1
GPU memory: 12GB
CPU: 7 cores, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
RAM: 20GB
Disk: 100GB SSD

Python environment

Python 3.8.10
pip install ark-nlp==0.0.2
pip install scikit-learn 
pip install pandas
pip install elasticsearch
pip install openpyxl
pip install python-Levenshtein

Model overview

The model has three parts: recall, count prediction, and similarity prediction.

1. Recall

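The ICD file, the training set, and the cleaned training set are loaded into Elasticsearch (ES) to build indexes, from which candidate standard terms are recalled.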
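To illustrate what the recall step does without a running ES server, here is a minimal in-memory sketch. It is a stand-in, not the repository's code: it assumes character-level tokens (roughly what ES's standard analyzer produces for Chinese text) and BM25 scoring (ES's default similarity). The class name `CharBM25Index` and its parameters are hypothetical.

```python
import math
from collections import Counter


class CharBM25Index:
    """In-memory BM25 over single characters; a stand-in for ES recall."""

    def __init__(self, terms, k1=1.2, b=0.75):
        self.terms = list(terms)
        self.k1, self.b = k1, b
        # Treat each standard term as one document and each character as one token.
        self.tfs = [Counter(t) for t in self.terms]
        self.lengths = [len(t) for t in self.terms]
        # Average document length; fall back to 1.0 to avoid division by zero.
        self.avgdl = (sum(self.lengths) / max(len(self.terms), 1)) or 1.0
        # Document frequency: number of terms containing each character.
        df = Counter(ch for tf in self.tfs for ch in tf)
        n = len(self.terms)
        # Lucene-style BM25 IDF, which is always positive.
        self.idf = {ch: math.log(1 + (n - d + 0.5) / (d + 0.5)) for ch, d in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lengths[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for ch in set(query):  # each distinct query character contributes once
            f = tf.get(ch, 0)
            if f:
                total += self.idf[ch] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=5):
        """Return up to top_k terms with a positive score, best first."""
        scored = [(self.score(query, i), t) for i, t in enumerate(self.terms)]
        hits = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        return [t for _, t in hits[:top_k]]


index = CharBM25Index(["急性阑尾炎", "慢性阑尾炎", "急性胃肠炎", "高血压"])
print(index.search("阑尾炎急性", top_k=2))
```

Characters shared by few terms (high IDF) dominate the ranking, which is why a non-standard term with reordered characters still recalls the right standard term.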

2. Count prediction

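For each non-standard term, a text classifier directly predicts how many standard terms it maps to. There are three labels: one standard term, two standard terms, and more than two standard terms.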
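A minimal sketch of how the three count labels could be derived from gold data. It assumes CBLUE CHIP-CDN style records with `text` and `normalized_result` fields, where multiple standard terms are joined with `##`; the function names and label ids are hypothetical, not taken from predictnum.py.

```python
# Hypothetical label ids for the three count classes.
LABEL_ONE, LABEL_TWO, LABEL_MORE = 0, 1, 2


def count_label(normalized_result: str, sep: str = "##") -> int:
    """Map a gold normalized_result string to one of the three count labels.

    Assumes multiple standard terms are joined with "##" (CBLUE CHIP-CDN style).
    """
    terms = [t for t in normalized_result.split(sep) if t.strip()]
    if not terms:
        raise ValueError("normalized_result contains no standard term")
    if len(terms) == 1:
        return LABEL_ONE
    if len(terms) == 2:
        return LABEL_TWO
    return LABEL_MORE


def build_count_examples(records):
    """Turn {"text", "normalized_result"} records into (text, label) pairs."""
    return [(r["text"], count_label(r["normalized_result"])) for r in records]


print(build_count_examples([{"text": "左膝退变伴游离体", "normalized_result": "膝骨关节病##膝关节游离体"}]))
```

The resulting (text, label) pairs are an ordinary single-sentence classification dataset, so any text classifier can be trained on them.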

3. Similarity prediction

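The non-standard term and a candidate standard term are concatenated into a single BERT input, as shown below, and the model predicts their similarity.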

[Figure: BERT input format for similarity prediction]
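Since the figure is not available, here is a minimal sketch of the standard BERT sentence-pair layout, `[CLS] A [SEP] B [SEP]` with segment ids 0 for the first part and 1 for the second. It assumes one token per Chinese character, which is close to but not exactly what a BERT/ERNIE Chinese tokenizer produces; `build_pair_input` is a hypothetical helper, not part of ark-nlp.

```python
def build_pair_input(non_standard: str, standard: str, max_len: int = 64):
    """Build BERT sentence-pair tokens and segment (token_type) ids.

    Sketch assumption: each Chinese character is one token.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP]")
    a, b = list(non_standard), list(standard)
    # Truncate the longer side first until the pair plus 3 special tokens fits.
    while len(a) + len(b) + 3 > max_len:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS], A and the first [SEP]; segment 1 covers B and the last [SEP].
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, token_type_ids


tokens, token_type_ids = build_pair_input("右肺结节", "肺结节")
print(tokens)
print(token_type_ids)
```

The model then classifies the `[CLS]` representation as match / no match, and the match probability serves as the similarity score.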

Reproduction steps

1. Create the ES index

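Create the Elasticsearch + Kibana container with Docker (the command maps ES's port 9200 to host port 8080 and Kibana to port 5601):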

docker pull nshou/elasticsearch-kibana
docker run -it --name es-kibana -d -p 8080:9200 -p 5601:5601 nshou/elasticsearch-kibana
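Before indexing, it can help to confirm that ES is reachable on the host port mapped by the `docker run` command above. A minimal standard-library sketch, assuming the node accepts unauthenticated HTTP requests; `es_is_up` is a hypothetical helper, not part of the repository.

```python
import json
import urllib.request


def es_is_up(url: str = "http://localhost:8080", timeout: float = 3.0) -> bool:
    """Return True if an Elasticsearch node answers at `url`.

    Port 8080 is the host port mapped to ES's 9200 by `docker run -p 8080:9200`.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError):  # connection/HTTP errors, timeouts, non-JSON body
        return False
    # The ES root endpoint returns a JSON document containing a "version" object.
    return isinstance(info, dict) and "version" in info


print(es_is_up())
```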

Then create the index with the following script:

python es_index.py
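The contents of es_index.py are not described here; as a rough illustration, this is one way terms could be loaded through the ES `_bulk` API with only the standard library. The index name `icd_terms` and the document fields `term` and `source` are hypothetical, and the URL assumes the port mapping from the `docker run` command above.

```python
import json
import urllib.request


def bulk_index_payload(index: str, terms, source: str) -> str:
    """Build an Elasticsearch _bulk request body (NDJSON) for a list of terms.

    The document fields "term" and "source" are hypothetical names.
    """
    lines = []
    for term in terms:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps({"term": term, "source": source}, ensure_ascii=False))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(payload: str, url: str = "http://localhost:8080") -> dict:
    """POST the NDJSON payload to the _bulk endpoint and return ES's JSON reply."""
    req = urllib.request.Request(
        f"{url}/_bulk",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


payload = bulk_index_payload("icd_terms", ["急性阑尾炎", "高血压"], source="icd")
print(payload)
```

The `source` field records which file a term came from (ICD file, training set, or cleaned training set), so recalled candidates can later be traced back to their origin.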

2. Full reproduction

bash ./run.sh

PS: Creating the conda environment prompts a y/n confirmation; type y to proceed.

3. What each command in run.sh does
  • Create the required model checkpoint directories

    • mkdir -p ./checkpoint/textsim
    • mkdir -p ./checkpoint/predict_num
  • Create the virtual environment for running the code

    • conda create -n goodwang python=3.8.10
    • conda activate goodwang
  • Install dependencies

    • pip install ark-nlp
    • pip install scikit-learn
    • pip install pandas
    • pip install elasticsearch
    • pip install openpyxl
    • pip install python-Levenshtein
  • Preprocess the data and generate the training data for the similarity model

    • python data_process.py
  • Training

    • Train the similarity model: python textsim.py
    • Train the count prediction model: python predictnum.py
  • Prediction

    • python predict.py
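The README does not spell out how data_process.py builds similarity training pairs or how predict.py combines the count and similarity models. The sketch below shows one plausible scheme under stated assumptions: gold standard terms are positives, recalled non-gold candidates are sampled as negatives, and at prediction time the count label decides how many top-scoring candidates to keep. All function names and the `threshold`/`max_more` knobs are hypothetical.

```python
import random


def build_similarity_pairs(text, gold_terms, candidates, num_neg=3, seed=0):
    """Make (non_standard, standard, label) pairs for the similarity model.

    Assumption: gold terms are positives (label 1); recalled candidates that
    are not gold are randomly sampled as negatives (label 0).
    """
    rng = random.Random(seed)
    gold = set(gold_terms)
    pairs = [(text, g, 1) for g in gold_terms]
    negatives = [c for c in candidates if c not in gold]
    for c in rng.sample(negatives, min(num_neg, len(negatives))):
        pairs.append((text, c, 0))
    return pairs


def select_terms(scored_candidates, count_label, threshold=0.5, max_more=5):
    """Pick final standard terms from (term, similarity) pairs and a count label.

    count_label: 0 -> one term, 1 -> two terms, 2 -> more than two terms.
    """
    ranked = sorted(scored_candidates, key=lambda p: p[1], reverse=True)
    if count_label == 0:
        return [t for t, _ in ranked[:1]]
    if count_label == 1:
        return [t for t, _ in ranked[:2]]
    # "More than two": keep confident candidates, but never fewer than three.
    kept = [t for t, s in ranked if s >= threshold][:max_more]
    if len(kept) < 3:
        kept = [t for t, _ in ranked[:3]]
    return kept


pairs = build_similarity_pairs("右肺结节", ["肺结节"], ["肺结节", "肺炎", "肺癌"], num_neg=2)
scored = [("a", 0.9), ("b", 0.8), ("c", 0.6), ("d", 0.2)]
print(pairs)
print(select_terms(scored, 2))
```

Letting the count classifier fix the number of outputs avoids relying on a single similarity threshold for every term, which matters because most non-standard terms map to exactly one standard term.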

Pretrained model

1. Download link
https://huggingface.co/nghuyong/ernie-1.0
2. Post-download processing
None needed; the code downloads and processes the model automatically.
