
markhsia / aicup-deidentification-of-medical-data


This project forked from windsuzu/aicup-deidentification-of-medical-data


A Chinese NER problem that requires capturing 18 types of entities in medical conversation text. The process is divided into four parts, each encapsulated in a high-level abstract class, and the whole workflow is controlled from a single Jupyter notebook.

License: MIT License

Python 94.54% Jupyter Notebook 5.46%


AICUP Deidentification-of-Medical-Data

AICUP Doctor-Patient Data De-identification

About

This project comes from the AICUP competition on de-identification of medical data. The competition provides the text of clinical conversations and related interviews collected at National Cheng Kung University Hospital (NCKU Hospital), in which the private content and named entities were annotated manually. Predictions on the test dataset are evaluated by F1-score. In short, the competition is a Chinese NER (named-entity recognition) task: we must identify 18 types of named entities in the text. Beyond improving performance on the task, we also used it as an opportunity to learn how to apply design patterns to an AI project.
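The README says only that predictions are scored by F1-score; the competition's exact matching rules are not given here. A minimal sketch of entity-level F1, assuming each entity is a `(doc_id, start, end, entity_type)` tuple and that a prediction counts as correct only when all four fields exactly match a gold annotation (strict, micro-averaged). The tuple format and the `NAME`/`TIME` labels are illustrative assumptions, not the official scorer.

```python
from typing import Iterable, Tuple

# (doc_id, start, end, entity_type): assumed span format for gold and predictions.
Entity = Tuple[int, int, int, str]


def entity_f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> float:
    """Strict micro-averaged F1: a prediction counts only on an exact match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = [(0, 0, 3, "NAME"), (0, 3, 5, "TIME")]
pred = [(0, 0, 3, "NAME"), (0, 5, 7, "TIME")]  # second span has the wrong offsets
print(entity_f1(gold, pred))  # 0.5
```

Under this strict rule a span with the right type but shifted offsets earns no credit, which is why the example scores 0.5 rather than 1.0.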

Built With
  • Python 3
  • PyTorch
  • Transformers
  • TensorFlow 2
  • Jupyter Notebook
  • absl-py

Getting Started


Dataset and Baseline

Baseline Source Code


Design Pattern

We split each notebook into four parts (data generator, data preprocessor, trainer, and predictor) and built an abstract class for each of them. A single main notebook, run as the control terminal, then uses absl.flags to drive all of these classes.

Abstract Classes

Illustrations and source code are provided for each abstract class: data generator, data preprocessor, trainer, and predictor.
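The README does not show the class interfaces, so the following is only a minimal sketch of the four-stage design, assuming one abstract method per stage. The method names (`generate`, `preprocess`, `train`, `predict`), the span format, `run_pipeline`, and the toy lexicon-based subclasses are all hypothetical, not the project's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

# (doc_id, start, end, entity_type): hypothetical output format for predictions.
Span = Tuple[int, int, int, str]


class DataGenerator(ABC):
    @abstractmethod
    def generate(self) -> List[str]:
        """Return raw documents, e.g. conversation transcripts."""


class DataPreprocessor(ABC):
    @abstractmethod
    def preprocess(self, documents: List[str]) -> Any:
        """Convert raw documents into model-ready inputs."""


class Trainer(ABC):
    @abstractmethod
    def train(self, data: Any) -> Any:
        """Fit a model on the preprocessed data and return it."""


class Predictor(ABC):
    @abstractmethod
    def predict(self, model: Any, data: Any) -> List[Span]:
        """Return the entities the model finds in the data."""


# --- Toy concrete implementations (hypothetical, for illustration only) ---
class ListGenerator(DataGenerator):
    def __init__(self, documents: List[str]):
        self.documents = documents

    def generate(self) -> List[str]:
        return list(self.documents)


class IdentityPreprocessor(DataPreprocessor):
    def preprocess(self, documents: List[str]) -> List[str]:
        return documents  # a real preprocessor would tokenize and tag here


class LexiconTrainer(Trainer):
    """'Training' just memorizes a fixed surface-string -> type lexicon."""

    def __init__(self, lexicon: Dict[str, str]):
        self.lexicon = lexicon

    def train(self, data: List[str]) -> Dict[str, str]:
        return dict(self.lexicon)


class LexiconPredictor(Predictor):
    def predict(self, model: Dict[str, str], data: List[str]) -> List[Span]:
        spans: List[Span] = []
        for doc_id, text in enumerate(data):
            for surface, label in model.items():
                start = text.find(surface)
                while start != -1:  # record every occurrence
                    spans.append((doc_id, start, start + len(surface), label))
                    start = text.find(surface, start + 1)
        return spans


def run_pipeline(generator: DataGenerator, preprocessor: DataPreprocessor,
                 trainer: Trainer, predictor: Predictor) -> List[Span]:
    """Wire the four stages together using only the abstract interfaces."""
    data = preprocessor.preprocess(generator.generate())
    model = trainer.train(data)
    return predictor.predict(model, data)


spans = run_pipeline(ListGenerator(["王小明明天回診"]), IdentityPreprocessor(),
                     LexiconTrainer({"王小明": "NAME", "明天": "TIME"}),
                     LexiconPredictor())
print(spans)  # [(0, 0, 3, 'NAME'), (0, 3, 5, 'TIME')]
```

Because `run_pipeline` depends only on the abstract base classes, swapping in a real trainer or predictor means writing one new subclass; the wiring code stays unchanged.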

Main

Illustrations and source code are provided for the main notebook and for each class it drives: data generator, data preprocessor, trainer, and predictor.
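The README states only that the main notebook uses absl.flags to control all the classes. A minimal sketch of that idea, assuming one flag picks the pipeline stage and another picks a concrete implementation from a registry. The flag names (`mode`, `trainer`), the `TRAINERS` registry, and the `CrfTrainer`/`BertTrainer` classes are hypothetical placeholders; only `flags.FlagValues`, `flags.DEFINE_enum`, and `flags.DEFINE_string` are real absl-py APIs.

```python
from absl import flags

# A dedicated FlagValues object (instead of the global flags.FLAGS) keeps the
# cell re-runnable without DuplicateFlagError and lets the notebook pass an
# explicit argument list instead of Jupyter's own sys.argv.
FLAG_VALUES = flags.FlagValues()
flags.DEFINE_enum("mode", "train", ["train", "predict"],
                  "Pipeline stage to run.", flag_values=FLAG_VALUES)
flags.DEFINE_string("trainer", "crf",
                    "Hypothetical registry key of the trainer to use.",
                    flag_values=FLAG_VALUES)


class CrfTrainer:  # hypothetical concrete trainer
    def train(self) -> str:
        return "trained CRF model"


class BertTrainer:  # hypothetical concrete trainer
    def train(self) -> str:
        return "fine-tuned BERT model"


# Hypothetical registry mapping a flag value to a concrete class.
TRAINERS = {"crf": CrfTrainer, "bert": BertTrainer}


def main(argv):
    FLAG_VALUES(argv)  # parse the flags; argv[0] is treated as the program name
    trainer = TRAINERS[FLAG_VALUES.trainer]()
    if FLAG_VALUES.mode == "train":
        return trainer.train()
    return f"predicting with {type(trainer).__name__}"
```

In a notebook cell one would call, for example, `main(["main", "--mode=train", "--trainer=bert"])`. Keeping these choices in flags means the cell that runs the pipeline never has to change; note that re-parsing does not reset flags you omit, so passing every flag explicitly avoids surprises.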

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Reach out to the maintainer at one of the following places:

Acknowledgements
