dataec's Introduction

Corrects erroneous text against its source data.

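Overview: the input is the data plus an erroneous text, and the output is the corrected text. The process has two stages: identify the erroneous entities, then correct them.
Identifying erroneous entities: first recognize the entities, then decide whether each one needs to be changed; if it does, output the positions of its first and last words.
Recognizing entities: use an existing entity recognition model, plus numbers, to obtain the entity positions.
Deciding whether to change an entity: average the entity's word vectors and apply attention against the overall data vector.
Correcting erroneous entities: the average vector of each erroneous entity attends over the averaged entity vectors of the data, so that the matching entity can be copied directly.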
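The correction stage can be illustrated with a small sketch. It is not the repository's model: the dot-product scoring, the softmax, and the toy two-dimensional vectors are all assumptions made for illustration, and the names `mean_vector`, `attend`, and `copy_entity` are hypothetical. It only shows the idea of averaging word vectors per entity and copying the data entity that receives the highest attention weight.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length word vectors into one entity vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Attention weights of one query over several keys (dot-product scoring, an assumption)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

def copy_entity(error_word_vectors, data_entities):
    """Pick the data entity whose averaged vector gets the most attention.

    data_entities is a list of (surface_form, word_vectors) pairs.
    """
    query = mean_vector(error_word_vectors)
    keys = [mean_vector(vectors) for _, vectors in data_entities]
    weights = attend(query, keys)
    best = max(range(len(weights)), key=weights.__getitem__)
    return data_entities[best][0], weights

# Toy example with made-up 2-d vectors.
error_entity = [[1.0, 0.0], [0.8, 0.2]]
data_entities = [
    ("Lakers", [[1.0, 0.0]]),
    ("102", [[0.0, 1.0]]),
]
copied, weights = copy_entity(error_entity, data_entities)
```

In a trained model the vectors would come from learned encoders and the scoring function might be parameterized; the sketch keeps only the averaging and copy-by-attention structure described above.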

Detailed steps

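Create pseudo-data: use word segmentation to insert errors at random, taking care that the marked positions are correct before and after segmentation. This yields: src.data, src.text, tgt.text, tgt.textpos, tgt.datapos
Try out entity recognition models
Write the vector averaging and related operations
Try a pointer network, learning from its source code
Model architecture to be added later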
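The pseudo-data step might look roughly like the sketch below. Several things here are assumptions rather than facts from the repository: whitespace splitting stands in for a real word segmenter, the error is injected by substituting one data token for another, and the meaning given to `tgt.textpos` (position of the error in the text) and `tgt.datapos` (position of the correct entity in the data) is a guess based on the field names. The function name `make_pseudo_example` is hypothetical.

```python
import random

def make_pseudo_example(data_tokens, text_tokens, rng):
    """Corrupt one entity in a correct text and record where the fix lives.

    Returns None when the text shares no token with the data or when
    the data offers no alternative token to substitute.
    """
    # Positions in the text whose token also appears in the data.
    candidates = [i for i, tok in enumerate(text_tokens) if tok in data_tokens]
    if not candidates:
        return None
    text_pos = rng.choice(candidates)
    correct = text_tokens[text_pos]
    wrong_choices = [tok for tok in data_tokens if tok != correct]
    if not wrong_choices:
        return None
    corrupted = list(text_tokens)
    corrupted[text_pos] = rng.choice(wrong_choices)
    return {
        "src.data": list(data_tokens),
        "src.text": corrupted,                        # text containing the error
        "tgt.text": list(text_tokens),                # original correct text
        "tgt.textpos": text_pos,                      # where the error sits in the text
        "tgt.datapos": data_tokens.index(correct),    # where the right entity sits in the data
    }

# Toy example; whitespace splitting replaces real word segmentation.
data = ["Lakers", "102", "Celtics", "99"]
text = "The Lakers scored 102 points".split()
example = make_pseudo_example(data, text, random.Random(0))
```

Because positions are recorded on the token list after segmentation, the text position and the data position stay aligned with the tokens the model will see.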

File overview:
data: real data plus pseudo-data
label-model: labels the erroneous entities
pointer-model: corrects the entities
pointer-nn-model: pointer network study code

arvid-pku / dataec