
liuhuanyong / huannlp


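A self-implemented NLP toolkit: a personal implementation of Chinese natural language processing components. It provides HMM- and CRF-based interfaces for word segmentation, part-of-speech tagging, and named entity recognition, plus a CRF-based dependency parsing interface.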

Python 100.00%

huannlp's Introduction

HuanNLP


Usage

Import

import huannlp
nlp = huannlp.HuanNLP('HMM')  # or: nlp = huannlp.HuanNLP('CRF')
text = "刘焕勇硕士毕业于北京语言大学,目前在**科学院软件研究所工作"

Word segmentation

words = huannlp.cut(text)

HMM mode:

['刘焕勇', '硕士', '毕业', '于', '北京', '语言', '大学', ',', '目前', '在', '中', '国', '科学', '院', '软', '件', '研究', '所', '工作']

CRF mode:

['刘焕勇', '硕士', '毕业于', '北京', '语言', '大学', ',', '目前', '在', '**科学院', '软件', '研究', '所', '工作']
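A segmentation result can be sanity-checked without the library itself. The sketch below takes the CRF-mode token list shown above, confirms that the tokens concatenate back to the input string, and derives a character span for each token. `token_offsets` is a hypothetical helper written for this illustration; it is not part of huannlp.

```python
def token_offsets(tokens):
    """Return (token, start, end) triples; end is exclusive."""
    spans = []
    pos = 0
    for tok in tokens:
        spans.append((tok, pos, pos + len(tok)))
        pos += len(tok)
    return spans


text = "刘焕勇硕士毕业于北京语言大学,目前在**科学院软件研究所工作"
crf_words = ['刘焕勇', '硕士', '毕业于', '北京', '语言', '大学', ',', '目前',
             '在', '**科学院', '软件', '研究', '所', '工作']

# A lossless segmentation concatenates back to the original string.
assert "".join(crf_words) == text

spans = token_offsets(crf_words)
for tok, start, end in spans:
    print(tok, start, end)
```

Offsets of this kind are useful when later stages (for example entity highlighting) need to point back into the raw text.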

Part-of-speech tagging

postags = huannlp.postag(text)

HMM mode:

['r', 'n', 'v', 'p', 'ns', 'n', 'n', 'w', 'nt', 'p', 'nd', 'n', 'n', 'n', 'a', 'n', 'v', 'u', 'n']

CRF mode:

['n', 'n', 'v', 'ns', 'n', 'n', 'w', 'nt', 'p', 'ni', 'n', 'v', 'u', 'n']
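The tag lists line up one-to-one with the token lists from the same mode, so the two can be zipped together. A minimal sketch using the CRF-mode outputs shown above (the variable names are illustrative, not part of huannlp):

```python
from collections import Counter

crf_words = ['刘焕勇', '硕士', '毕业于', '北京', '语言', '大学', ',', '目前',
             '在', '**科学院', '软件', '研究', '所', '工作']
crf_tags = ['n', 'n', 'v', 'ns', 'n', 'n', 'w', 'nt', 'p', 'ni', 'n', 'v', 'u', 'n']

# One tag per token; a mismatch means the two lists came from different modes.
assert len(crf_words) == len(crf_tags)

tagged = list(zip(crf_words, crf_tags))

# Frequency of each tag in the sentence.
tag_counts = Counter(crf_tags)

# Drop punctuation (tag 'w') to keep only content tokens.
content_words = [w for w, t in tagged if t != 'w']

print(tagged)
print(tag_counts.most_common(3))
```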

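POS tag reference table

| Tag | Part of speech | Tag | Part of speech | Tag | Part of speech | Tag | Part of speech |
| --- | --- | --- | --- | --- | --- | --- | --- |
| n | common noun | nt | temporal noun | nd | direction noun | nl | location noun |
| nh | person name | nhf | | nhs | | ns | place name |
| nn | ethnic group name | ni | organization name | nz | other proper noun | v | verb |
| vd | directional verb | vl | linking verb | vu | modal verb | a | adjective |
| f | distinguishing word | m | numeral | q | measure word | d | adverb |
| r | pronoun | p | preposition | c | conjunction | u | auxiliary |
| e | interjection | o | onomatopoeia | i | idiom | j | abbreviation |
| h | prefix | k | suffix | g | morpheme character | x | non-morpheme character |
| w | punctuation | ws | non-Chinese character string | wu | other unknown symbol | | |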
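For readable output, the tag table can be carried along as a lookup dictionary. The sketch below covers only a subset of the tags; the English glosses are translations of the table entries, and `POS_GLOSS` and `describe` are hypothetical names, not part of huannlp.

```python
# Subset of the tag table above; extend as needed.
POS_GLOSS = {
    'n': 'common noun',
    'nt': 'temporal noun',
    'ns': 'place name',
    'ni': 'organization name',
    'v': 'verb',
    'p': 'preposition',
    'u': 'auxiliary',
    'w': 'punctuation',
}


def describe(tags, gloss=POS_GLOSS):
    """Pair each tag with its gloss, falling back to 'unknown'."""
    return [(t, gloss.get(t, 'unknown')) for t in tags]


described = describe(['n', 'v', 'ns', 'zz'])
print(described)
```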

Named entity recognition

ners = huannlp.ner(text)

HMM mode:

{'TIM': [], 'PER': ['刘焕勇'], 'LOC': [], 'ORG': []}

CRF mode:

{'LOC': [], 'TIM': ['目前'], 'PER': ['刘焕勇'], 'ORG': ['**科学院', '北京语言大学']}

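Entity tag reference table

| Tag | Entity type |
| --- | --- |
| LOC | location entity |
| PER | person entity |
| ORG | organization entity |
| TIM | time entity |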
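The NER result is a dictionary keyed by entity tag, with a list of surface strings under each key. A minimal sketch, using the CRF-mode result shown above, that turns it into labeled character spans by searching the input text; `entity_spans` is a hypothetical helper, and it only locates the first occurrence of each surface string.

```python
def entity_spans(text, ners):
    """Return sorted (start, end, label, surface) tuples; end is exclusive."""
    spans = []
    for label, surfaces in ners.items():
        for surface in surfaces:
            start = text.find(surface)  # first occurrence only
            if start != -1:
                spans.append((start, start + len(surface), label, surface))
    return sorted(spans)


text = "刘焕勇硕士毕业于北京语言大学,目前在**科学院软件研究所工作"
crf_ners = {'LOC': [], 'TIM': ['目前'], 'PER': ['刘焕勇'],
            'ORG': ['**科学院', '北京语言大学']}

spans = entity_spans(text, crf_ners)
for start, end, label, surface in spans:
    print(label, start, end, surface)
```

Sorting by start offset puts the entities in reading order, which the dictionary form does not preserve.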

Dependency parsing

deps = nlp.dep(words, postags)

['刘焕勇', 'n', '硕士', 'n', 'ATT']
['硕士', 'n', '毕业于', 'v', 'SBV']
['毕业于', 'v', 'Root', '-', 'HED']
['北京', 'ns', '大学', 'n', 'ATT']
['语言', 'n', '大学', 'n', 'ATT']
['大学', 'n', '**科学院', 'ni', 'COO']
[',', 'w', '大学', 'n', 'WP']
['目前', 'nt', '大学', 'n', 'ATT']
['在', 'p', '软件', 'n', 'POB']
['**科学院', 'ni', '软件', 'n', 'ATT']
['软件', 'n', '研究', 'v', 'SBV']
['研究', 'v', '软件', 'n', 'VOB']
['所', 'u', '研究', 'v', 'ATT']
['工作', 'n', '**科学院', 'ni', 'VOB']
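Each row above appears to follow the layout [word, POS tag, head word, head POS tag, relation]; that reading is inferred from the sample output, not from documented behavior. Under that assumption, the sketch below finds the root and groups dependents by head. Heads are identified by surface form here, so a sentence with repeated words would need token indices instead.

```python
from collections import defaultdict

deps = [
    ['刘焕勇', 'n', '硕士', 'n', 'ATT'],
    ['硕士', 'n', '毕业于', 'v', 'SBV'],
    ['毕业于', 'v', 'Root', '-', 'HED'],
    ['北京', 'ns', '大学', 'n', 'ATT'],
    ['语言', 'n', '大学', 'n', 'ATT'],
    ['大学', 'n', '**科学院', 'ni', 'COO'],
    [',', 'w', '大学', 'n', 'WP'],
    ['目前', 'nt', '大学', 'n', 'ATT'],
    ['在', 'p', '软件', 'n', 'POB'],
    ['**科学院', 'ni', '软件', 'n', 'ATT'],
    ['软件', 'n', '研究', 'v', 'SBV'],
    ['研究', 'v', '软件', 'n', 'VOB'],
    ['所', 'u', '研究', 'v', 'ATT'],
    ['工作', 'n', '**科学院', 'ni', 'VOB'],
]

# Words whose relation is HED attach to the artificial Root node.
roots = [word for word, _, _, _, rel in deps if rel == 'HED']

# Map each head word to its (dependent, relation) pairs, in sentence order.
children = defaultdict(list)
for word, pos, head, head_pos, rel in deps:
    children[head].append((word, rel))

print(roots)
print(children['软件'])
```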
