typinyin2hanzi's Introduction

Pinyin to Hanzi Conversion

An HMM-based pinyin-to-hanzi converter. It supports training and testing models at both character granularity and word granularity.
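To make the HMM formulation concrete: the pinyin syllables are the observations, the hanzi are the hidden states, and decoding picks the most probable hanzi sequence with the Viterbi algorithm. The block below is only a minimal sketch with tiny hand-written probabilities. The names `START`, `TRANS`, `EMIT`, `FLOOR`, `candidates` and `viterbi` are hypothetical and are not taken from this repository.

```python
import math

# Toy parameters, invented purely for illustration (not trained values).
START = {"你": 0.6, "泥": 0.4}                      # P(first hanzi)
TRANS = {                                           # P(next hanzi | previous hanzi)
    "你": {"好": 0.9, "号": 0.1},
    "泥": {"好": 0.2, "号": 0.8},
}
EMIT = {                                            # P(pinyin | hanzi)
    "你": {"ni": 1.0},
    "泥": {"ni": 1.0},
    "好": {"hao": 1.0},
    "号": {"hao": 1.0},
}
FLOOR = 1e-12  # stand-in probability for unseen starts and transitions


def candidates(pinyin):
    """Return every hanzi that can emit the given pinyin syllable."""
    return [h for h, emissions in EMIT.items() if pinyin in emissions]


def viterbi(pinyins):
    """Return the most probable hanzi string for a list of pinyin syllables."""
    if not pinyins:
        return ""
    # best[h] = (log-probability of the best path ending in h, that path)
    best = {
        h: (math.log(START.get(h, FLOOR) * EMIT[h][pinyins[0]]), h)
        for h in candidates(pinyins[0])
    }
    for py in pinyins[1:]:
        if not best:
            return ""  # an earlier syllable had no candidate hanzi
        step = {}
        for h in candidates(py):
            emit = math.log(EMIT[h][py])
            step[h] = max(
                (score + math.log(TRANS.get(prev, {}).get(h, FLOOR)) + emit, path + h)
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1] if best else ""
```

Calling `viterbi(["ni", "hao"])` with these toy tables selects 你好, because the path through 你 and 好 has the highest combined start, transition and emission probability. A word-granularity model would use words instead of single characters as hidden states.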

Training

Training data: THUCNews news dataset + People's Daily (1998 and 2014) + Wikipedia
Training script: train_hmm_model.py
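For a corpus in which every hanzi is already paired with its pinyin, estimating HMM parameters reduces to counting and normalizing. The sketch below illustrates that idea only. It assumes the corpus has been converted into (hanzi list, pinyin list) pairs, which the README does not describe, and the names `train_hmm` and `normalize` are hypothetical rather than functions from train_hmm_model.py.

```python
from collections import Counter, defaultdict


def normalize(counter):
    """Turn raw counts into a probability distribution."""
    total = sum(counter.values())
    return {key: value / total for key, value in counter.items()}


def train_hmm(samples):
    """Estimate start, transition and emission probabilities by counting.

    samples: iterable of (hanzi_list, pinyin_list) pairs of equal length.
    """
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for hanzi, pinyin in samples:
        if not hanzi or len(hanzi) != len(pinyin):
            continue  # skip empty or misaligned sentences
        start[hanzi[0]] += 1
        for h, p in zip(hanzi, pinyin):
            emit[h][p] += 1
        for prev, cur in zip(hanzi, hanzi[1:]):
            trans[prev][cur] += 1
    return (
        normalize(start),
        {h: normalize(c) for h, c in trans.items()},
        {h: normalize(c) for h, c in emit.items()},
    )
```

No smoothing is applied here, so unseen transitions get no entry at all; a practical model would need some fallback probability for them.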

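Testing

Test data: the open-source speech dataset data_thchs30
Preprocessing script: thchs30_preprocess.py
Test script: model_test.py

The preprocessing script merges the original thchs30 dataset into a single text file to make testing easier.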
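The merging step can be sketched as follows. This is an illustration under stated assumptions, not the contents of thchs30_preprocess.py: it assumes each transcript is a `.trn` file whose first line is the space-separated hanzi sentence and whose second line is the pinyin, and the function name `merge_transcripts` and the tab-separated output format are hypothetical.

```python
from pathlib import Path


def merge_transcripts(data_dir, out_path):
    """Merge per-utterance .trn transcripts into one tab-separated text file.

    Assumed .trn layout: line 1 = hanzi sentence, line 2 = pinyin.
    Returns the number of sentences written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for trn in sorted(Path(data_dir).glob("*.trn")):
            lines = trn.read_text(encoding="utf-8").splitlines()
            if len(lines) < 2:
                continue  # skip files that do not match the assumed layout
            hanzi = lines[0].replace(" ", "")  # drop word-separating spaces
            pinyin = lines[1].strip()
            out.write(f"{pinyin}\t{hanzi}\n")
            count += 1
    return count
```

Each output line then holds one test sentence, with the pinyin input on the left and the reference hanzi on the right.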
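The pinyin-to-hanzi models were tested on the thchs30 ASR (automatic speech recognition) dataset, which contains 26,777 documents of one sentence each. The results are as follows: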

Model                   Total chars  Correct chars  Char accuracy  Time (s)
ty_py2hz_char           436196       327792         0.7515         825.00 (1.0x)
ty_py2hz_char_init      436196       327700         0.7513         840.29 (1.02x)
ty_py2hz_word           436196       337844         0.7745         2072.47 (2.51x)
ty_py2hz_word_init      436196       337924         0.7747         2209.89 (2.68x)
ty_py2hz_word_hmm       436196       339244         0.7777         2701.74 (3.27x)
ty_py2hz_word_init_hmm  436196       339367         0.7780         2890.15 (3.50x)
Pinyin2Hanzi            436196       309308         0.7091         8099.16 (9.82x)

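_init means that a pinyin dictionary was loaded before training to initialize the pinyin;
_hmm means that jieba word segmentation was run with its HMM option enabled.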
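The character accuracy column is the number of correct characters divided by the total number of characters. A minimal sketch of such a metric is shown below; it assumes that predictions are compared with the reference position by position, which the README does not state, and the name `char_accuracy` is hypothetical rather than taken from model_test.py.

```python
def char_accuracy(pairs):
    """Compute position-wise character accuracy.

    pairs: iterable of (predicted, reference) strings.
    Returns (correct, total, accuracy).
    """
    total = 0
    correct = 0
    for predicted, reference in pairs:
        total += len(reference)
        # zip stops at the shorter string, so missing characters count as wrong
        correct += sum(p == r for p, r in zip(predicted, reference))
    accuracy = correct / total if total else 0.0
    return correct, total, accuracy
```

The figures in the table are consistent with this definition: for ty_py2hz_char, 327792 / 436196 rounds to 0.7515.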

typinyin2hanzi's People

Contributors

tianyunzqs
