input-method_pinyin's Introduction

⌨Input Method -- PinYin

File Structure

  • To run IM_shell.py and IM.ipynb, make sure the files marked with ♦ exist.
  • To run IM_test.py, make sure the files marked with ♠ exist.
-- bin
---- IM_shell.py
---- IM_test.py
---- IM.ipynb
-- data # for IM_test.py
---- input.txt ♠
---- my_output.txt ♠
---- std_output.txt ♠
-- src
---- MPDH # for multiprocess data handling
-------- 0_data
------------ 0_all.csv ♦
-------- 0_Process
------------ log/
------------ 0_run.txt
------------ 00_template.txt
-------- 1_data
------------ 1_all.csv ♦
-------- 1_Process
------------ log/
------------ 1_run.txt
------------ 1_template.txt
-------- run_0.py
-------- run_0.py
-------- run.ipynb
---- 拼音汉字表.txt ♦
---- 一二级汉字表.txt ♦
---- 语料库.zip
---- data_handler.ipynb
---- pinyin_normal.txt
---- sentences_normal.txt ♦
---- sentences_raw.txt
---- sentences.txt
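
Before running anything, it can help to confirm that the marked files are in place. The following is a minimal sketch, not part of the repository: the paths are copied from the tree above, while the names `INFERENCE_FILES`, `TEST_FILES`, and `missing_files` are hypothetical and introduced only for illustration.

```python
from pathlib import Path

# Files marked with ♦ in the tree above (needed by IM_shell.py and IM.ipynb).
INFERENCE_FILES = [
    "src/MPDH/0_data/0_all.csv",
    "src/MPDH/1_data/1_all.csv",
    "src/拼音汉字表.txt",
    "src/一二级汉字表.txt",
    "src/sentences_normal.txt",
]

# Files marked with ♠ in the tree above (needed by IM_test.py).
TEST_FILES = [
    "data/input.txt",
    "data/my_output.txt",
    "data/std_output.txt",
]


def missing_files(root, relative_paths):
    """Return the entries of relative_paths that are not regular files under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]
```

Calling `missing_files(".", INFERENCE_FILES)` from the repository root returns an empty list when every ♦ file is present, and the list of absent paths otherwise.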

Pre-processed Data

The pre-processed data belongs in the ./src folder. You can download it, together with the raw data, with

wget "https://cloud.tsinghua.edu.cn/f/1f6e33ed073e42cbb758/?dl=1" -O files.zip
unzip files.zip

You will then have the following files

-- 拼音汉字表.txt ♦
-- 一二级汉字表.txt ♦
-- pinyin_normal.txt
-- sentences_normal.txt ♦
-- sentences_raw.txt
-- sentences.txt

With the command

wget "https://cloud.tsinghua.edu.cn/f/ad2a884a89204c1eaa15/?dl=1" -O 语料库.zip
unzip 语料库.zip

you can get

-- 语料库
---- sina_news_gbk
-------- 2016-02.txt
-------- ...
-------- 2016-11.txt
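
The folder name sina_news_gbk suggests that the corpus files are GBK-encoded, so opening them with Python's default encoding may fail or produce garbled text. Below is a minimal sketch for reading them line by line. The encoding is an assumption based on the folder name, the function name `iter_corpus_lines` is hypothetical, and nothing is assumed about the format of each line.

```python
from pathlib import Path


def iter_corpus_lines(corpus_dir, encoding="gbk"):
    """Yield (file name, line) pairs from every .txt file in corpus_dir."""
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # errors="ignore" drops bytes that are invalid in the assumed encoding
        # instead of raising UnicodeDecodeError.
        with open(path, encoding=encoding, errors="ignore") as handle:
            for line in handle:
                yield path.name, line.rstrip("\n")
```

For example, `next(iter_corpus_lines("语料库/sina_news_gbk"))` returns the first line of the first file, which is a quick way to check that the text decodes sensibly.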

You can use the ♦ files directly to run IM_shell.py and IM.ipynb, or you can run data_handler.ipynb to generate these files yourself.

Inference

Run IM_shell.py or IM.ipynb once the necessary files are ready.

python IM_shell.py

or open IM.ipynb in Jupyter and run its cells.
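
The ♠ files under data/ suggest that results are checked by comparing my_output.txt against std_output.txt. How IM_test.py actually does this is not described here, so the following is only a sketch under stated assumptions: both files are UTF-8 text with one sentence per line, in the same order. The function names `read_lines` and `accuracy` are hypothetical.

```python
def read_lines(path, encoding="utf-8"):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


def accuracy(predicted, expected):
    """Return (sentence accuracy, character accuracy) for two lists of lines."""
    if len(predicted) != len(expected):
        raise ValueError("line counts differ")
    # A sentence counts as correct only if it matches exactly.
    same_sentences = sum(p == e for p, e in zip(predicted, expected))
    # Characters are compared position by position; zip stops at the shorter line.
    same_chars = sum(
        a == b for p, e in zip(predicted, expected) for a, b in zip(p, e)
    )
    total_chars = sum(len(e) for e in expected)
    sentence_acc = same_sentences / len(expected) if expected else 0.0
    char_acc = same_chars / total_chars if total_chars else 0.0
    return sentence_acc, char_acc
```

A possible call is `accuracy(read_lines("data/my_output.txt"), read_lines("data/std_output.txt"))`. Reporting both numbers is a design choice: one wrong character makes a whole sentence count as incorrect, so the character score shows how close the near misses are.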
