chiwenzhen / crf-word-segmenter

Chinese word segmentation using CRF++

Word Segmenter

1. Install CRF++

  • download the CRF++ source package
  • build and install CRF++ from the unpacked source directory:
./configure
make
sudo make install

  • install the Python package (from the CRF++-0.xx/python directory)

This lets Python load a binary CRF++ model from disk.

python setup.py build
sudo python setup.py install

2. Prepare data

3. Convert data to CRF++ format

python conv_format.py icwb2-data/training/pku_training.utf8 train.data
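The conversion turns each segmented sentence into one row per character, with the character in the first column and a position tag in the last, and a blank line between sentences. The sketch below illustrates the idea under the assumption of a 4-tag B/M/E/S scheme (begin, middle, end, single); the function names are hypothetical and the actual conv_format.py in this repository may use a different tag set or extra columns.

```python
def words_to_crf_rows(words):
    """Return one 'char<TAB>tag' row per character of the segmented words."""
    rows = []
    for word in words:
        if len(word) == 1:
            # A one-character word is tagged S (single).
            rows.append(f"{word}\tS")
        else:
            # Longer words are tagged B, then M for inner characters, then E.
            tags = ["B"] + ["M"] * (len(word) - 2) + ["E"]
            rows.extend(f"{ch}\t{tag}" for ch, tag in zip(word, tags))
    return rows


def convert_line(line):
    """Convert one whitespace-segmented sentence into a CRF++ sentence block."""
    rows = words_to_crf_rows(line.split())
    # CRF++ separates sentences with an empty line.
    return "\n".join(rows) + "\n\n" if rows else ""
```

Calling convert_line on each line of the training file and concatenating the results would give a file in the column layout that crf_learn reads.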

4. Train with CRF++

crf_learn -f 3 -c 4.0 CRF++-0.58/example/seg/template train.data model

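Here -f 3 makes CRF++ use only features that occur at least 3 times, and -c 4.0 sets the regularization hyperparameter C. Training takes about 20 minutes.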
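The template file passed to crf_learn tells CRF++ which features to generate: each %x[row,col] macro is replaced by the value at a row offset relative to the current token and an absolute column. The sketch below mimics that expansion in Python to show what a template line produces. It is an illustration only, not CRF++'s own code; expand_template is a hypothetical helper, and the _B-1 / _B+1 padding strings for positions outside the sentence are assumed here to match what CRF++ uses.

```python
import re

# Matches macros such as %x[-1,0] or %x[2,0].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_template(template, rows, pos):
    """Expand every %x[row,col] macro relative to token index `pos`."""
    def lookup(match):
        offset, col = int(match.group(1)), int(match.group(2))
        idx = pos + offset
        if idx < 0:
            # Before the sentence start, e.g. _B-1 (assumed padding string).
            return f"_B{idx}"
        if idx >= len(rows):
            # After the sentence end, e.g. _B+1 (assumed padding string).
            return f"_B+{idx - len(rows) + 1}"
        return rows[idx][col]
    return MACRO.sub(lookup, template)
```

For example, with the rows of the sentence 我 / 爱 / 北京 and the current token at 北, the template line U08:%x[-1,0]/%x[0,0] expands to the feature string U08:爱/北.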

5. Test with CRF++

python run_test.py model icwb2-data/testing/pku_test.utf8 test.result
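At test time the model assigns a tag to every character, and the tags are then merged back into words. The following is a rough sketch of that step, again assuming B/M/E/S tags; it is not the repository's run_test.py. tags_to_words and segment are hypothetical names, and segment relies on the CRFPP module installed in step 1 (Tagger, clear, add, parse, size and y2 are the calls used in the example script shipped with the CRF++ Python binding).

```python
def tags_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            # E and S both mark the last character of a word.
            words.append(current)
            current = ""
    if current:
        # Keep an unfinished word if the sentence ends on B or M.
        words.append(current)
    return words


def segment(model_path, sentence):
    """Tag one sentence with a trained model (needs the CRFPP module)."""
    import CRFPP  # installed from the CRF++ python directory in step 1
    tagger = CRFPP.Tagger("-m " + model_path)
    tagger.clear()
    for ch in sentence:
        tagger.add(ch)          # one token (character) per row
    tagger.parse()
    tags = [tagger.y2(i) for i in range(tagger.size())]
    return tags_to_words(sentence, tags)
```

Joining the returned words with spaces, one sentence per line, gives output in the same layout as the gold segmentation file.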

6. Bakeoff 2005 experiment

perl icwb2-data/scripts/score icwb2-data/gold/pku_training_words.utf8 icwb2-data/gold/pku_test_gold.utf8 test.result > a.txt
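The score script takes the training word list, the gold segmentation and the system output, and writes its report to a.txt. To make the core word-level metrics concrete, here is a simplified sketch that computes precision, recall and F1 for a single sentence by comparing character spans. It is not a port of the Perl script, which works differently and also reports figures such as out-of-vocabulary recall; to_spans and prf are hypothetical helper names.

```python
def to_spans(words):
    """Map a word list to a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def prf(gold_words, pred_words):
    """Return (precision, recall, F1) for one sentence."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    # A predicted word is correct only if both of its boundaries match.
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For a corpus-level figure, the correct, predicted and gold counts would be summed over all sentences before dividing, rather than averaging per-sentence scores.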

Reference

http://www.mutouxiaogui.cn/blog/?p=224
