This project forked from janx/ruby-pinyin

Converts Chinese characters to pinyin, with support for text that mixes Chinese and English words and punctuation. Pinyin is a romanization system (phonemic notation) for Chinese characters; this gem converts Chinese characters into pinyin form.

License: BSD 3-Clause "New" or "Revised" License

ruby-pinyin's Introduction

ruby-pinyin: a Chinese-character-to-pinyin converter with polyphone support (支持多音字的汉字转拼音工具)

ruby-pinyin: zhī chí duō yīn zì de hàn zì zhuǎn pīn yīn gōng jù

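ruby-pinyin converts Chinese characters to their pinyin and handles polyphonic characters (characters with more than one reading) well. For example: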

    PinYin.of_string('南京市长江大桥', :ascii)

This correctly renders "长" as "chang2" rather than "zhang3".
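The reason word context matters can be sketched in plain Ruby. The following is only a toy illustration, not the gem's implementation: the reading tables and the `toy_readings` helper are hypothetical, and the gem's MMSeg backend uses its own segmentation and data. The sketch tries the longest known word at each position and falls back to a single-character default only when no word matches.

```ruby
# Toy reading tables. These entries are illustrative, not the gem's data.
WORD_READINGS = {
  '南京市' => %w[nan2 jing1 shi4],
  '市长'   => %w[shi4 zhang3],
  '长江'   => %w[chang2 jiang1],
  '大桥'   => %w[da4 qiao2]
}.freeze

# Per-character defaults, used when no word matches.
CHAR_READINGS = {
  '南' => 'nan2', '京' => 'jing1', '市' => 'shi4', '长' => 'zhang3',
  '江' => 'jiang1', '大' => 'da4', '桥' => 'qiao2'
}.freeze

MAX_WORD_LENGTH = WORD_READINGS.keys.map(&:length).max

# Greedy forward longest match over the toy tables.
def toy_readings(text)
  chars = text.chars
  result = []
  i = 0
  while i < chars.length
    matched = false
    MAX_WORD_LENGTH.downto(2) do |len|
      next if i + len > chars.length   # not enough characters left

      word = chars[i, len].join
      next unless WORD_READINGS.key?(word)

      result.concat(WORD_READINGS[word])
      i += len
      matched = true
      break
    end
    next if matched

    # Unknown characters pass through unchanged.
    result << CHAR_READINGS.fetch(chars[i], chars[i])
    i += 1
  end
  result
end

p toy_readings('南京市长江大桥')
p toy_readings('长')
```

With these toy tables, "长" inside "长江" takes the word's reading, while a lone "长" falls back to the per-character default.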

Features

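  • Polyphone support
  • Uses the latest Unicode data (6.3.0, published 2013/02/26)
  • Outputs tones as digits or as Unicode tone marks (e.g. 'cao1', 'cāo')
  • Rich API
  • Handles strings that mix Chinese and English punctuation
  • Converts Chinese punctuation to English punctuation
  • Supports custom readings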
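To make the two tone notations concrete, here is a minimal sketch of turning a numbered syllable such as 'cao1' into its tone-marked form. It is not the gem's code: `mark_tone` and `TONE_MARKS` are hypothetical names, and the sketch assumes lowercase input with a trailing digit 1 to 5 (5 meaning neutral tone). It follows the usual placement convention: mark "a" or "e" if present, otherwise the "o" of "ou", otherwise the last vowel.

```ruby
# Tone-marked forms for tones 1..4 of each vowel.
TONE_MARKS = {
  'a' => %w[ā á ǎ à],
  'e' => %w[ē é ě è],
  'i' => %w[ī í ǐ ì],
  'o' => %w[ō ó ǒ ò],
  'u' => %w[ū ú ǔ ù],
  'ü' => %w[ǖ ǘ ǚ ǜ]
}.freeze

# Convert e.g. "cao1" to "cāo". Input that does not look like a
# lowercase numbered syllable is returned unchanged.
def mark_tone(syllable)
  match = syllable.match(/\A([a-zü]+)([1-5])\z/)
  return syllable unless match

  letters = match[1]
  tone = match[2].to_i
  return letters if tone == 5 # neutral tone carries no mark

  index = if (ae = letters.index(/[ae]/))
            ae
          elsif (ou = letters.index('ou'))
            ou
          else
            letters.rindex(/[aeiouü]/)
          end
  return letters if index.nil? # no vowel to carry the mark

  marked = letters.dup
  marked[index] = TONE_MARKS.fetch(letters[index])[tone - 1]
  marked
end

puts %w[jie2 cao1 chang2 qiao2].map { |s| mark_tone(s) }.join(' ')
```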

Installation

    gem install ruby-pinyin

Or add ruby-pinyin to your Gemfile:

    gem 'ruby-pinyin'

Examples

    # encoding: utf-8
    require 'ruby-pinyin'

    # return ['jie', 'cao']
    PinYin.of_string('节操')

    # return ['jie2', 'cao1']
    PinYin.of_string('节操', true)
    PinYin.of_string('节操', :ascii)

    # return ["jié", "cāo"]
    PinYin.of_string('节操', :unicode)

    # polyphonic characters are handled correctly: return ["nán", "jīng", "shì", "cháng", "jiāng", "dà", "qiáo"]
    PinYin.of_string('南京市长江大桥', :unicode)

    # return %w(gan xie party gan xie guo jia)
    PinYin.of_string('感谢party感谢guo jia')

    # return 'gan-xie-party-gan-xie-guo-jia'
    PinYin.permlink('感谢party感谢guo jia')

    # return 'gxpartygxguojia'
    PinYin.abbr('感谢party感谢guo jia')

    # return 'gan xie party, gan xie guo jia!'
    # PinYin.sentence keeps punctuation and replaces Chinese punctuation with its English equivalent
    PinYin.sentence('感谢party, 感谢guo家!')

    # override readings with your own data file
    PinYin.override_files = [File.expand_path('../my.dat', __FILE__)]

For more examples and options, see the test cases.
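The punctuation replacement that PinYin.sentence performs can be pictured with a small standalone sketch. This is an assumption-laden illustration rather than the gem's code: the helper name `ascii_punctuation` and the character table are hypothetical and cover only a handful of common marks.

```ruby
# Full-width Chinese punctuation and the ASCII mark each one maps to.
# The two strings are aligned position by position.
FULLWIDTH_PUNCTUATION = ',。!?:;()'
ASCII_PUNCTUATION     = ',.!?:;()'

# Replace each full-width mark with its ASCII counterpart and leave
# every other character untouched.
def ascii_punctuation(text)
  text.tr(FULLWIDTH_PUNCTUATION, ASCII_PUNCTUATION)
end

puts ascii_punctuation('感谢party,感谢guo家!')
```

The sketch only swaps punctuation; converting the remaining Chinese characters to pinyin is the gem's job.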

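Configuration

ruby-pinyin ships two PinYin::Backend implementations: PinYin::Backend::Simple and PinYin::Backend::MMSeg. MMSeg is the default and supports polyphone recognition. If you don't need polyphone recognition, have tight memory constraints, or want to fall back to the Simple backend for any other reason, configure it like this:

    PinYin.backend = PinYin::Backend::Simple.new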

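Custom readings

Use PinYin.override_files to customize the reading of specific characters. Custom data lives in a plain text file, one character per line, with an ASCII space separating the character's Unicode code point from its pinyin. See lib/ruby-pinyin/data/Mandarin.dat for the format.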
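As a rough sketch of producing such a file, the helper below builds one line per character. `override_line` is a hypothetical helper, and writing the code point as uppercase hex without a prefix, followed by a lowercase numbered reading, is an assumption; check lib/ruby-pinyin/data/Mandarin.dat for the exact form before relying on it.

```ruby
# Build one override line: "<code point> <reading>".
# Assumption: the code point is written as uppercase hex with no prefix.
def override_line(char, reading)
  raise ArgumentError, 'expected exactly one character' unless char.length == 1

  format('%X %s', char.ord, reading)
end

lines = [
  override_line('长', 'chang2'),
  override_line('乐', 'yue4')
]
puts lines

# To use the result, write it to a file and register it (requires the gem):
#   File.write('my.dat', lines.join("\n") + "\n")
#   PinYin.override_files = [File.expand_path('my.dat')]
```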

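Contributions welcome

If you like this project, please help it along in any of the following ways (or any other):

  • Use it
  • Spread the word
  • Report bugs and suggest improvements (GitHub issue tracker)
  • Fix bugs and implement features (GitHub pull requests)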

LICENSE

BSD LICENSE

The pinyin data in ruby-pinyin was collected from the Internet by the author. You may use it freely outside ruby-pinyin, but please credit ruby-pinyin as the source :-)

Contributors

eric-guo, jiangxin, pzgz, forresty, martin91
