
xpinyin's Introduction

xpinyin

Translate Chinese characters (hanzi, 汉字) to pinyin (拼音) in Python.

Install

Python version >= 3.6

pip install -U xpinyin

Python version < 3.6

pip install xpinyin==0.5.7

Usage

>>> from xpinyin import Pinyin
>>> p = Pinyin()
>>> # default splitter is `-`
>>> p.get_pinyin("上海")
'shang-hai'
>>> # show tone marks
>>> p.get_pinyin("上海", tone_marks='marks')
'shàng-hǎi'
>>> p.get_pinyin("上海", tone_marks='numbers')
'shang4-hai3'
>>> # remove splitter
>>> p.get_pinyin("上海", '')
'shanghai'
>>> # set splitter as whitespace
>>> p.get_pinyin("上海", ' ')
'shang hai'
>>> p.get_initial("上")
'S'
>>> p.get_initials("上海")
'S-H'
>>> p.get_initials("上海", '')
'SH'
>>> p.get_initials("上海", ' ')
'S H'
>>> # get_initials with retroflex, #39
>>> p.get_initials("上海", splitter='-', with_retroflex=True)
'SH-H'
>>> # New in version 0.7.0, get combinations of the multiple readings of the characters
>>> p.get_pinyins('模型', splitter=' ', tone_marks='marks')
['mó xíng', 'mú xíng']
>>> p.get_pinyins('模样', splitter=' ', tone_marks='marks')
['mó yáng', 'mó yàng', 'mó xiàng', 'mú yáng', 'mú yàng', 'mú xiàng']
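
The combinations returned for multiple readings can be pictured as a Cartesian product over each character's candidate readings. The following is a minimal sketch of that idea only, not xpinyin's implementation: the `READINGS` table and the `combine_readings` function are hypothetical names introduced for illustration.

```python
from itertools import product

# Hypothetical per-character reading table (illustration only, not xpinyin data).
READINGS = {
    "模": ["mó", "mú"],
    "型": ["xíng"],
}


def combine_readings(text, readings, splitter=" "):
    """Return every combination of the candidate readings of each character."""
    # Characters without an entry are passed through unchanged.
    per_char = [readings.get(ch, [ch]) for ch in text]
    return [splitter.join(combo) for combo in product(*per_char)]


combos = combine_readings("模型", READINGS)
print(combos)
```

With two readings for 模 and one for 型, the sketch yields two combinations, which is the same count as in the 模型 example above.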

xpinyin's People

Contributors

djuretic, eumiro, fanchong, iaiti, lxneng, riverstrider, shacharmirkin, tangsty, xuwei0455

xpinyin's Issues

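It turns out the Guoyu dictionary (国语字典) is being used

This is awkward. I recently used this library (or libraries that use its dictionary), and while correcting results I first assumed the data came from the Kangxi Dictionary (康熙字典). Only later did I find out it is the Guoyu dictionary. When someone has time, it would be good to compile data from a recent edition of the Xiandai Hanyu Cidian (现代汉语词典).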

Please document that words are not accounted for

Thank you for the work on this library. A big and easy improvement to its usefulness is to prominently document that it currently performs a character-level translation, which is incorrect for many common words and sentences.

>>> from xpinyin import Pinyin
>>> p = Pinyin()
>>> p.get_pinyin(u"了解")
'le-jie'

The output should be

'liao-jie'
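
One way to work around character-level conversion is to check a word-level override table before falling back to per-character lookup. The sketch below is a hedged illustration, not part of xpinyin: `WORD_OVERRIDES`, `word_aware_pinyin`, and the `char_level` callable are all hypothetical, and the override entries are only the few words mentioned in these issues.

```python
# Hypothetical override table: words whose reading differs from the
# character-by-character result. Not part of xpinyin.
WORD_OVERRIDES = {
    "了解": ["liao", "jie"],
    "重庆": ["chong", "qing"],
    "厦门": ["xia", "men"],
    "西藏": ["xi", "zang"],
}


def word_aware_pinyin(text, char_level, overrides=WORD_OVERRIDES, splitter="-"):
    """Apply longest-match word overrides, then fall back to char_level.

    char_level is any callable mapping one character to one syllable.
    """
    syllables = []
    max_len = max((len(word) for word in overrides), default=0)
    i = 0
    while i < len(text):
        # Try the longest candidate word starting at position i first.
        for size in range(min(max_len, len(text) - i), 0, -1):
            reading = overrides.get(text[i:i + size])
            if reading is not None:
                syllables.extend(reading)
                i += size
                break
        else:
            # No override matched: convert this single character.
            syllables.append(char_level(text[i]))
            i += 1
    return splitter.join(syllables)


# Stand-in for a character-level converter, used here instead of the library.
_stub = {"我": "wo", "了": "le", "解": "jie"}
result = word_aware_pinyin("我了解", lambda ch: _stub.get(ch, ch))
print(result)
```

With xpinyin installed, `char_level` could presumably be something like `lambda ch: p.get_pinyin(ch)`. The override table would still have to be maintained by hand or built from a word-level dictionary.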

Installation fails on Windows

Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 20, in
File "C:\Users****\AppData\Local\Temp\pip-build-kr_xe5jj\xpinyin\setup.py", line 7, in
README = open(os.path.join(here, 'README.rst')).read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8a in position 921: illegal multibyte sequence

Command "python setup.py egg_info" failed with error code 1 in C:\Users****\AppData\Local\Temp\pip-build-kr_xe5jj\xpinyin
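
The traceback suggests that `open()` fell back to the Windows locale encoding (GBK) while the README contains UTF-8 bytes. A minimal sketch of the usual remedy is to pass the encoding explicitly. The helper name `read_text_utf8` and the temporary demo file are assumptions for illustration, not the project's actual setup.py.

```python
import io
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform default encoding."""
    # io.open accepts encoding= on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Demo: write a small UTF-8 file to a temporary directory and read it back.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "README.rst")
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(u"汉字转拼音")

README = read_text_utf8(demo_path)
print(README)
```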

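Generating tone marks

The Unicode character set already includes pinyin letters with tone marks.
These could be used to generate pinyin with tone marks.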
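
As a rough illustration of this suggestion, numbered pinyin can be turned into tone-marked pinyin by attaching a Unicode combining diacritic to the right vowel and normalizing to NFC. This is a sketch under stated assumptions, not xpinyin's implementation: the function name `numbered_to_marked` is hypothetical, and the placement rule used is the common convention (a or e takes the mark, then the o of "ou", otherwise the last vowel).

```python
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
COMBINING = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def numbered_to_marked(syllable):
    """Convert e.g. 'shang4' to its tone-marked form using Unicode letters."""
    if not syllable or not syllable[-1].isdigit():
        return syllable
    tone = int(syllable[-1])
    base = syllable[:-1]
    if tone not in COMBINING:
        return base  # neutral tone (0 or 5): no mark
    lower = base.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in "aeiouü"]
        if not vowels:
            return base
        idx = vowels[-1]
    marked = base[:idx + 1] + COMBINING[tone] + base[idx + 1:]
    # NFC composes the letter and the combining mark into one code point.
    return unicodedata.normalize("NFC", marked)


marked = "-".join(numbered_to_marked(s) for s in ["shang4", "hai3"])
print(marked)
```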

Xiamen (厦门)

Polyphonic characters (多音字) do not seem to be supported.
from xpinyin import Pinyin
p = Pinyin()

print(p.get_pinyin(u"厦门"))
sha-men

Polyphonic characters

>>> p.get_pinyin("西藏")
'xicang'

Is there any plan to support polyphonic characters (多音字)?
Thanks.

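The character “嗯” has no corresponding pinyin

from xpinyin import Pinyin
p = Pinyin()
print(p.get_pinyin('嗯'))

The returned result is still "嗯".
Also, could an exception be raised for characters like this that have no pinyin? Otherwise it is hard to know which content was not converted.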
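
Until the library raises an exception itself, one possible workaround is to scan the converted string for leftover hanzi. This is a minimal sketch, assuming (as reported above) that characters without a reading are returned unchanged. The helper names `unconverted_hanzi` and `require_full_conversion` are hypothetical, and the check covers only the basic CJK Unified Ideographs block.

```python
def unconverted_hanzi(converted):
    """Return the CJK ideographs still present in a converted string."""
    # Basic CJK Unified Ideographs block only (U+4E00 to U+9FFF).
    return [ch for ch in converted if "\u4e00" <= ch <= "\u9fff"]


def require_full_conversion(converted):
    """Raise ValueError if any hanzi survived the conversion."""
    leftover = unconverted_hanzi(converted)
    if leftover:
        raise ValueError("no pinyin found for: " + "".join(leftover))
    return converted


print(unconverted_hanzi("嗯"))
print(unconverted_hanzi("shang-hai"))
```

A caller would wrap the library call, for example `require_full_conversion(p.get_pinyin(text))`, and handle the `ValueError` wherever the gaps should be logged.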

me and yao

Great package!

However, I notice that the very commonly used character 么 is often assigned the pinyin yāo. I don't expect this package to be perfect (it's at 98%), but this character is so common that it would be great if this could be fixed.

bug

from xpinyin import Pinyin
p = Pinyin()
p.get_pinyin(u"重庆市")
'zhong-qing-shi'

Tone marks

In the Mac OS X terminal, the following appears:

>>> p.get_pinyin(u"上海",show_tone_marks=True)
u'sh\xe0ng-h\u01cei'
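
The `u'...'` form above is the Python 2 repr of a unicode string, so the escapes most likely stand for the expected tone-marked letters rather than corrupted output. A small check of that reading, with the variable name `escaped` chosen here for illustration:

```python
# The same string, written with the escapes shown in the report.
escaped = u"sh\xe0ng-h\u01cei"

# Printing the string (instead of echoing its repr) shows the tone marks.
print(escaped)

# \xe0 is "a with grave" and \u01ce is "a with caron".
print(hex(ord(escaped[2])), hex(ord(escaped[7])))
```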

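Flat and retroflex initials are not distinguished

The get_initials() method does not distinguish flat-tongue initials from retroflex ones. For example, for “知” and “字” I think it should return ZH and Z respectively, which is important. At present both return Z.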
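
The distinction requested here can be sketched as a prefix check on the pinyin syllable: zh, ch, and sh are kept as two-letter initials, and everything else is reduced to its first letter. The function `initial_of` below is a hypothetical illustration and not the library's code; it only mirrors the idea of a `with_retroflex` switch.

```python
RETROFLEX_INITIALS = ("zh", "ch", "sh")


def initial_of(syllable, with_retroflex=False):
    """Return the upper-case initial of a pinyin syllable."""
    lowered = syllable.lower()
    if with_retroflex and lowered.startswith(RETROFLEX_INITIALS):
        return lowered[:2].upper()
    return lowered[:1].upper()


print(initial_of("zhi", with_retroflex=True))
print(initial_of("zi", with_retroflex=True))
```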

0.6.0 cannot be used with Python 2

pip install xpinyin==0.6.0

import xpinyin
Traceback (most recent call last):
File "", line 1, in
File "/home/sx/.local/lib/python2.7/site-packages/xpinyin/init.py", line 112
def get_pinyins(self, chars: str, splitter: str = u'-',
^
SyntaxError: invalid syntax

Python 2 does not support these type annotations, does it?

Missing characters?

The following code points in the range 0x4e00-0x9fa5 are missing:

\u4e06 
\u4e37 
\u4e4a 
\u4e5b 
\u4e64 
\u4e65 
\u4e67 
\u4e6c 
\u4e6e 
\u4e6f 
\u4e72 
\u4e7a 
\u4e7b 
\u4e7c 
\u4e7d 
\u4e87 
\u4eaa 
\u4ebd 
\u4ed2 
\u4ee9 
\u4f2c 
\u4f66 
\u4f68 
\u4fa4 
\u4fe7 
\u4fec 
\u503f 
\u50a6 
\u50f2 
\u510f 
\u5159 
\u5161 
\u516f 
\u517a 
\u5183 
\u5186 
\u51a7 
\u51d6 
\u51e7 
\u51e9 
\u51ea 
\u5301 
\u5302 
\u5307 
\u5381 
\u5391 
\u53bc 
\u53fe 
\u545a 
\u5463 
\u5481 
\u549c 
\u54d6 
\u54d8 
\u54db 
\u551c 
\u551f 
\u5525 
\u5579 
\u55b8 
\u55e7 
\u55ed 
\u55f4 
\u5625 
\u567a 
\u5691 
\u5692 
\u56a1 
\u56ce 
\u56d5 
\u56d6 
\u5715 
\u5726 
\u5737 
\u5738 
\u5788 
\u578a 
\u57aa 
\u57af 
\u57b0 
\u57b3 
\u57d6 
\u580f 
\u5812 
\u5814 
\u5840 
\u5841 
\u5846 
\u5870 
\u589b 
\u58b8 
\u58b9 
\u58d7 
\u58e5 
\u58ea 
\u58ed 
\u5908 
\u5911 
\u591e 
\u593b 
\u594d 
\u599b 
\u5a10 
\u5a33 
\u5a54 
\u5a6e 
\u5a72 
\u5a88 
\u5a98 
\u5aab 
\u5aac 
\u5af2 
\u5b04 
\u5b1c 
\u5b2b 
\u5b33 
\u5b36 
\u5b67 
\u5c1b 
\u5c21 
\u5c57 
\u5c72 
\u5c76 
\u5c77 
\u5c83 
\u5cbc 
\u5cbe 
\u5cc5 
\u5ce0 
\u5d0a 
\u5d30 
\u5d59 
\u5d5c 
\u5d75 
\u5d76 
\u5d7b 
\u5db6 
\u5dbf 
\u5dd5 
\u5dea 
\u5dec 
\u5ded 
\u5dfc 
\u5e49 
\u5e64 
\u5e65 
\u5e92 
\u5ee4 
\u5eed 
\u5f41 
\u5f45 
\u5f94 
\u5fc4 
\u603a 
\u603d 
\u603e 
\u6056 
\u6077 
\u6125 
\u6150 
\u61f3 
\u6256 
\u62a3 
\u62e4 
\u6318 
\u6327 
\u6364 
\u63b5 
\u63b6 
\u63b9 
\u63fb 
\u63fc 
\u6457 
\u64b6 
\u64dc 
\u64dd 
\u651a 
\u657e 
\u658f 
\u65c0 
\u65d5 
\u6683 
\u66e2 
\u66fb 
\u6711 
\u6725 
\u6730 
\u6741 
\u6762 
\u6763 
\u6764 
\u6766 
\u67a0 
\u67a4 
\u67a6 
\u67a9 
\u67e8 
\u6803 
\u6806 
\u680d 
\u6836 
\u685b 
\u685d 
\u685e 
\u68ba 
\u68bb 
\u6919 
\u691a 
\u691b 
\u6921 
\u6922 
\u6923 
\u6926 
\u6927 
\u6928 
\u6929 
\u692c 
\u697e 
\u697f 
\u6981 
\u698a 
\u698b 
\u698c 
\u69d7 
\u69dd 
\u69de 
\u69e1 
\u6a2d 
\u6a2e 
\u6a30 
\u6a72 
\u6a73 
\u6a74 
\u6a75 
\u6a78 
\u6a7a 
\u6a7b 
\u6aaa 
\u6ab2 
\u6aca 
\u6ae4 
\u6ae6 
\u6af5 
\u6af7 
\u6b0c 
\u6b0d 
\u6b15 
\u6b1f 
\u6b44 
\u6b5a 
\u6b9d 
\u6bb1 
\u6bdc 
\u6bdd 
\u6bdf 
\u6bee 
\u6bf6 
\u6c1e 
\u6c3a 
\u6c62 
\u6c63 
\u6c7c 
\u6cf9 
\u6d1c 
\u6d4c 
\u6da5 
\u6e0b 
\u6e0f 
\u6e13 
\u6e6a 
\u6e6d 
\u6e82 
\u6e84 
\u6e8a 
\u6e8b 
\u6ee7 
\u6f48 
\u6f49 
\u6f71 
\u6f9a 
\u6f9b 
\u6f9d 
\u6fcf 
\u6ff5 
\u6ff8 
\u6ff9 
\u702d 
\u702e 
\u7050 
\u705c 
\u7073 
\u709a 
\u709e 
\u70bf 
\u70e5 
\u70ea 
\u70ee 
\u7101 
\u7111 
\u7112 
\u7139 
\u713d 
\u713e 
\u7140 
\u716f 
\u7176 
\u7177 
\u718d 
\u7195 
\u7196 
\u71b4 
\u71dd 
\u71de 
\u71f5 
\u71f6 
\u720e 
\u7218 
\u7220 
\u7226 
\u722b 
\u7233 
\u7257 
\u725c 
\u729e 
\u72a0 
\u731f 
\u7320 
\u7347 
\u7363 
\u7364 
\u73ef 
\u73f1 
\u7411 
\u7412 
\u743b 
\u7443 
\u7461 
\u748d 
\u7493 
\u74a4 
\u74b4 
\u74e7 
\u74e9 
\u74f0 
\u74f1 
\u74f2 
\u74f8 
\u74fc 
\u7505 
\u7553 
\u7560 
\u7569 
\u7582 
\u7666 
\u766a 
\u7677 
\u7775 
\u77a3 
\u781b 
\u783d 
\u783f 
\u7853 
\u7858 
\u785b 
\u7873 
\u7874 
\u78b5 
\u78b7 
\u78d7 
\u78ee 
\u7900 
\u7902 
\u7922 
\u794d 
\u7999 
\u79a3 
\u7a25 
\u7a43 
\u7a52 
\u7a5d 
\u7a66 
\u7a6f 
\u7aa4 
\u7aa7 
\u7abd 
\u7acd 
\u7acf 
\u7ad3 
\u7ad4 
\u7ad5 
\u7ae1 
\u7b02 
\u7b39 
\u7b3d 
\u7b7a 
\u7b7d 
\u7bae 
\u7bd0 
\u7bd2 
\u7c13 
\u7c17 
\u7c2f 
\u7c31 
\u7c42 
\u7c4e 
\u7c4f 
\u7c56 
\u7c61 
\u7c80 
\u7c82 
\u7c8c 
\u7c8f 
\u7c90 
\u7ca9 
\u7cab 
\u7cac 
\u7cad 
\u7cc0 
\u7cd8 
\u7cec 
\u7d26 
\u7d4b 
\u7d9a 
\u7d9b 
\u7dd5 
\u7e04 
\u7e05 
\u7e07 
\u7e28 
\u7e4a 
\u7e4c 
\u7e67 
\u7e90 
\u7f3c 
\u7f40 
\u7f49 
\u7f56 
\u7faa 
\u7ff6 
\u8002 
\u8041 
\u8053 
\u8062 
\u8063 
\u8068 
\u807a 
\u80ff 
\u810b 
\u810c 
\u8133 
\u8192 
\u81a4 
\u820e 
\u8224 
\u822e 
\u823f 
\u8248 
\u8254 
\u825d 
\u8260 
\u82c6 
\u8310 
\u8312 
\u833e 
\u8362 
\u83b5 
\u83bb 
\u8419 
\u841e 
\u841f 
\u8421 
\u8422 
\u8423 
\u8485 
\u848a 
\u848f 
\u84d9 
\u84dc 
\u84de 
\u84e4 
\u8536 
\u8571 
\u85ad 
\u85d4 
\u85f2 
\u85f5 
\u8612 
\u8615 
\u8630 
\u8644 
\u8645 
\u8672 
\u86ef 
\u874a 
\u8781 
\u87a6 
\u87a7 
\u87a9 
\u87d0 
\u87f5 
\u8834 
\u8850 
\u8864 
\u88a5 
\u88ae 
\u88b0 
\u88c3 
\u88c4 
\u88c7 
\u8904 
\u891c 
\u891d 
\u8945 
\u8968 
\u8977 
\u897d 
\u8984 
\u8a01 
\u8aae 
\u8b03 
\u8b09 
\u8b22 
\u8d0c 
\u8d18 
\u8db0 
\u8e0e 
\u8eae 
\u8eb5 
\u8ebb 
\u8ebc 
\u8ebe 
\u8ec5 
\u8ec8 
\u8f4c 
\u8faa 
\u8fb7 
\u8fbb 
\u8fbc 
\u8fcc 
\u8fda 
\u8ff2 
\u9027 
\u9056 
\u9064 
\u909c 
\u90a4 
\u90d2 
\u90ee 
\u915b 
\u915c 
\u91d2 
\u91fb 
\u9228 
\u9229 
\u922a 
\u922b 
\u92af 
\u92f2 
\u92f4 
\u933a 
\u933b 
\u933f 
\u9342 
\u9345 
\u9386 
\u93b9 
\u93ba 
\u93bc 
\u93bd 
\u93be 
\u93ef 
\u93f1 
\u93f2 
\u9422 
\u9423 
\u9453 
\u9466 
\u9467 
\u958a 
\u9596 
\u959a 
\u95aa 
\u95ce 
\u95cf 
\u95e7 
\u9666 
\u9679 
\u96a1 
\u9717 
\u973b 
\u974d 
\u9786 
\u9790 
\u97b0 
\u97ba 
\u97d5 
\u97df 
\u98aa 
\u98ca 
\u99c7 
\u99ef 
\u99f2 
\u9af8 
\u9b5e 
\u9b78 
\u9b79 
\u9b96 
\u9b97 
\u9b98 
\u9bb1 
\u9bb2 
\u9bb4 
\u9bc2 
\u9bce 
\u9bcf 
\u9bd0 
\u9bd1 
\u9bf1 
\u9bf2 
\u9bf3 
\u9c18 
\u9c1a 
\u9c26 
\u9c30 
\u9c5b 
\u9c5c 
\u9c69 
\u9c6a 
\u9c6b 
\u9c70 
\u9cf0 
\u9d2b 
\u9d46 
\u9d47 
\u9d64 
\u9d65 
\u9d8d 
\u9d8e 
\u9d91 
\u9dab 
\u9eb6 
\u9ebf 


Admittedly, these are characters that are rarely used.
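
A list like the one above could be produced by probing every code point in the range and recording those the converter leaves unchanged. This is a hedged sketch, assuming that characters without a reading are returned as-is. `missing_code_points` and its `convert` parameter are hypothetical names, and a stub converter stands in for the library here.

```python
def missing_code_points(convert, start=0x4E00, end=0x9FA5):
    """List code points (as \\uXXXX strings) that convert leaves unchanged."""
    missing = []
    for code_point in range(start, end + 1):
        char = chr(code_point)
        if convert(char) == char:
            missing.append("\\u%04x" % code_point)
    return missing


# Stub converter for demonstration: pretends only U+4E06 has no reading.
def _stub_convert(char):
    return char if char == "\u4e06" else "x"


found = missing_code_points(_stub_convert, 0x4E00, 0x4E10)
print(found)
```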

pip install problem

When using pip:
pip install xpinyin
It says:
Skipping installation of /path/to/lib/python2.7/site-packages/xpinyin/__init__.py (namespace package)
As a result, the main module is not installed.

>>> from xpinyin import Pinyin
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name Pinyin

Where did the tones go?

I just stumbled upon your repository while searching for some xpinyin latex stuff.
From your examples I don't see any tones on the pinyin syllables. Does the program not get them? This would reduce its usefulness quite a bit.

Strange output

I tried using your lib with both 2.7.11 and 3.5.1. Py3 definitely seems to work better, but I'm getting odd output for some chars in both. For example:

$ python3                   
>>> from xpinyin import Pinyin
>>> p = Pinyin()
>>> p.get_pinyin(u"秋")
'qiu'
>>> p.get_pinyin(u"秋", show_tone_marks=True)
'qiū!!!!'
$ python
>>> from xpinyin import Pinyin
>>> p = Pinyin()
>>> p.get_pinyin(u"秋")
u'qiu'
>>> p.get_pinyin(u"秋", show_tone_marks=True)
u'qi\u016b!!!!'

Pinyin to hanzi

Is there a way to use the tool to convert text back from pinyin to hanzi?

Python 2 support broken: ImportError: No module named pathlib

I believe this line (introduced 3 days ago) broke Python 2 support:

from pathlib import Path

If I understand correctly, pathlib was introduced in Python 3 as a standard library, but is not readily available to Python 2 environments.

When trying to install a package that imports xpinyin on a Python 2 environment, this causes pip installation to fail.

Collecting xpinyin<1,>=0.5.4
  Downloading xpinyin-0.7.3.tar.gz (131 kB)
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-JWNu6W/xpinyin/setup.py'"'"'; __file__='"'"'/tmp/pip-install-JWNu6W/xpinyin/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-JWNu6W/xpinyin/pip-egg-info
         cwd: /tmp/pip-install-JWNu6W/xpinyin/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-JWNu6W/xpinyin/setup.py", line 1, in <module>
        from pathlib import Path
    ImportError: No module named pathlib

Workaround for Python 2

Pin to an older version of xpinyin in requirements.txt, e.g.:

xpinyin==0.6.0

The following error occurs when running the packaged application

Traceback (most recent call last):
File "Area.py", line 534, in
File "xpinyin_init_.py", line 60, in init
File "pathlib.py", line 1175, in read_text
File "pathlib.py", line 1162, in open
File "pathlib.py", line 1016, in _opener
File "pathlib.py", line 388, in wrapped
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\surface\AppData\Local\Temp\_MEI10162\Mandarin.dat'
