pombredanne / unihandecode Goto Github PK
View Code? Open in Web Editor NEWThis project forked from miurahr/unihandecode
unihandecode original tree
This project forked from miurahr/unihandecode
unihandecode original tree
Unihandecode ASCII transliterations of Unicode text that recognize CJKV complex charactors EXAMPLE USE ----------- from unihandecode import Unihandecoder u = Unihandecoder(lang='ch') print d.decode(u"\u660e\u5929\u660e\u5929\u7684\u98ce\u5439") # That prints: Ming Tian Ming Tian De Feng Chui u = Unihandecoder(lang='ja') print d.decode( u'\u660e\u65e5\u306f\u660e\u65e5\u306e\u98a8\u304c\u5439\u304f') # That prints: Ashita ha Ashita no Kaze ga Fuku DESCRIPTION ----------- It often happens that you have non-Roman text data in Unicode, but you can't display it -- usually because you're trying to show it to a user via an application that doesn't support Unicode, or because the fonts you need aren't accessible. You could represent the Unicode characters as "???????" or "\15BA\15A0\1610...", but that's nearly useless to the user who actually wants to read what the text says. What Unihandecode provides is a function, 'decode(...)' that takes Unicode data and tries to represent it in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F). The representation is almost always an attempt at *transliteration* -- i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system. (See the example above) These are same meaning in both language in example above. "明天明天的风吹" for Chinese and "明日は明日の風が吹く" for Japanese. The character "明" is converted "Ming" in Chinese. "明日" is converted "Ashita" but single charactor "明" will be converted "Mei" in Japanese. This is an improved version of Python unidecode, that is Python port of Text::Unidecode Perl module by Sean M. Burke <[email protected]>. REQUIREMENTS ------------ There is no required staff other than standard python libraries. Because it is still under development for python3, you can use it with python2.x.(>2.6) INSTALLATION ------------ You install Unihandecode, as you would install any Python module, by running these commands: python setup.py gendict python setup.py install python setup.py test If you got egg package, it is easy to install by $ easy_install Unihandecode-0.40-py2.7.egg BUILD ------ To build egg package, we need additional instruction. python setup.py gendict python setup.py bdist_egg LIMITATION ---------- This library uses pickler that format is depend on platform and python version. You should re-create dictionary for each python version. SUPPORT -------- Questions, bug reports, useful code bits, and suggestions for Unihandecode are handled on github.com/miurahr/unihandecode AVAILABILITY ------------ The latest version of Unihandecode is available from Git repository in github.com: https://github.com/miurahr/unihandecode WARNING: There was launchpad.net Bazzar repository named unhandecode. It has NOT been maintained and moved github entirely. COPYRIGHT --------- Unicode Character Database: Date: 2010-09-23 09:29:58 UDT [JHJ] Unicode version: 6.0.0 Copyright (c) 1991-2010 Unicode, Inc. For terms of use, see http://www.unicode.org/terms_of_use.html For documentation, see http://www.unicode.org/reports/tr44/ Unidecode's character transliteration tables: Copyright 2001, Sean M. Burke <[email protected]>, all rights reserved. Python code: Copyright 2010,2011, Hiroshi Miura <[email protected]> Copyright 2009, Tomaz Solc <[email protected]> LICENSE ------- Unihandecode Copyright 2010-2013 Hiroshi Miura This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.