cjklib's Issues

Use SQLAlchemy Tables/Schemas for installing data?

After seeing #4, it looks like some SQL dialects handle table creation a bit differently.

SQLAlchemy has its own way of creating tables. @cburgmer, would SQLAlchemy tables/schemas make sense for handling creation of /data's schemas?

If this is something that could help improve robustness via using SQLAlchemy's dialects, I'd like to take a bite. Anything you can think of that would block this?
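To make the proposal concrete, here is a minimal sketch of what a SQLAlchemy-defined table could look like. The table name and columns are hypothetical and are not cjklib's actual schema; only the SQLAlchemy calls themselves (`MetaData`, `Table`, `create_all`, `inspect`) are real API. The in-memory SQLite URL is likewise just an assumption for the demo.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, inspect)

metadata = MetaData()

# Hypothetical table for illustration only; not cjklib's real layout.
stroke_count_example = Table(
    "StrokeCountExample",
    metadata,
    Column("ChineseCharacter", String(1), primary_key=True),
    Column("StrokeCount", Integer, nullable=False),
)


def create_schema(url="sqlite://"):
    """Create every table registered on `metadata` for the given database URL.

    SQLAlchemy emits the CREATE TABLE statements in the dialect that
    matches the URL, so the same definition serves different backends.
    """
    engine = create_engine(url)
    metadata.create_all(engine)
    return engine


engine = create_schema()
table_names = inspect(engine).get_table_names()
print(table_names)
```

Pointing `create_schema` at a different database URL would reuse the same table definition, which is the robustness gain the question is about.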

Get Yale readings

Since Yale encodes the difference between the high level and high falling tones but Jyutping doesn't, would it be possible to get the Yale readings directly?
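To illustrate why going through Jyutping loses this distinction, here is a small sketch. The tone labels are descriptive names chosen for this example, not identifiers from cjklib.

```python
from collections import defaultdict

# Yale tone category -> Jyutping tone number.
# Labels are illustrative names, not cjklib identifiers.
YALE_TONE_TO_JYUTPING = {
    "high level": 1,
    "high falling": 1,  # Jyutping merges this with high level
    "mid rising": 2,
    "mid level": 3,
    "low falling": 4,
    "low rising": 5,
    "low level": 6,
}


def yale_candidates(mapping):
    """Invert the mapping: Jyutping tone number -> possible Yale tones."""
    inverse = defaultdict(list)
    for yale_tone, jyutping_number in mapping.items():
        inverse[jyutping_number].append(yale_tone)
    return dict(inverse)


candidates = yale_candidates(YALE_TONE_TO_JYUTPING)
# Tone numbers that cannot be converted back to Yale unambiguously.
ambiguous = {n: tones for n, tones in candidates.items() if len(tones) > 1}
print(ambiguous)
```

Only Jyutping tone 1 has more than one Yale candidate, so a Yale reading derived from Jyutping cannot recover the high level versus high falling contrast.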

Update cjklib to be compatible with SQLAlchemy >=0.7

I'm currently working on a new project that uses Flask and cjklib. However, Flask-SQLAlchemy seems to error out when I'm using SQLAlchemy 0.6.9 (despite stating that it's compatible with 0.6 or higher). This leaves me with a conflict: I need the lower version of SQLAlchemy for cjklib, and the higher version for Flask-SQLAlchemy.

It would be nice if cjklib could be updated, or if at least some information could be posted on what's currently preventing 0.7 from being usable.

make it run with Python3

It would be great if this library could work with Python3, and, by extension, with a recent version of SQLAlchemy.

Character has no stroke count information

Hi @cburgmer, thanks so much for this library. After a bit of fiddling, I was able to get everything going! Everything I've tried so far works: translation, pinyin, etc. I can't, however, seem to figure out getStrokeCount():

from cjklib.characterlookup import CharacterLookup
cjk = CharacterLookup('C')
print(cjk.getStrokeCount(u'说'))

When I run the above, I get the following:

Traceback (most recent call last):
  File "/Users/user/Documents/GitHub/chinese/hanzi.csv/generate.py", line 7, in <module>
    print(cjk.getStrokeCount(u'说'))
  File "/Users/user/Documents/GitHub/chinese/hanzi.csv/cjklib/characterlookup.py", line 644, in getStrokeCount
    "Character has no stroke count information")
cjklib.exception.NoInformationError: Character has no stroke count information

I've tried rebuilding the databases, reinstalling, etc. but no luck. I was wondering if you had any suggestions?

have a test suite

It would be nice if this software had a test suite. It's quite difficult to develop without one.

State of the cjklib / understanding our datasets

I think it'd be good to take stock of where cjklib stands in terms of its current codebase. Do we want to use it? As it stands, I'm not sure whether I'm failing to grasp the complexities of commingling our data, or whether there are architectural mistakes serious enough that a rewrite would be best.

If that is the case, I wonder if you could take some time to document what is what from a data perspective. Here are a few questions that would be helpful to have answered:

  • In cjklib.data's CSV and SQL files: what are these datasets? How are they used? Are they all used in the same way? What data do they hold, and what can they hold?

More specifically, what is the following:

  • edict
  • cedict
  • cedictgr
  • handedict
  • cfdict
  • unihan
  • kanjidic2

and

  • cantoneseipainitialfinal
  • cantoneseyaleinitialnucleuscoda
  • cantoneseyalesyllables
  • characterdecomposition
  • charactershanghaineseipa
  • grabbreviation
  • grrhotacisedfinals
  • grsyllables
  • jyutpinginitialfinal
  • jyutpingipamapping
  • jyutpingsyllables
  • jyutpingyalemapping
  • kangxiradical
  • localecharacterglyph
  • mandarinipainitialfinal
  • pinyinbraillefinalmapping
  • pinyinbrailleinitialmapping
  • pinyingrmapping
  • pinyininitialfinal
  • pinyinipamapping
  • pinyinsyllables
  • radicalequivalentcharacter
  • shanghaineseipasyllables
  • strokeorder
  • strokes
  • Unihan.zip (is this downloaded to here?)
  • wadegilesinitialfinal
  • wadegilespinyinmapping
  • wadegilessyllables

What are the above? Why are some included while others are downloaded remotely? Can we package any or all of the remote data in cjklib? Is it a matter of licensing, or of ensuring that fresh data is downloaded?

Which data in the above datasets intersect, and where?

Where the data intersects, I'm assuming we often massage it in some way so we can match it to a lookup? Maybe it'd help to have a spreadsheet / table on this?

I think that mapping the data we have to a spreadsheet would give us all a better view of the picture, imo. Then we can step back from legacy assumptions and be in a better position to make pull requests for larger architecture changes.
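As a possible starting point for such a spreadsheet, here is a minimal sketch that lists every table and its columns in an SQLite database and reports which column names are shared between tables. It assumes the installed data ends up in an SQLite file; the demo tables below are made up for illustration and are not cjklib's real tables.

```python
import sqlite3


def describe_tables(connection):
    """Return {table_name: [column_name, ...]} for every table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    layout = {}
    for (table,) in rows:
        # Quote the identifier; PRAGMA arguments cannot be bound parameters.
        quoted = '"' + table.replace('"', '""') + '"'
        columns = connection.execute(
            "PRAGMA table_info(%s)" % quoted).fetchall()
        # Each PRAGMA row describes one column; index 1 is its name.
        layout[table] = [column[1] for column in columns]
    return layout


def shared_columns(layout):
    """Map each column name used by two or more tables to those tables."""
    usage = {}
    for table, columns in layout.items():
        for column in columns:
            usage.setdefault(column, []).append(table)
    return {name: tables for name, tables in usage.items() if len(tables) > 1}


# Demo on an in-memory database with hypothetical tables.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ExampleSyllables (Reading TEXT)")
demo.execute("CREATE TABLE ExampleMapping (Reading TEXT, IPA TEXT)")
layout = describe_tables(demo)
overlap = shared_columns(layout)
print(layout)
print(overlap)
```

Shared column names are only a hint at where datasets intersect; whether two columns really hold the same kind of data still needs a human check.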

I realize the above is pretty time-consuming. Do you think you could take a stab at it, though?

Pinyin to MandarinIPA bugs

Thanks for your wonderful cjklib and the cjknife command-line tool. While making system calls to cjknife to produce IPA for some Pinyin (I'm writing a command-line Pinyin drilling program in R), I noticed some bugs in the MandarinIPA output of the following system call:

cjknife -s Pinyin -t MandarinIPA -m pinyin_to_convert_to_ipa
  1. cjknife throws an error when asked to convert the legitimate Pinyin syllables yo, m, n, ng, hng, and hm. I've seen yo (final io without an initial) rendered in IPA as [jo] or [jɔ]. Sometimes an i with a tilde underneath is used instead of a j. According to Wikipedia's syllabic consonant page, you should be able to use [m̩], [n̩], [ŋ̍], [xŋ̍], and [xm̩] for those Mandarin syllabic consonant interjections (IPA adds a little line above or below to mark a syllabic consonant).

  2. cjknife gives 'o' in the IPA for Pinyin (u)o after b, p, m, f, where it should have a 'wo' sound, e.g. po = [pʰwo], not [p‘o]. Although written with an 'o', bo, po, mo, fo (and wo) all in fact have "uo" finals. The only examples of pure "o" finals are the interjection "o" and the rather rare particle "lo" (yo being the only example of the "io" final).

  3. cjknife gives incorrect IPA for erhua, e.g. dianr3 = tjɐɚ̯, not tiɛn.ər.
    Even if we restrict erhua to what a speaker is expected to know in order to pass the 普通话水平测试 (Putonghua Proficiency Test) exam, i.e. to what someone with standard Mandarin pronunciation uses, we still have a lot of erhua syllables. For comparison, I've compiled my own Mandarin syllable to IPA mapping:

https://u14129277.dl.dropboxusercontent.com/u/14129277/pinyin_ipa.csv

which I built from the following tables. I compiled the initial and final tables mainly from the Pinyin and Erhua pages on Wikipedia, but also from other sources; the Pinyin-to-initial-and-final table I decomposed by hand from all the Pinyin examples I could find:

https://u14129277.dl.dropboxusercontent.com/u/14129277/initial.csv

https://u14129277.dl.dropboxusercontent.com/u/14129277/final.csv

https://u14129277.dl.dropboxusercontent.com/u/14129277/pinyin_initial_final.csv
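As a stopgap, the forms named in items 1 and 2 could be kept in a small override table that is consulted before calling cjknife. This is only a sketch of that idea: the names `REPORTED_IPA` and `ipa_override` are made up here, the values are the ones proposed in this report rather than cjklib output, and yo is left out because two candidate transcriptions are given for it.

```python
# Toneless Pinyin syllable -> IPA as proposed in this report.
# Names and structure are illustrative; nothing here is cjklib API.
REPORTED_IPA = {
    "m": "m̩",
    "n": "n̩",
    "ng": "ŋ̍",
    "hng": "xŋ̍",
    "hm": "xm̩",
    "po": "pʰwo",
}


def ipa_override(syllable, overrides=REPORTED_IPA):
    """Return the reported IPA for a toneless syllable, or None if unlisted."""
    return overrides.get(syllable.lower())


for syllable in ("hng", "po", "ma"):
    print(syllable, ipa_override(syllable))
```

A caller would fall back to the regular conversion whenever `ipa_override` returns `None`.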

Thanks!
