
chinesestuff's Introduction

Howell etymological dictionary

The file "Howell character dictionary.pdb" is my conversion to Pleco format, hopefully without too many blunders, of a dictionary previously published at the now-defunct kanjinetworks.com. The data from kanjinetworks sort of lives on at http://www.smarthanzi.net. "Howell Chars Mandarin.pdb" is the same for the "Mandarin redaction" of the listing. Note that the "global version" has 6255 entries (I believe I omitted all made-in-Japan characters) and the Mandarin version has 6198, so there is a margin for error, unless the original PDFs also differ by 57 characters. Shrug.

The original is by Lawrence Howell, who is rather elusive, and the "late Hikaru Morimoto". I do not know what licence it comes under; I am assuming Creative Commons of some sort. The PDFs are included for reference and are available at his Slideshare. Whatever the scientific value of the dictionary (it is still disputed), I find it useful for thinking up mnemonics for characters that I keep confusing at the moment. Character entries in the pdb file are traditional and have no pinyin associated with them, so they will only show up under the "Chars/Other Pronunciations" tab in Pleco.

I later ran the dictionaries, or rather flashcards with their content, through Pleco's "add missing" function, which sort of added pinyin to both. These are the "w/pinyin" files. On the one hand, having pinyin in the entries makes many Pleco bugs/glitches go away (Pleco is terrible at handling user dictionaries without pinyin); on the other hand, this pinyin is quite unreliable and occasionally wrong.

For a more "scientific" critique of the method, see Mair's blog.

Why are Chinese characters so damn hard

A cool thesis by Michal Kosek, plus my conversion to an Anki deck of his Appendix A, which lists common (well, his own) confusions of simplified characters. Again, I find it a useful reference for characters that are "tough" (for me, at the moment).

Modern Chinese Character Frequency List

A Pleco dictionary listing the basic meanings and frequencies of characters in modern Chinese. The list comes from Jun Da's Chinese text computing page. That corpus is not specific to any genre, unlike SUBTLEX-CH, which is a good thing if you are targeting texts other than movie subtitles ;)

现代汉语常用词表 (List of Commonly Used Words in Modern Chinese)

Another Pleco dictionary, listing "official" frequencies of "words" in Chinese according to some CN government agency (I forgot which). It is a bit overloaded with official party terms at quite high rankings, but this is the reality for newspapers published on the mainland. I might make a SUBTLEX-CH based frequency dictionary later. My use of it is purely for reference.

I've read in Jun Da's article that basic news reading requires knowledge of about 20k high-to-medium frequency words, so "20k Top Common Words" is a set of Pleco/whatever flashcards taken from this frequency list, ordered from most to least common and split into groups of 1k words. I use them to fill the gaps in my vocab before, and sometimes instead of (wink-wink), reading the actual news articles.
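For anyone wanting to redo the split on another frequency list, here is a minimal sketch of the grouping step in Python. It is not the script used to build these files; the function name `split_into_groups` and the placeholder word list are made up for illustration, and it assumes the words are already sorted from most to least common.

```python
from typing import Iterable, Iterator, List


def split_into_groups(words: Iterable[str], group_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `group_size` words, preserving order."""
    group: List[str] = []
    for word in words:
        group.append(word)
        if len(group) == group_size:
            yield group
            group = []
    if group:  # final group, possibly shorter than group_size
        yield group


# Hypothetical stand-in for a frequency-ordered word list (most common first).
sample = [f"word{i}" for i in range(1, 2501)]
groups = list(split_into_groups(sample))
for index, group in enumerate(groups, start=1):
    print(f"group {index}: {len(group)} words, first={group[0]}")
```

Each group could then be written to its own flashcard file; the file format itself is left out here, since it depends on the flashcard app.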
