krmanik / hsk-3.0-words-list

Contains the HSK 3.0 (HSK 1 to HSK 9) Hanzi, Handwritten, Word and Grammar lists, plus lists for Anki decks with frequency, pinyin, zhuyin and meaning.

License: Other

Languages: Python 95.27%, JavaScript 4.73%

hsk-3.0-words-list's Introduction

HSK 3.0

This repository contains the HSK 3.0 (HSK 1 to HSK 9) Hanzi, Handwritten, Word and Grammar lists. It also contains lists for Anki decks with frequency, pinyin, zhuyin and meaning.

The HSK 3.0 PDF file was OCRed and saved as text files. Source PDF:
http://www.moe.gov.cn/jyb_xwfb/gzdt_gzdt/s5987/202103/W020210329527301787356.pdf

If you find this repository useful, consider starring it.

Download the Anki-xiehanzi decks here:
https://github.com/krmanik/Anki-xiehanzi

hsk-3.0-words-list's People

Contributors

hellohejinyu, krmanik

hsk-3.0-words-list's Issues

Grammar Points

Has anyone been able to OCR the 572 grammar points at the end of the HSK 3.0 document?

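Converting the HSK grammar points text to JSON data

My project needs to display the HSK grammar points, so I wrote some JS code that converts the txt files to JSON for easier use. Since this repository is a Python project, I also wrote a Python version and am posting it here for anyone who needs it.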

import re
import requests

class GrammarPoint:
    def __init__(self, level=0, name="", children=None):
        self.level = level
        self.name = name
        self.children = children if children is not None else []

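def get_level_type(line):
    # Numbered headings: "A.1 ..." is level 1, "A.1.2 ..." is level 2, and so on.
    # \d+ also accepts multi-digit section numbers such as "A.10".
    if re.match(r'^A\.\d+ .+$', line):
        return 1
    if re.match(r'^A\.\d+\.\d+ .+$', line):
        return 2
    if re.match(r'^A\.\d+\.\d+\.\d+ .+$', line):
        return 3
    if re.match(r'^A\.\d+\.\d+\.\d+\.\d+ .+$', line):
        return 4
    # Bracketed grammar point labels.
    if line.startswith('【'):
        return 5
    # Sub-items open with a full-width or an ASCII parenthesis.
    if line.startswith('(') or line.startswith('('):
        return 6
    # Anything else is plain text.
    return 7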

def pick_text_from_line(line):
    # Strip the "A.1.2 " style prefix from numbered headings.
    match = re.match(r'^A(\.\d+)+ (.+)$', line)
    if match:
        return match.group(2).strip()
    # Strip the bracketed label from grammar point lines.
    match2 = re.match(r'^【(.+)】(.+)$', line)
    if match2:
        return match2.group(2).strip()
    # Strip the parenthesized marker, whether full-width or ASCII.
    if line.startswith('(') or line.startswith('('):
        find_index = line.find(')') if ')' in line else line.find(')')
        return line[find_index + 1:].strip()
    return line

def parse_line(line, current):
    if not line or line.startswith('※'):
        return

    level_type = get_level_type(line)
    if level_type == 1:
        current.name = pick_text_from_line(line)
        current.level = level_type
        return

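    target = current
    # Descend along the most recent branch to the deepest node that can hold this line.
    while target.level < level_type - 1:
        # Stop when there is nothing deeper, or when the last child is at the
        # same level as this line or deeper (it is then a sibling, not a parent).
        if not target.children or target.children[-1].level >= level_type:
            break
        target = target.children[-1]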

    target.children.append(GrammarPoint(level=level_type, name=pick_text_from_line(line)))

def main():
    url = "https://cdn.jsdelivr.net/gh/hellohejinyu/HSK-3.0/HSK%20Grammar/HSK%201.txt"
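    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page.
    response.encoding = 'utf-8'  # Assumes the grammar text files are UTF-8 encoded.
    text = response.text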
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    grammar = GrammarPoint(level=1, name="")
    for line in lines:
        parse_line(line, grammar)
    # For demonstration only: __dict__ shows the root's fields, but the children print as object reprs.
    print(grammar.__dict__)

if __name__ == "__main__":
    main()
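
Since the goal of the post is JSON output, the nested structure could be serialized along the following lines. This is a minimal sketch, not part of the original post: `to_dict` and `to_json` are hypothetical helper names, the `GrammarPoint` class is repeated as a stand-in so the snippet runs on its own, and the sample tree is hand-built for illustration rather than taken from the HSK files.

```python
import json


class GrammarPoint:
    """Stand-in with the same attributes as the GrammarPoint class above."""

    def __init__(self, level=0, name="", children=None):
        self.level = level
        self.name = name
        self.children = children if children is not None else []


def to_dict(node):
    """Recursively convert a GrammarPoint tree into plain dicts and lists."""
    return {
        "level": node.level,
        "name": node.name,
        "children": [to_dict(child) for child in node.children],
    }


def to_json(node, indent=2):
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    return json.dumps(to_dict(node), ensure_ascii=False, indent=indent)


# Hand-built sample tree; the names are illustrative, not taken from the HSK files.
sample = GrammarPoint(level=1, name="词类", children=[
    GrammarPoint(level=2, name="名词", children=[
        GrammarPoint(level=5, name="方位名词"),
    ]),
])

print(to_json(sample))
```

In `main()`, calling `to_json(grammar)` in place of printing `grammar.__dict__` would show every level of the tree rather than only the root's fields.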
