kevinhu / hotpot

A lightweight Chinese-English dictionary

Home Page: https://hotpot.kevinhu.io

License: MIT License

Languages: JavaScript 63.96%, HTML 1.69%, CSS 1.83%, Python 32.52%

Topics: chinese, chinese-nlp, chinese-translation, chinese-traditional, dictionary, nlp

hotpot's Introduction

A static Chinese-English dictionary entirely hosted on GitHub Pages and Netlify. See it live at https://hotpot.kevinhu.io.

Overview

Chinese-English dictionaries are essential tools for learning the language. This project constructs a dictionary with the basic function of providing English definitions for Chinese words, plus four extensions:

  1. Word frequency statistics
  2. Word/character decomposition and etymology
  3. Recommendations for related words
  4. Examples of word usage in translated sentences

How it works

Dictionary construction

  1. Retrieval of source data; /dictionary/1_retrieve.py
  2. Conversion of source data to Pandas-processable tables; /dictionary/2_to_tables.py
  3. Filtering of translated sentences; /dictionary/3_filter_examples.py
  4. Segmentation of filtered translated sentences using jieba; /dictionary/4_segment_examples.py
  5. Extraction of segmented words from sentences to create a word -> example sentences mapping; /dictionary/5_words_to_sentences.py
  6. Computation of words-containing-words through Aho-Corasick on CEDICT; /dictionary/6_containing_words.py
  7. Computation of related words by using nearest-neighbor search (via annoy) on FastText vectors; /dictionary/7_word2vec_similars.py
  8. Unification of previous outputs into single JSON files for each word ready for the frontend, split by simplified and traditional; /dictionary/8_unify.py
  9. Construction of an index for search; /dictionary/9_client_search.py
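
Steps 5 and 6 above can be sketched in pure Python as follows. This is a minimal illustration and not the code in /dictionary/5_words_to_sentences.py or /dictionary/6_containing_words.py: the function names are hypothetical, the sentences are assumed to be already segmented (as by step 4), and "words-containing-words" is assumed to mean, for each word, the other dictionary words in which it occurs as a substring. The Aho-Corasick automaton is hand-written here so that the example needs no third-party packages.

```python
from collections import defaultdict, deque


def words_to_sentences(segmented):
    """Step 5 sketch: map each word to the indices of the sentences using it."""
    index = defaultdict(list)
    for i, tokens in enumerate(segmented):
        for word in dict.fromkeys(tokens):  # drop repeats, keep order
            index[word].append(i)
    return dict(index)


def build_automaton(words):
    """Build an Aho-Corasick automaton: trie edges, failure links, outputs."""
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit the matches that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Return the set of dictionary words occurring anywhere in text."""
    goto, fail, out = automaton
    node, found = 0, set()
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        found.update(out[node])
    return found


def containing_words(words):
    """Step 6 sketch: for each word, list the other words that contain it."""
    automaton = build_automaton(words)
    result = defaultdict(list)
    for word in words:
        for inner in find_all(word, automaton):
            if inner != word:
                result[inner].append(word)
    return dict(result)


if __name__ == "__main__":
    print(words_to_sentences([["我", "喜欢", "火锅"], ["火锅", "很", "辣"]]))
    print(containing_words(["火", "火锅", "锅", "小火锅"]))
```

Building the automaton once and scanning every entry against it finds all contained words in one pass per entry, which is why Aho-Corasick suits a dictionary the size of CEDICT better than testing every pair of words.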

Considerations:

  • Due to the size of the output of step 8, the outputs are hosted in a submodule (kevinhu/dictionary-files) rather than in hotpot itself.
  • The Chinese-English translated sentences are not included in /dictionary/1_retrieve.py because a Kaggle login is required for download.

API

The API consists of a single serverless function hosted on Netlify that implements full-text search with FlexSearch.

  1. We first prepare a FlexSearch index in /api/prepare_index.js. Building the index ahead of time cuts cold-start times down to a few seconds.
  2. The actual serverless endpoint is then described in /api/search.js.
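
The build-once, load-on-start idea behind the two steps above can be illustrated with a small sketch. This is not FlexSearch and not the code in /api/prepare_index.js or /api/search.js: it is a plain inverted index over English definitions, written in Python, with hypothetical function names and toy entries, intended only to show why serializing the index ahead of time keeps the work done at request time small.

```python
import json
from collections import defaultdict


def build_index(entries):
    """Build step: map each lowercase definition token to the words it defines.

    Returns the index serialized as JSON, as a deploy step might store it.
    """
    index = defaultdict(list)
    for word, definition in entries.items():
        for token in sorted(set(definition.lower().split())):
            index[token].append(word)
    return json.dumps(index, ensure_ascii=False)


def search(serialized_index, query):
    """Request step: load the prebuilt index and intersect per-token matches.

    A real serverless function would load the index once per cold start
    rather than on every call; it is loaded here to keep the sketch short.
    """
    index = json.loads(serialized_index)
    matches = [set(index.get(token, [])) for token in query.lower().split()]
    return sorted(set.intersection(*matches)) if matches else []


if __name__ == "__main__":
    prebuilt = build_index({"火锅": "hot pot", "锅": "pot", "热": "hot"})
    print(search(prebuilt, "hot pot"))
```

At request time the function only parses the stored index and performs set lookups; tokenizing every definition happens once, during the build.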

Client

The web client (a standard create-react-app) then takes the JSON files hosted on GitHub to render the entries. It also makes calls to the API for searching.

Getting started

Dictionary construction

  1. Install Python dependencies with poetry install
  2. Activate virtual environment with poetry shell

API

  1. Link the repository to your Netlify account and enable continuous deployments.
  2. Change the search paths in the frontend to the correct URL.

Client

  1. Install JavaScript dependencies with yarn install
  2. Start the client with yarn start
  3. Deploy to GitHub Pages with yarn deploy (make sure the "homepage" parameter in package.json and CNAME record in /public are configured correctly)

Note that the scraper and frontend are more or less independent with the exception of the final .json output.
