s1r-j / jinmei-dict


Extracts only personal names from dictionary data and formats them as JSON, keyed by reading (katakana), with each key holding a list of candidate written forms.

License: Apache License 2.0

HTML 62.37% Python 37.63%

jinmei-dict's Introduction

jinmei-dict

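This project extracts only personal names from the dictionary data listed below and formats them as JSON, keyed by reading (katakana), with each key holding a list of candidate written forms.

In addition, a variant-character (itaiji) mapping table (scripts/itaiji.json) was created with reference to the variant character list published on the Ministry of Health, Labour and Welfare website. The written forms obtained from the dictionary data were converted to their variants and registered as additional candidates.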

NAIST-jdic
mecab-ipadic-neologd
Custom dictionary data (data/addon.csv)
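
The variant-character step described above can be sketched roughly as follows. The shape of scripts/itaiji.json is not shown in this README, so the sketch assumes a hypothetical mapping from a single character to a list of its variants. `expand_variants` and the sample mapping are illustrative names and data, not the repository's code.

```python
from itertools import product


def expand_variants(written, itaiji):
    """Return `written` plus every spelling obtained by swapping characters
    for their variants.

    `itaiji` is ASSUMED to map one character to a list of variant characters
    (a hypothetical shape for scripts/itaiji.json).
    """
    # For each position: the original character first, then its variants.
    choices = [[ch] + list(itaiji.get(ch, [])) for ch in written]
    results = []
    for combo in product(*choices):
        candidate = "".join(combo)
        if candidate not in results:  # keep first-seen order, drop duplicates
            results.append(candidate)
    return results


# Illustrative mapping only, not taken from the real itaiji.json.
sample_itaiji = {"斎": ["齋"], "沢": ["澤"]}
print(expand_variants("斎藤", sample_itaiji))
```

Because every combination of variants is generated, names containing several characters with many variants grow quickly; a real script might cap or filter the expansion.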

As of April 2, 2020, the family-name data holds 54,970 readings with 210,676 kanji candidates, and the given-name data holds 15,740 readings with 186,651 kanji candidates.
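
Figures like these could be recomputed from the JSON data along the following lines. This is a sketch that assumes the reading count is the number of keys and the candidate count is the total number of list entries; the README does not say exactly how its figures were tallied. `count_entries` and the sample data are illustrative.

```python
def count_entries(name_dict):
    """Return (number of readings, total number of written-form candidates)."""
    readings = len(name_dict)
    # Sum the list lengths; a form listed under two readings is counted twice.
    candidates = sum(len(forms) for forms in name_dict.values())
    return readings, candidates


# Tiny illustrative dictionary in the reading -> list-of-forms shape.
sample = {"サトウ": ["佐藤", "佐東"], "スズキ": ["鈴木"]}
print(count_entries(sample))
```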

Open tasks are finding more usable dictionary data and adding entries to the custom dictionary.

Description

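sei.json contains the family-name data, and mei.json contains the given-name data.

The scripts folder contains a Python script that extracts the name data and formats it as JSON.
Usage is as follows.

Prepare the CSV file (MeCab format) for each dictionary.
Place the variant character list (scripts/itaiji.json) in the same location as the script.
Run scripts/jinmei-dict.py (Python 3).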

python jinmei-dict.py '~/naist-jdic.csv' '~/mecab-user-dict-seed.yyyyMMdd.csv' '~/addon.csv'
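
For orientation, the kind of extraction this script performs might look like the sketch below. It is not the repository's script. It assumes IPADIC-style columns (surface form in column 0, part-of-speech fields in columns 4 to 7, katakana reading in column 11) and that person names are tagged 名詞 / 固有名詞 / 人名 with 姓 or 名 in the last POS field. `extract_names` and the sample rows, including their IDs and costs, are made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# ASSUMED IPADIC-style column positions in a MeCab dictionary CSV.
SURFACE, POS, POS1, POS2, POS3, READING = 0, 4, 5, 6, 7, 11


def extract_names(csv_file):
    """Collect reading -> [written forms] for family names and given names."""
    sei = defaultdict(list)  # family names
    mei = defaultdict(list)  # given names
    for row in csv.reader(csv_file):
        if len(row) <= READING:
            continue  # malformed or short row
        if (row[POS], row[POS1], row[POS2]) != ("名詞", "固有名詞", "人名"):
            continue  # not a person name
        target = {"姓": sei, "名": mei}.get(row[POS3])
        if target is None:
            continue  # other person-name subtypes are skipped here
        if row[SURFACE] not in target[row[READING]]:
            target[row[READING]].append(row[SURFACE])
    return dict(sei), dict(mei)


# Made-up sample rows; the numeric IDs and costs are placeholders.
sample_csv = io.StringIO(
    "佐藤,1290,1290,5000,名詞,固有名詞,人名,姓,*,*,佐藤,サトウ,サトー\n"
    "太郎,1291,1291,5000,名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー\n"
    "東京,1293,1293,5000,名詞,固有名詞,地域,一般,*,*,東京,トウキョウ,トーキョー\n"
)
sei, mei = extract_names(sample_csv)
print(json.dumps(sei, ensure_ascii=False))
print(json.dumps(mei, ensure_ascii=False))
```

Dictionaries with extra trailing columns still work with this sketch as long as the first twelve columns follow the assumed layout.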

Usage

The jinmei folder contains the JSON data for family names and given names.

For casual use, the site built with GitHub Pages lets you look up name kanji from a reading.
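
To use the JSON data directly instead of the site, a lookup could be as small as the sketch below. It assumes each file is a single JSON object mapping a katakana reading to a list of written forms, as described in the introduction. `load_name_dict` and `lookup` are hypothetical helper names, and the demo writes a tiny sample file rather than reading the real sei.json.

```python
import json
import tempfile
from pathlib import Path


def load_name_dict(path):
    """Load a reading -> [written forms] mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lookup(name_dict, reading):
    """Return the candidate written forms for a katakana reading, or []."""
    return name_dict.get(reading, [])


# Demo with a small stand-in file; point load_name_dict at the real
# sei.json or mei.json in the jinmei folder for actual use.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sei.json"
    sample_path.write_text(
        json.dumps({"サトウ": ["佐藤", "佐東"]}, ensure_ascii=False),
        encoding="utf-8",
    )
    sei = load_name_dict(sample_path)

print(lookup(sei, "サトウ"))
print(lookup(sei, "タナカ"))
```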

Licence

Apache-2.0

Author

s1r-J
