The purpose of this repository is to collect diacritics with their associated ASCII characters in a structured form. It should be the central place for various projects when it comes to diacritics mapping.
As there is no single, trustworthy and complete source, all information needs to be collected manually by users.
Example mapping:
Schön => Schoen
Schoen => Schön
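The two directions above can be sketched in JavaScript as follows. This is a minimal illustration that assumes a hypothetical one-entry mapping table (ö to oe); the names `mapping`, `toAscii` and `toDiacritics` are made up for this example and are not part of the repository. The reverse direction is naive, since not every "oe" in a word stems from an "ö".

```javascript
// Hypothetical mapping table: diacritic -> ASCII replacement.
const mapping = { "ö": "oe" };

// Replace every known diacritic with its ASCII equivalent.
function toAscii(text) {
  let result = text;
  for (const [diacritic, ascii] of Object.entries(mapping)) {
    result = result.split(diacritic).join(ascii);
  }
  return result;
}

// Replace every ASCII equivalent with its diacritic (naive: replaces all occurrences).
function toDiacritics(text) {
  let result = text;
  for (const [diacritic, ascii] of Object.entries(mapping)) {
    result = result.split(ascii).join(diacritic);
  }
  return result;
}

console.log(toAscii("Schön"));       // Schoen
console.log(toDiacritics("Schoen")); // Schön
```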
User Requirements
Someone using diacritics mapping information.
It should be possible to:
- Output diacritics mapping information in a CLI and web interface
- Output diacritics mapping information for various languages, e.g. a JavaScript array/object
- Fetch diacritics mapping information in builds
- Filter diacritics mapping information by:
  - Diacritic
  - Mapping value
  - Language
  - Continent
  - Alphabet (e.g. Latin)
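To make the filter requirement more concrete, here is a rough sketch of filtering by any combination of these criteria. The entry shape and field names (`diacritic`, `mapping`, `language`, `continent`, `alphabet`) and the helper `filterEntries` are assumptions for illustration only; the actual database structure is defined elsewhere.

```javascript
// Hypothetical entries; field names and values are assumptions for illustration only.
const entries = [
  { diacritic: "ö", mapping: "oe", language: "de", continent: "EU", alphabet: "Latn" },
  { diacritic: "ü", mapping: "ue", language: "de", continent: "EU", alphabet: "Latn" },
  { diacritic: "ñ", mapping: "n", language: "es", continent: "EU", alphabet: "Latn" },
];

// Keep the entries that match every key/value pair given in `criteria`.
function filterEntries(list, criteria) {
  return list.filter((entry) =>
    Object.entries(criteria).every(([key, value]) => entry[key] === value)
  );
}

console.log(filterEntries(entries, { language: "de" }).length);      // 2
console.log(filterEntries(entries, { mapping: "n" })[0].diacritic);  // ñ
```

Passing several keys at once (e.g. `{ language: "de", diacritic: "ü" }`) narrows the result, because every criterion has to match.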
Contributor Requirements
Someone providing diacritics mapping information.
Assuming every contributor has a GitHub account and is familiar with Git.
Providing information should be:
- Easy to collect
- Possible without manual invitations
- Possible without registration (an exception is: "Register with GitHub")
- Done at one place
- Easy to check correctness of information structure
- Checkable before acceptance by another contributor familiar with the language
- Possible without a Git clone
System Specification
There are two ways to realize this:
- Create a JSON database in this GitHub repository, as this fits the user and contributor requirements.
- Create a database in a third-party service that fits the user and contributor requirements.
Tested:
- Transifex: Doesn't fit requirements. It would allow providing mapping information, but not metadata.
- Contentful: Doesn't fit requirements. It would require a manual invitation and registration.
Because we're not aware of any other third-party services that could fit the user and contributor requirements, we'll proceed with the first option.
System Requirements
See the documentation and pull request.
Build & Distribution
Build
According to the contributor requirements, it should be possible to compile source files without requiring a Git clone. This means that we can't require users to run e.g. $ grunt dist at the end, since this would require them to clone the repository, install dependencies and run the build. Instead, we'll implement a build bot that runs our build on Travis CI and commits changes directly to a dist branch in this repository. Once you merge something or commit something yourself, the dist branch will therefore be updated automatically. Some people already do this to update their gh-pages branch when something changes in the master branch (e.g. this script).
Since we'll use a server-side component to filter and serve the actual mapping information, we just need to generate one diacritics.json file containing all data.
To make parsing easier and to encode diacritics as Unicode escape sequences in production, we're going to need a build that minifies the files and encodes diacritics. This should be done using Grunt.
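The minify-and-encode step could look roughly like the sketch below. It is not wired into Grunt, and the function name `encodeJson` is an assumption for illustration; it only shows the core idea of serializing without whitespace and escaping every non-ASCII character as \uXXXX.

```javascript
// Serialize compactly, then escape every non-ASCII UTF-16 code unit as \uXXXX.
function encodeJson(data) {
  return JSON.stringify(data).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const encoded = encodeJson({ "ö": "oe" });
console.log(encoded);                    // {"\u00f6":"oe"}
console.log(JSON.parse(encoded)["ö"]);   // oe
```

The escaped output is plain ASCII and still valid JSON, so any JSON parser restores the original characters.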
Integrity
In order to ensure integrity and consistency we need the following in our build process:
- A JSON validator that validates database files (must work with comments)
- A code style guideline, e.g. .jsbeautify
- A linter for JSON files that makes sure the database is formatted according to the code style
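As a sketch of the first point, a validator that tolerates comments can strip them before handing the text to `JSON.parse`. The example assumes that database files use `//` and `/* */` comments; the helper names `stripComments` and `validateJson` are hypothetical. A maintained library may be the better choice in the real build.

```javascript
// Remove // and /* */ comments while leaving string contents untouched.
function stripComments(source) {
  let out = "";
  let i = 0;
  let inString = false;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\") {
        // Copy the escaped character so an escaped quote does not end the string.
        out += next === undefined ? "" : next;
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line, keep the newline itself.
      while (i < source.length && source[i] !== "\n") i += 1;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing */ (or to the end of input).
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// Report the result instead of throwing, so a build step can check many files.
function validateJson(source) {
  try {
    JSON.parse(stripComments(source));
    return { valid: true, error: null };
  } catch (err) {
    return { valid: false, error: err.message };
  }
}

const sample = '{\n  // German umlaut\n  "ö": "oe", /* block comment */ "url": "http://x"\n}';
console.log(validateJson(sample).valid);       // true
console.log(validateJson('{ "ö": }').valid);   // false
```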
Distribution
To provide diacritics mapping according to the User Requirements, it's necessary to run a custom server-side component that makes it possible to sort, limit and filter information and output it in different ways (e.g. as a JS object or array). This component should be implemented in Node.js, as it handles JS/JSON files natively, whereas PHP would require considerably more serializing and deserializing.
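The core of such a component might be a single query function like the one sketched here, which an HTTP handler could call with the request parameters. The data shape, the option names (`language`, `limit`, `format`) and the function `query` are assumptions for illustration, not a decided API.

```javascript
// Hypothetical in-memory data, as it might be loaded from diacritics.json.
const data = [
  { diacritic: "ü", mapping: "ue", language: "de" },
  { diacritic: "ö", mapping: "oe", language: "de" },
  { diacritic: "ñ", mapping: "n", language: "es" },
];

// Filter by language, sort by diacritic (code unit order), limit, and shape the output.
function query(entries, { language, limit = Infinity, format = "array" } = {}) {
  const selected = entries
    .filter((e) => language === undefined || e.language === language)
    .sort((a, b) => (a.diacritic < b.diacritic ? -1 : a.diacritic > b.diacritic ? 1 : 0))
    .slice(0, limit);
  if (format === "object") {
    // Shape: { "ö": "oe", ... }
    return Object.fromEntries(selected.map((e) => [e.diacritic, e.mapping]));
  }
  // Shape: [["ö", "oe"], ...]
  return selected.map((e) => [e.diacritic, e.mapping]);
}

console.log(JSON.stringify(query(data, { language: "de", format: "object" }))); // {"ö":"oe","ü":"ue"}
console.log(JSON.stringify(query(data, { limit: 1 })));                         // [["ñ","n"]]
```

Because `filter` returns a new array, sorting does not modify the loaded data, so the same in-memory copy can serve many requests.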
Next Steps
This comment is updated continuously during the discussion