ucdn's Introduction

UCDN - Unicode Database and Normalization

UCDN is a Unicode support library. Currently, it provides access
to basic character properties contained in the Unicode Character
Database and low-level normalization functions (pairwise canonical
composition/decomposition and compatibility decomposition). More
functionality might be provided in the future, such as additional
properties, string normalization and encoding conversion.

UCDN uses standard C89 with no particular dependencies or requirements
except for stdint.h, and can be easily integrated into existing
projects. However, it can also be used as a standalone library,
and a CMake build script is provided for this. The first motivation
behind UCDN development was to provide a standalone set of Unicode
functions for the HarfBuzz OpenType shaping library. For this purpose,
a HarfBuzz-specific wrapper is shipped along with it (hb-ucdn.h).

UCDN is published under the ISC license; please see the license header
in the C source code for more information. The makeunicodedata.py script
required for parsing Unicode database files is licensed under the
PSF license; please see PYTHON-LICENSE for more information.

UCDN was written by Grigori Goronzy <[email protected]>.

How to Use

Add ucdn.c, ucdn.h and ucdn_db.h to your project, then call the
functions as documented in ucdn.h.

In some cases, it might be necessary to regenerate the Unicode
database file. The script makeunicodedata.py (Python 3.x required)
fetches the appropriate files and dumps the compressed database into
ucdn_db.h.

ucdn's People

Contributors

bakercp, behdad, brawer, deepakjois, dscorbett, edmundmk, grigorig, mzn723, sebras

ucdn's Issues

Bug in Hangul composition

Patch and test:

diff --git a/src/hb-ucdn/ucdn.c b/src/hb-ucdn/ucdn.c
index 30747fea..f7b33d64 100644
--- a/src/hb-ucdn/ucdn.c
+++ b/src/hb-ucdn/ucdn.c
@@ -163,7 +163,8 @@ static int hangul_pair_decompose(uint32_t code, uint32_t *a, uint32_t *b)
 
 static int hangul_pair_compose(uint32_t *code, uint32_t a, uint32_t b)
 {
-    if (a >= SBASE && a < (SBASE + SCOUNT) && b >= TBASE && b < (TBASE + TCOUNT)) {
+    if (a >= SBASE && a < (SBASE + SCOUNT) && b > TBASE && b < (TBASE + TCOUNT) &&
+       !((a - SBASE) % TCOUNT)) {
         /* LV,T */
         *code = a + (b - TBASE);
         return 3;
diff --git a/test/api/test-unicode.c b/test/api/test-unicode.c
index 6195bb28..0587c6e7 100644
--- a/test/api/test-unicode.c
+++ b/test/api/test-unicode.c
@@ -755,6 +755,10 @@ test_unicode_normalization (gconstpointer user_data)
   g_assert (hb_unicode_compose (uf, 0xCE20, 0x11B8, &ab) && ab == 0xCE31);
   g_assert (hb_unicode_compose (uf, 0x110E, 0x1173, &ab) && ab == 0xCE20);
 
+  g_assert (!hb_unicode_compose (uf, 0xAC00, 0x11A7, &ab));
+  g_assert (hb_unicode_compose (uf, 0xAC00, 0x11A8, &ab) && ab == 0xAC01);
+  g_assert (!hb_unicode_compose (uf, 0xAC01, 0x11A8, &ab));
+
 
   /* Test decompose() */
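
For readers who want to try the corrected condition outside of HarfBuzz, here is a minimal standalone sketch of pairwise Hangul composition. The constants are the ones given in the Unicode Standard's Hangul syllable composition algorithm; the function name sketch_hangul_compose and its 0/1 return convention are made up for this illustration and do not match the return values used in ucdn.c.

```c
#include <stdint.h>

/* Hangul constants from the Unicode Standard. */
#define SBASE  0xAC00u
#define LBASE  0x1100u
#define VBASE  0x1161u
#define TBASE  0x11A7u
#define LCOUNT 19u
#define VCOUNT 21u
#define TCOUNT 28u
#define NCOUNT (VCOUNT * TCOUNT)
#define SCOUNT (LCOUNT * NCOUNT)

/* Illustrative helper, not the UCDN function itself.
 * Returns 1 and stores the result in *code if (a, b) compose, else 0. */
int sketch_hangul_compose(uint32_t *code, uint32_t a, uint32_t b)
{
    /* LV + T: a must be an LV syllable (no trailing consonant yet), and
     * b must be a real trailing consonant, i.e. strictly above TBASE. */
    if (a >= SBASE && a < SBASE + SCOUNT &&
        b > TBASE && b < TBASE + TCOUNT &&
        (a - SBASE) % TCOUNT == 0) {
        *code = a + (b - TBASE);
        return 1;
    }

    /* L + V: leading consonant followed by a vowel gives an LV syllable. */
    if (a >= LBASE && a < LBASE + LCOUNT &&
        b >= VBASE && b < VBASE + VCOUNT) {
        *code = SBASE + ((a - LBASE) * VCOUNT + (b - VBASE)) * TCOUNT;
        return 1;
    }

    return 0;
}
```

The two extra checks are what the patch adds: U+11A7 itself is rejected as a trailing consonant, and a syllable that already carries a trailing consonant (such as U+AC01) is not composed again.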

Provide Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type properties

I am writing Lua bindings for ucdn at https://github.com/deepakjois/luaucdn, with the ultimate goal of implementing a pure Lua version of the Unicode Bidirectional Algorithm.

One part of the algorithm uses the properties Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type defined in the BidiBrackets.txt file. Both of these are derived properties, so I suppose it is possible to obtain them from the data and API methods that ucdn currently makes available.

However, there is also this warning in UAX #44:

Implementations should simply use the derived properties, and should not try to rederive them from lists of simple properties and collections of rules, because of the chances for error and divergence when doing so.

In light of that, what is your opinion on providing Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type directly in ucdn?
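
To make the request concrete, here is a rough sketch of what a direct lookup could look like. Everything in it is hypothetical: the names sketch_paired_bracket and sketch_paired_bracket_type, the enum, and the table layout are invented for illustration, and the table holds only the six ASCII entries from BidiBrackets.txt rather than the full file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the o/c/n column of BidiBrackets.txt. */
enum sketch_bracket_type { SKETCH_BPT_NONE, SKETCH_BPT_OPEN, SKETCH_BPT_CLOSE };

struct sketch_bracket {
    uint32_t code;                 /* the bracket itself */
    uint32_t paired;               /* its Bidi_Paired_Bracket value */
    enum sketch_bracket_type type; /* its Bidi_Paired_Bracket_Type value */
};

/* ASCII entries only, sorted by code point; the real file has many more. */
static const struct sketch_bracket sketch_brackets[] = {
    { 0x0028, 0x0029, SKETCH_BPT_OPEN  },
    { 0x0029, 0x0028, SKETCH_BPT_CLOSE },
    { 0x005B, 0x005D, SKETCH_BPT_OPEN  },
    { 0x005D, 0x005B, SKETCH_BPT_CLOSE },
    { 0x007B, 0x007D, SKETCH_BPT_OPEN  },
    { 0x007D, 0x007B, SKETCH_BPT_CLOSE },
};

/* Binary search over the sorted table; returns NULL for non-brackets. */
static const struct sketch_bracket *sketch_find(uint32_t code)
{
    size_t lo = 0;
    size_t hi = sizeof sketch_brackets / sizeof sketch_brackets[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sketch_brackets[mid].code == code)
            return &sketch_brackets[mid];
        if (sketch_brackets[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Returns the paired bracket, or the code point itself if there is none
 * (a convention chosen for this sketch). */
uint32_t sketch_paired_bracket(uint32_t code)
{
    const struct sketch_bracket *entry = sketch_find(code);
    return entry ? entry->paired : code;
}

enum sketch_bracket_type sketch_paired_bracket_type(uint32_t code)
{
    const struct sketch_bracket *entry = sketch_find(code);
    return entry ? entry->type : SKETCH_BPT_NONE;
}
```

A generated table like this would come straight from BidiBrackets.txt, which is in line with the UAX #44 advice quoted above.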

Mistake in pregenerated database.

Hi, I think there's a problem with the makeunicodedata.py script. The general_category of code points inside a range (such as U+5000 in the U+4E00..9FCC range) is incorrectly set to
UCDN_GENERAL_CATEGORY_CN, which means the code point is not assigned. It should instead be the same as that of the range's 'First>' and 'Last>' entries, which carry the meaningful value.

I look forward to someone getting this fixed. Thanks a lot.
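
As a small illustration of the expected behaviour, the sketch below treats a 'First>'/'Last>' pair as a range and gives every code point inside it the category of its endpoints. The names and the tiny one-entry table are invented for this example; the real database in ucdn_db.h is laid out differently, and UCDN's actual constants are the UCDN_GENERAL_CATEGORY_* values in ucdn.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category codes for this sketch only. */
enum sketch_category { SKETCH_CN, SKETCH_LO };

/* One "First>"/"Last>" pair from UnicodeData.txt. */
struct sketch_range {
    uint32_t first;
    uint32_t last;
    enum sketch_category category;
};

static const struct sketch_range sketch_ranges[] = {
    /* <CJK Ideograph, First> .. <CJK Ideograph, Last>, as cited in the report */
    { 0x4E00, 0x9FCC, SKETCH_LO },
};

/* Every code point between First and Last inherits the range's category;
 * anything outside all known ranges is reported as unassigned here. */
enum sketch_category sketch_general_category(uint32_t code)
{
    size_t i;

    for (i = 0; i < sizeof sketch_ranges / sizeof sketch_ranges[0]; i++) {
        if (code >= sketch_ranges[i].first && code <= sketch_ranges[i].last)
            return sketch_ranges[i].category;
    }
    return SKETCH_CN;
}
```

With this rule, U+5000 is reported as Lo (other letter), like U+4E00 and U+9FCC, rather than as Cn.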

UCDN needs automated testing

UCDN doesn't have any automated testing. We need at least some equivalence class testing, or ideally verification of correctness against the whole character database.

Update to Unicode 11

Just out of the oven...

I'm updating HarfBuzz to make a release later today. If you could update soon, that would be nice.

HarfBuzz does not use UCDN anymore

Hi,

I'd like to thank you for UCDN again. It served HarfBuzz very well for many years. Recently I was trying to squeeze bytes out of HarfBuzz, and replacing UCDN became a fruitful target. Please see:

harfbuzz/harfbuzz@65392b7

The generator can be used for other arrays as well, in case you want to use it in other places or regenerate UCDN based on it.

Cheers,
b

Add header guards

Please add the typical #ifdef __cplusplus / extern "C" block so that including the header from C++ works as well. See hb/src/hb-ucdn/ucdn.h for an example.
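
For reference, the usual pattern looks roughly like the following. This is a generic sketch with a made-up header name and function (example.h, example_is_ascii), not a copy of the HarfBuzz version of ucdn.h.

```c
/* example.h: a hypothetical header, not the actual ucdn.h */
#ifndef EXAMPLE_H
#define EXAMPLE_H

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Declarations placed here get C linkage when seen by a C++ compiler,
 * so they link against the object file built from the C source. */
int example_is_ascii(uint32_t code);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_H */

/* example.c: the matching definition, compiled as plain C */
int example_is_ascii(uint32_t code)
{
    return code < 0x80;
}
```

A C compiler never defines __cplusplus, so it sees only the plain declarations, while a C++ compiler wraps them in extern "C" and skips name mangling.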

Make data structs const

Currently they are not const, which means they will end up in the .data section of the library. Not good. Just add const.
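
A tiny sketch of the difference, using an invented table rather than the real UCDN data: on typical ELF toolchains a const-qualified static array can be placed in a read-only section such as .rodata and shared between processes, whereas the same array without const is writable data.

```c
#include <stdint.h>

/* Hypothetical lookup table. The const qualifier is the only change the
 * issue asks for; the contents and the accessor stay the same. */
static const uint8_t sketch_table[] = { 0, 1, 1, 2, 3, 5, 8, 13 };

/* Bounds-checked read; out-of-range indices return 0 in this sketch. */
uint8_t sketch_lookup(unsigned index)
{
    return index < sizeof sketch_table ? sketch_table[index] : 0;
}
```

Where the data actually ends up depends on the compiler and platform; a tool such as size or objdump can confirm it for a given build.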
