
ucdn's Issues

HarfBuzz does not use UCDN anymore

Hi,

I'd like to thank you for UCDN again. It served HarfBuzz very well for many years. Recently I was trying to squeeze bytes out of HarfBuzz, and replacing UCDN became a fruitful target. Please see:

harfbuzz/harfbuzz@65392b7

The generator can be used for other arrays as well, in case you want to use it in other places or regenerate UCDN based on it.

Cheers,
b

Mistake in pregenerated database.

Hi, I think there's a problem with the makeunicodedata.py script. The general_category of code points inside a range (such as U+5000 in the U+4E00..U+9FCC range) is incorrectly set to
UCDN_GENERAL_CATEGORY_CN, which means "unassigned". It should instead be the same as the category of the range's 'First>' and 'Last>' entries, which is meaningful.

I look forward to someone fixing this. Thanks a lot.
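A minimal sketch of the intended behavior, in C: in UnicodeData.txt a large block such as the CJK ideographs is listed only as a pair of '<..., First>' and '<..., Last>' lines, and every code point between them shares that pair's properties. The struct ud_entry, the helper names and the three-entry sample table are illustrative assumptions, not the layout makeunicodedata.py actually uses (that script is Python).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed record: fields 0-2 of one UnicodeData.txt line. */
struct ud_entry {
    uint32_t code;
    const char *name;     /* e.g. "<CJK Ideograph, First>" */
    const char *category; /* two-letter abbreviation, e.g. "Lo" */
};

/* Illustrative excerpt only; the real data comes from the file. */
static const struct ud_entry sample[] = {
    {0x4E00, "<CJK Ideograph, First>", "Lo"},
    {0x9FCC, "<CJK Ideograph, Last>", "Lo"},
    {0xA000, "YI SYLLABLE IT", "Lo"},
};

static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* General category of cp: an exact match wins; a code point strictly
 * inside a First>/Last> pair inherits the pair's category; anything
 * not listed is unassigned ("Cn"). */
static const char *category_of(const struct ud_entry *e, size_t n, uint32_t cp)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].code == cp)
            return e[i].category;
        if (i + 1 < n && ends_with(e[i].name, ", First>") &&
            ends_with(e[i + 1].name, ", Last>")) {
            assert(e[i].code <= e[i + 1].code);
            if (cp > e[i].code && cp < e[i + 1].code)
                return e[i].category; /* interior of the range */
        }
    }
    return "Cn";
}
```

With this logic U+5000 reports "Lo" (inherited from the CJK range) rather than "Cn", while an unlisted code point such as U+9FCD in the sample still reports "Cn".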

Bug in Hangul composition

Patch and test:

diff --git a/src/hb-ucdn/ucdn.c b/src/hb-ucdn/ucdn.c
index 30747fea..f7b33d64 100644
--- a/src/hb-ucdn/ucdn.c
+++ b/src/hb-ucdn/ucdn.c
@@ -163,7 +163,8 @@ static int hangul_pair_decompose(uint32_t code, uint32_t *a, uint32_t *b)
 
 static int hangul_pair_compose(uint32_t *code, uint32_t a, uint32_t b)
 {
-    if (a >= SBASE && a < (SBASE + SCOUNT) && b >= TBASE && b < (TBASE + TCOUNT)) {
+    if (a >= SBASE && a < (SBASE + SCOUNT) && b > TBASE && b < (TBASE + TCOUNT) &&
+       !((a - SBASE) % TCOUNT)) {
         /* LV,T */
         *code = a + (b - TBASE);
         return 3;
diff --git a/test/api/test-unicode.c b/test/api/test-unicode.c
index 6195bb28..0587c6e7 100644
--- a/test/api/test-unicode.c
+++ b/test/api/test-unicode.c
@@ -755,6 +755,10 @@ test_unicode_normalization (gconstpointer user_data)
   g_assert (hb_unicode_compose (uf, 0xCE20, 0x11B8, &ab) && ab == 0xCE31);
   g_assert (hb_unicode_compose (uf, 0x110E, 0x1173, &ab) && ab == 0xCE20);
 
+  g_assert (!hb_unicode_compose (uf, 0xAC00, 0x11A7, &ab));
+  g_assert (hb_unicode_compose (uf, 0xAC00, 0x11A8, &ab) && ab == 0xAC01);
+  g_assert (!hb_unicode_compose (uf, 0xAC01, 0x11A8, &ab));
+
 
   /* Test decompose() */
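For reference, the full pairwise Hangul composition from the Unicode Standard (Section 3.12, Conjoining Jamo Behavior) can be sketched in C as below. The name hangul_compose and its convention of returning 1 on success and 0 otherwise are assumptions for illustration; UCDN's hangul_pair_compose uses its own return codes. The LV,T branch contains the two conditions the patch adds: the first code point must be an LV syllable (no trailing consonant yet), and U+11A7 (TBASE itself) is not a valid trailing consonant.

```c
#include <assert.h>
#include <stdint.h>

/* Constants of the Hangul syllable algorithm (Unicode Standard, 3.12). */
enum {
    SBASE = 0xAC00, LBASE = 0x1100, VBASE = 0x1161, TBASE = 0x11A7,
    LCOUNT = 19, VCOUNT = 21, TCOUNT = 28,
    NCOUNT = VCOUNT * TCOUNT, /* 588 */
    SCOUNT = LCOUNT * NCOUNT  /* 11172 */
};

/* Stores the composite of (a, b) in *out and returns 1, or returns 0. */
static int hangul_compose(uint32_t a, uint32_t b, uint32_t *out)
{
    /* L + V -> LV */
    if (a >= LBASE && a < LBASE + LCOUNT &&
        b >= VBASE && b < VBASE + VCOUNT) {
        *out = SBASE + ((a - LBASE) * VCOUNT + (b - VBASE)) * TCOUNT;
        return 1;
    }
    /* LV + T -> LVT: a must have no trailing consonant yet,
     * and b must be strictly greater than TBASE. */
    if (a >= SBASE && a < SBASE + SCOUNT && (a - SBASE) % TCOUNT == 0 &&
        b > TBASE && b < TBASE + TCOUNT) {
        *out = a + (b - TBASE);
        return 1;
    }
    return 0;
}
```

The sketch agrees with the test cases in the patch: (U+AC00, U+11A8) composes to U+AC01, while (U+AC00, U+11A7) and (U+AC01, U+11A8) do not compose.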

Provide Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type properties

I am writing Lua bindings for ucdn at https://github.com/deepakjois/luaucdn, with the ultimate goal of implementing a pure Lua version of the Unicode Bidirectional Algorithm.

One part of the algorithm uses the Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type properties defined in BidiBrackets.txt. Both are derived properties, so I suppose it is possible to obtain them from the data and API functions that ucdn currently provides.

However, there is also this warning in UAX #44:

Implementations should simply use the derived properties, and should not try to rederive them from lists of simple properties and collections of rules, because of the chances for error and divergence when doing so.

In light of that, what is your opinion on providing Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type directly in ucdn?
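If ucdn were to expose these properties directly, one possible shape is a sorted table generated from BidiBrackets.txt plus a binary search, sketched below in C. The names paired_bracket, enum bpt and struct bracket_entry are hypothetical, and the table is a six-entry excerpt (parentheses, square brackets, curly braces); a real table would be generated from the whole file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values mirroring BidiBrackets.txt field 2 (n / o / c). */
enum bpt { BPT_NONE = 0, BPT_OPEN, BPT_CLOSE };

struct bracket_entry {
    uint32_t code;  /* field 0 */
    uint32_t pair;  /* field 1: Bidi_Paired_Bracket */
    enum bpt type;  /* field 2: Bidi_Paired_Bracket_Type */
};

/* Excerpt of BidiBrackets.txt, sorted by code point. */
static const struct bracket_entry brackets[] = {
    {0x0028, 0x0029, BPT_OPEN}, {0x0029, 0x0028, BPT_CLOSE},
    {0x005B, 0x005D, BPT_OPEN}, {0x005D, 0x005B, BPT_CLOSE},
    {0x007B, 0x007D, BPT_OPEN}, {0x007D, 0x007B, BPT_CLOSE},
};

/* Binary search. Writes the bracket type to *type and returns the
 * paired bracket; for unlisted code points returns code itself
 * (a stand-in for <none>) with type BPT_NONE. */
static uint32_t paired_bracket(uint32_t code, enum bpt *type)
{
    size_t lo = 0, hi = sizeof brackets / sizeof brackets[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (brackets[mid].code == code) {
            *type = brackets[mid].type;
            return brackets[mid].pair;
        }
        if (brackets[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    *type = BPT_NONE;
    return code;
}
```

Keeping both properties in one record means a single lookup answers both questions the bracket-pairing step of the algorithm asks.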

Update to Unicode 11

Just out of the oven...

I'm updating HarfBuzz to make a release later today. If you could update soon, that would be nice.

Make data structs const

Currently they are not const, which means they end up in the library's writable .data section instead of read-only data. Not good. Just add const.
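To illustrate, here is a minimal two-stage lookup in the style of a generated Unicode table, with every array declared const so that a typical ELF toolchain places it in .rodata rather than .data. The table contents, the SHIFT value and the lookup function are made up for the example; they are not UCDN's actual tables.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT 2
#define BLOCK (1u << SHIFT) /* code points per block */

/* const: read-only data, never copied into a writable section. */
static const uint8_t index_table[] = {0, 1, 0}; /* block -> row */
static const uint8_t row_table[][BLOCK] = {
    {0, 0, 0, 0}, /* row 0: shared by blocks 0 and 2 */
    {1, 2, 3, 4}, /* row 1 */
};

/* Look up the (made-up) property value of a code point. */
static uint8_t lookup(uint32_t code)
{
    uint32_t block = code >> SHIFT;
    if (block >= sizeof index_table / sizeof index_table[0])
        return 0; /* beyond the table: default value */
    return row_table[index_table[block]][code & (BLOCK - 1)];
}
```

Adding const changes nothing about how the lookup works; it only tells the compiler the data never changes, which also turns any accidental write into a compile-time error.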

Add header guards

Please add the usual #ifdef __cplusplus / extern "C" wrapper so that the header can also be included from C++. See hb/src/hb-ucdn/ucdn.h for an example.
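A sketch of the requested layout, combining an include guard with the extern "C" wrapper. The guard macro EXAMPLE_H and the function example_hangul_syllable_index are hypothetical; the second half of the block stands in for the separate .c file that implements the declaration.

```c
/* --- example.h --- */
#ifndef EXAMPLE_H
#define EXAMPLE_H

#include <stdint.h>

/* Under a C++ compiler, give the declarations C linkage so they match
 * the unmangled symbols produced when the library is compiled as C. */
#ifdef __cplusplus
extern "C" {
#endif

int example_hangul_syllable_index(uint32_t code); /* hypothetical API */

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_H */

/* --- example.c (compiled as C) --- */
#include <assert.h>

/* Index of a precomposed Hangul syllable, U+AC00..U+D7A3. */
int example_hangul_syllable_index(uint32_t code)
{
    assert(code >= 0xAC00 && code <= 0xD7A3);
    return (int)(code - 0xAC00);
}
```

The include guard prevents double definitions when the header is included twice; the extern "C" block is what makes the header usable from C++.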

UCDN needs automated testing

UCDN doesn't have any automated testing. We need at least some equivalence class testing, or ideally verification of correctness against the whole character database.
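A possible starting point for verification against the whole database is a loop over UnicodeData.txt that compares each listed code point's general category with what the library reports. Below is a C sketch; parse_line, check_file and the category_fn callback (which would wrap the library call and return the two-letter abbreviation) are assumptions. It only checks explicitly listed lines, so range interiors and unassigned (Cn) code points would need separate handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical adapter around the library under test. */
typedef const char *(*category_fn)(uint32_t code);

/* Parse "code;name;category;..." into *code and cat.
 * Returns 1 on success, 0 for comments or malformed lines. */
static int parse_line(const char *line, uint32_t *code, char cat[3])
{
    char *end;
    unsigned long v = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return 0;
    const char *name_end = strchr(end + 1, ';');
    if (!name_end || strlen(name_end + 1) < 2)
        return 0;
    memcpy(cat, name_end + 1, 2);
    cat[2] = '\0';
    *code = (uint32_t)v;
    return 1;
}

/* Returns the number of lines whose category disagrees with the library. */
static int check_file(FILE *f, category_fn get_category)
{
    char line[512];
    int mismatches = 0;
    while (fgets(line, sizeof line, f)) {
        uint32_t code;
        char cat[3];
        if (!parse_line(line, &code, cat))
            continue;
        const char *got = get_category(code);
        if (strcmp(got, cat) != 0) {
            fprintf(stderr, "U+%04X: expected %s, got %s\n",
                    (unsigned)code, cat, got);
            mismatches++;
        }
    }
    return mismatches;
}
```

A test driver would open UnicodeData.txt, call check_file with an adapter for the library, and fail if the result is nonzero.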
