tehreer / sheenbidi

A sophisticated implementation of Unicode Bidirectional Algorithm

License: Apache License 2.0

C 72.69% C++ 26.60% Makefile 0.59% Meson 0.12%
unicode bidi unicode-bidirectional-algorithm uax-9 uax-24 c c89 ansi-c c-plus-plus library

sheenbidi's People

Contributors

khaledhosny, mta452, radarhere, utelle

sheenbidi's Issues

CJK / Hangul returns SBBidiTypeON, which gets reversed when mixed with Arabic

For example, when the text 阿拉伯語العربية is given to SheenBidi, SBLineGetRunCount returns only 1, so 阿拉伯語 is reversed along with the Arabic, because a neutral bidi type is returned for the Chinese characters.

In UnicodeData.txt, CJK and Hangul are listed with bidi class "L" (left-to-right), e.g. 4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;; so maybe something is returned incorrectly elsewhere?

I use this as a workaround:

        if ((codepoint >= 0x4E00 && codepoint <= 0x9FFF) ||
            (codepoint >= 0x3400 && codepoint <= 0x4DBF) ||
            (codepoint >= 0xF900 && codepoint <= 0xFAFF) || // CJK
            (codepoint >= 0xAC00 && codepoint <= 0xD7A3)) // Hangul
            types[firstIndex] = SBBidiTypeL;
        else
            types[firstIndex] = LookupBidiType(codepoint);
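
The range check in the workaround above can be factored into a small table-driven helper. This is only a sketch: `CodepointRange`, `kStrongLTRRanges` and `is_cjk_or_hangul` are hypothetical names, not part of SheenBidi, and the table covers just the four ranges listed in the workaround.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not part of the SheenBidi API. */
typedef struct {
    uint32_t first;
    uint32_t last;
} CodepointRange;

/* Same four ranges as the workaround, sorted by first code point. */
static const CodepointRange kStrongLTRRanges[] = {
    { 0x3400, 0x4DBF }, /* CJK Unified Ideographs Extension A */
    { 0x4E00, 0x9FFF }, /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 }, /* Hangul Syllables */
    { 0xF900, 0xFAFF }, /* CJK Compatibility Ideographs */
};

/* Returns true if the code point falls inside one of the ranges above. */
static bool is_cjk_or_hangul(uint32_t codepoint)
{
    size_t count = sizeof kStrongLTRRanges / sizeof kStrongLTRRanges[0];
    size_t i;

    assert(count > 0);

    for (i = 0; i < count; i++) {
        if (codepoint >= kStrongLTRRanges[i].first &&
            codepoint <= kStrongLTRRanges[i].last) {
            return true;
        }
    }
    return false;
}
```

With such a helper, the workaround reduces to a single condition: use SBBidiTypeL when `is_cjk_or_hangul(codepoint)` is true, otherwise fall back to the normal lookup.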

Using SheenBidi with utf8

Hi,

I am looking for a Bidi library to pair with HarfBuzz and FreeType for rendering text in a game engine. Do you know of any project that uses SheenBidi with those, so I can get a head start when working with them? Does SheenBidi support emojis? I am using UTF-8.

Request: Is there a demo for UTF-16?

I guess that if I input a UTF-8 string, the result will be hard to handle once the direction is reversed, because each character is represented by several UTF-8 bytes. So is there a demo for UTF-16? I need to process Uyghur characters.
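
On the concern about reversing UTF-8: a run can be reversed by code point rather than by byte, so multi-byte characters stay intact. The sketch below is self-contained and does not call SheenBidi; `reverse_utf8_codepoints` is a hypothetical helper, it assumes the input is valid UTF-8, and it ignores combining marks and mirroring, which real display reordering would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies `length` bytes of valid UTF-8 from `src` into `dst` with the
 * code points in reverse order. `dst` must have room for `length` bytes
 * and must not overlap `src`. No terminator is written.
 */
static void reverse_utf8_codepoints(const char *src, size_t length, char *dst)
{
    size_t end = length; /* one past the last byte of the current code point */
    size_t out = 0;      /* next write position in dst */

    assert(src != NULL && dst != NULL);

    while (end > 0) {
        size_t start = end - 1;

        /* Step back over continuation bytes (10xxxxxx) to the lead byte. */
        while (start > 0 && ((unsigned char)src[start] & 0xC0) == 0x80) {
            start--;
        }

        memcpy(dst + out, src + start, end - start);
        out += end - start;
        end = start;
    }
}
```

For instance, the three bytes of "a" followed by U+0628 (0x61 0xD8 0xA8) come out as 0xD8 0xA8 0x61: the two-byte Arabic letter is moved as a unit.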

Small documentation mistake

If I'm not mistaken, the README contains a small mistake. It says this about SBParagraph:

It represents a single paragraph of text processed with rules X1-I2. It provides resolved embedding levels of all the code units of a paragraph.

However, the levels returned by SBParagraphGetLevelsPtr do not seem to include rules L1 and L2; only SBLine applies those rules.

I tested with the following code point sequence:
0661 0009 0028 0662 0029

Support Meson

Hi, I would like to add support for SheenBidi in libraqm (issue), which uses the increasingly popular Meson build system - as do all of its current dependencies. It would be amazing if SheenBidi supported Meson as well, since it can automatically build libraries that also use it when they are not installed on the host system (details here). Given SheenBidi does not have a package in most distros, Meson really is a must-have for us. Is supporting it something you can do?

Thanks :)

Unsafe for allocation failure

At the moment the library calls malloc and uses the result without checking for NULL throughout.

For my use case this makes it unusable. Would it be possible to add allocation failure handling?
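
As an illustration of the kind of handling being asked for, here is a minimal sketch of a constructor that checks every allocation and reports failure by returning NULL. `LevelBuffer` and its functions are hypothetical and are not SheenBidi types; how the library itself would report failure is an open design question.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object used only for illustration. */
typedef struct {
    size_t length;
    unsigned char *levels;
} LevelBuffer;

/* Safe to call with NULL, like free(). */
static void level_buffer_destroy(LevelBuffer *buffer)
{
    if (buffer != NULL) {
        free(buffer->levels);
        free(buffer);
    }
}

/* Returns NULL if any allocation fails; never leaks a partial object. */
static LevelBuffer *level_buffer_create(size_t length)
{
    LevelBuffer *buffer = malloc(sizeof *buffer);
    if (buffer == NULL) {
        return NULL;
    }

    buffer->length = length;
    buffer->levels = NULL;

    if (length > 0) {
        buffer->levels = calloc(length, sizeof *buffer->levels);
        if (buffer->levels == NULL) {
            free(buffer); /* undo the first allocation */
            return NULL;
        }
    }
    return buffer;
}

/* Bounds-checked read of a single zero-initialized level. */
static unsigned char level_buffer_get(const LevelBuffer *buffer, size_t index)
{
    assert(buffer != NULL && index < buffer->length);
    return buffer->levels[index];
}
```

The caller then only has to test the returned pointer once, and the cleanup path stays the same whether or not the second allocation succeeded.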

Getting script of each code unit in an array

Currently, the script runs are identified in an iterating fashion. But sometimes an array turns out to be a better option to find out the overlapping regions of bidi runs and script runs. Technically it's not a big effort to write a utility function that fills out the script array by iterating over script runs. However, it would be really helpful if such a utility function is already provided in the library.

For example, it was a simpler approach for Java interoperability in Tehreer-Android. The relevant piece of code is available in ScriptClassifier.cpp
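
A sketch of the utility described above, assuming the script runs have already been collected into an array. `ScriptRun` and `fill_script_array` are hypothetical stand-ins (the library exposes runs through its own iterator), and script value 0 is assumed here to mean "no script".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical run record: `length` code units starting at `offset`. */
typedef struct {
    size_t offset;
    size_t length;
    unsigned char script;
} ScriptRun;

/*
 * Writes the script of every code unit into `scripts` (text_length entries).
 * Code units not covered by any run are left as 0. Runs that extend past
 * the end of the text are clamped.
 */
static void fill_script_array(const ScriptRun *runs, size_t run_count,
                              unsigned char *scripts, size_t text_length)
{
    size_t i;

    assert(scripts != NULL || text_length == 0);
    assert(runs != NULL || run_count == 0);

    if (text_length > 0) {
        memset(scripts, 0, text_length);
    }

    for (i = 0; i < run_count; i++) {
        size_t remaining;
        size_t count;

        if (runs[i].offset >= text_length) {
            continue; /* run starts outside the text */
        }
        remaining = text_length - runs[i].offset;
        count = runs[i].length < remaining ? runs[i].length : remaining;
        memset(scripts + runs[i].offset, runs[i].script, count);
    }
}
```

Once the array is filled, finding where a bidi run overlaps a script change is a matter of scanning the array between the run's start and end indices.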

extern "C" in public headers?

Hi!
Thank you for this library, it is exactly what I was looking for, as I don't want to include ICU in my project. I'm using it from C++ code, so I needed extern "C", but it would be nicer if this were in the library's public headers.

I'm not sure which option you would like better: adding it to every public header or just to the ones declaring functions, so I decided not to open a PR about this, as it's a trivial thing anyway :)
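
For reference, the usual pattern looks like the sketch below. The header name and the function `example_count_bytes` are made up for illustration; the header and its implementation are shown in one listing only to keep the example self-contained.

```c
/* example_public.h (hypothetical header) */
#ifndef EXAMPLE_PUBLIC_H
#define EXAMPLE_PUBLIC_H

#include <stddef.h>

/* Only C++ compilers define __cplusplus, so C builds see plain declarations. */
#ifdef __cplusplus
extern "C" {
#endif

size_t example_count_bytes(const char *text);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_PUBLIC_H */

/* example_public.c (normally a separate file that includes the header) */
#include <assert.h>

size_t example_count_bytes(const char *text)
{
    size_t count = 0;

    assert(text != NULL);

    while (text[count] != '\0') {
        count++;
    }
    return count;
}
```

Headers that contain only typedefs and macros do not strictly need the guard, since linkage affects only function and object declarations, which is why guarding just the function-declaring headers is also a workable option.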

SBAlgorithmGetParagraphBoundary() leaves separatorLength unset if no separators.

When running SBAlgorithmGetParagraphBoundary() on a string that contains no separators (e.g. "This is a single line string."), separatorLength is left unchanged. For example, if you pass the address of an uninitialized variable, it remains uninitialized after the call. If there is no separator in the paragraph, I would expect it to be explicitly set to 0.
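
The expected behaviour can be sketched with a simplified stand-in that always writes its output parameter before scanning. `find_paragraph_boundary` is hypothetical, works on bytes, and recognizes only LF, CR and CRLF; it is not the library's function and does not cover the other paragraph separators.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the length of the first paragraph including its separator.
 * *separator_length is always assigned: 0 when no separator is found.
 */
static size_t find_paragraph_boundary(const char *text, size_t length,
                                      size_t *separator_length)
{
    size_t i;

    assert(text != NULL && separator_length != NULL);

    *separator_length = 0; /* defined result even without a separator */

    for (i = 0; i < length; i++) {
        if (text[i] == '\n') {
            *separator_length = 1;
            return i + 1;
        }
        if (text[i] == '\r') {
            /* Treat CR LF as a single two-byte separator. */
            *separator_length = (i + 1 < length && text[i + 1] == '\n') ? 2 : 1;
            return i + *separator_length;
        }
    }
    return length;
}
```

Until the library does this itself, a caller can get the same guarantee by initializing the variable to 0 before passing its address.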

Upgrade data files to Unicode 11

As Unicode 11 has been released, bidi type, general category, script, mirror and bracket lookups should be updated accordingly.

PopulateBidiChain leaves part of the SBBidiType memory uninitialized in some situations.

Running under valgrind, I get the following report on every run:
"==2247== Warning: unimplemented fcntl command: 1033
==2247== Thread 32:
==2247== Conditional jump or move depends on uninitialised value(s)
==2247== at 0x14B4107: SBAlgorithmCreateParagraph"
I tried to narrow down which part of memory valgrind considers uninitialized, and it turned out that if I add the following code in the CreateParagraphContext function, just after BidiChainInitialize,

for (int i = 0;i < length;++i) fixedTypes[i] = SBBidiTypeNil;

valgrind no longer reports any problem.

I followed the PopulateBidiChain code, and it looks like in some cases it can skip initializing some links in the chain. I do not understand SheenBidi well enough to say whether this means there is a data error in such cases. In any case, the application does not crash because of it; the only visible problem so far is the valgrind report.
