tehreer / sheenbidi
A sophisticated implementation of the Unicode Bidirectional Algorithm
License: Apache License 2.0
For example, take the text 阿拉伯語العربية
When it is given to SheenBidi, SBLineGetRunCount returns only 1, so 阿拉伯語 is reversed along with the Arabic, because a neutral bidi type is returned for the Chinese characters.
According to UnicodeData.txt, CJK and Hangul are left-to-right, e.g. 4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;; where "L" stands for LTR. Maybe something is returned wrongly elsewhere?
I use this as a workaround:
if ((codepoint >= 0x4E00 && codepoint <= 0x9FFF) ||
    (codepoint >= 0x3400 && codepoint <= 0x4DBF) ||
    (codepoint >= 0xF900 && codepoint <= 0xFAFF) ||  // CJK
    (codepoint >= 0xAC00 && codepoint <= 0xD7A3))    // Hangul
    types[firstIndex] = SBBidiTypeL;
else
    types[firstIndex] = LookupBidiType(codepoint);
Parameters
encoding = SBStringEncodingUTF16
string = "abcdابجد"
paragraphOffset = 0
suggestedLength = 4
baseLevel = 0
Input Function
SBParagraph paragraph = SBAlgorithmCreateParagraph()
Output Function
SBUInteger actualLength = SBParagraphGetLength()
Expected Value = 4
Actual Value = 5
Hi,
The license at line 190 ( https://github.com/Tehreer/SheenBidi/blob/master/LICENSE#L190 ) still contains the placeholder fields for the copyright year and the copyright owner. I've noticed that https://spdx.org/licenses/Apache-2.0.html suggests filling in these fields. I couldn't find an open or closed ticket about this. I wonder: is this on purpose?
Best
Robert
Hi,
I am looking for a Bidi library to pair with HarfBuzz and FreeType for rendering text in a game engine. Do you know of any project that uses SheenBidi with those, so I can get a head start? Does SheenBidi support emojis? I am using UTF-8.
I guess that if I input a string of UTF-8 characters, the parts whose direction gets reversed will be hard to handle, because each character is represented by a variable number of bytes. Is there a demo for UTF-16? I need to process Uyghur characters.
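The difficulty with UTF-8 raised above comes from reversing bytes instead of whole characters. The sketch below is only an illustration of walking a UTF-8 run backwards one code point at a time; example_reverse_utf8 is a hypothetical helper, not part of SheenBidi, it assumes the input is already valid UTF-8, and a real renderer would reorder according to the runs produced by the bidi algorithm rather than reversing blindly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `length` bytes of valid UTF-8 from `src`
 * into `dst` with the code points in reverse order. Bytes that belong
 * to the same code point keep their order. `dst` must have room for
 * `length + 1` bytes and must not overlap `src`. */
static void example_reverse_utf8(const char *src, size_t length, char *dst)
{
    size_t end = length;   /* one past the last byte of the current sequence */
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    while (end > 0) {
        size_t start = end - 1;

        /* Continuation bytes look like 10xxxxxx; step back to the lead byte. */
        while (start > 0 && ((unsigned char)src[start] & 0xC0) == 0x80) {
            start--;
        }

        memcpy(dst + out, src + start, end - start);
        out += end - start;
        end = start;
    }

    dst[out] = '\0';
}
```

Arabic-script letters, including those used for Uyghur, take two bytes each in UTF-8, so a byte-wise reversal would split every one of them, while this walk keeps each sequence intact.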
If I'm not mistaken, the README file contains a small mistake. It says this about SBParagraph:
It represents a single paragraph of text processed with rules X1-I2. It provides resolved embedding levels of all the code units of a paragraph.
However, the levels returned by SBParagraphGetLevelsPtr do not seem to include rules L1 and L2; only SBLine includes those rules.
I tested with the following code point sequence:
0661 0009 0028 0662 0029
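To make the difference concrete, here is a minimal sketch of rule L1 from UAX #9 applied on top of already resolved levels. It is not SheenBidi code: the BidiClass enum and apply_rule_l1 are hypothetical names, only the classes that L1 inspects are modelled, and any input levels fed to it in an example are illustrative values rather than output captured from the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced set of bidi classes: only what rule L1 looks at. */
typedef enum {
    CLASS_OTHER,     /* any class that L1 leaves untouched */
    CLASS_WS,        /* whitespace */
    CLASS_ISOLATE,   /* LRI, RLI, FSI, PDI */
    CLASS_S,         /* segment separator, e.g. U+0009 TAB */
    CLASS_B          /* paragraph separator */
} BidiClass;

/* Rule L1: reset separators, and whitespace/isolate sequences that
 * precede a separator or end the line, to the paragraph level.
 * `classes` holds the ORIGINAL classes of the characters on the line. */
static void apply_rule_l1(const BidiClass *classes, uint8_t *levels,
                          size_t length, uint8_t paragraphLevel)
{
    int resetWhitespace = 1;   /* the end of the line acts like a separator */
    size_t i;

    assert(classes != NULL && levels != NULL);

    /* Walk backwards so "whitespace before a separator" is easy to detect. */
    for (i = length; i-- > 0; ) {
        switch (classes[i]) {
        case CLASS_S:
        case CLASS_B:
            levels[i] = paragraphLevel;
            resetWhitespace = 1;
            break;
        case CLASS_WS:
        case CLASS_ISOLATE:
            if (resetWhitespace) {
                levels[i] = paragraphLevel;
            }
            break;
        default:
            resetWhitespace = 0;
            break;
        }
    }
}
```

In the sequence above, U+0009 is a segment separator, so whatever level rules X1-I2 assign to it, L1 puts it back at the paragraph level. Levels that skip L1 and levels that include it should therefore differ at that position whenever X1-I2 raise it above the paragraph level.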
The default values of SBBidiType, SBGeneralCategory, and SBScript should be set according to the specification of UAX #44.
The main intention of SBScriptLocator was to facilitate text shaping. Since text shaping is mostly done with an OpenType layout engine, there should be a function to directly get the OpenType tag of a script. A list of OpenType script tags is available at https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags
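A rough sketch of what such a lookup could look like is given below. The ExampleScript enum and both functions are hypothetical stand-ins, not the SBScript API; only the tag values themselves ('arab', 'hebr', 'latn', 'DFLT') come from the OpenType script tag registry. An OpenType tag is four ASCII bytes packed big-endian into a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical script identifiers, standing in for the library's own. */
typedef enum {
    EXAMPLE_SCRIPT_UNKNOWN,
    EXAMPLE_SCRIPT_ARABIC,
    EXAMPLE_SCRIPT_HEBREW,
    EXAMPLE_SCRIPT_LATIN
} ExampleScript;

/* Packs a four-character tag name into a 32-bit OpenType tag. */
static uint32_t example_make_tag(const char *name)
{
    assert(name != NULL && strlen(name) == 4);

    return ((uint32_t)(unsigned char)name[0] << 24)
         | ((uint32_t)(unsigned char)name[1] << 16)
         | ((uint32_t)(unsigned char)name[2] << 8)
         |  (uint32_t)(unsigned char)name[3];
}

/* Maps a script to its OpenType script tag, falling back to 'DFLT'. */
static uint32_t example_script_to_opentype_tag(ExampleScript script)
{
    switch (script) {
    case EXAMPLE_SCRIPT_ARABIC: return example_make_tag("arab");
    case EXAMPLE_SCRIPT_HEBREW: return example_make_tag("hebr");
    case EXAMPLE_SCRIPT_LATIN:  return example_make_tag("latn");
    default:                    return example_make_tag("DFLT");
    }
}
```

The packed integer form is what shaping engines generally expect for tags, so the result could be passed on without any string handling.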
AppVeyor should be used to make sure that SheenBidi remains compatible with Visual Studio.
Hi, I would like to add support for SheenBidi in libraqm (issue), which uses the increasingly popular Meson build system - as do all of its current dependencies. It would be amazing if SheenBidi supported Meson as well, since it can automatically build libraries that also use it when they are not installed on the host system (details here). Given SheenBidi does not have a package in most distros, Meson really is a must-have for us. Is supporting it something you can do?
Thanks :)
At the moment the library calls malloc and uses the result without checking for NULL throughout.
For my use case this makes it unusable. Would it be possible to add allocation failure handling?
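The kind of handling being asked for might look like the sketch below. Everything in it is hypothetical (ExampleLevelBuffer, example_level_buffer_create, the injectable allocator); it only illustrates checking an allocation result and reporting failure to the caller instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocator signature compatible with malloc, so it can be swapped out. */
typedef void *(*ExampleAllocFn)(size_t size);

/* Hypothetical object: a length followed by that many level bytes. */
typedef struct {
    size_t length;
    uint8_t levels[];
} ExampleLevelBuffer;

/* Stand-in allocator that always fails, for exercising the error path. */
static void *example_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Returns a zero-filled buffer, or NULL if the size overflows or the
 * allocator fails. The caller releases the single block it receives. */
static ExampleLevelBuffer *example_level_buffer_create(size_t length,
                                                       ExampleAllocFn alloc)
{
    ExampleLevelBuffer *buffer;

    assert(alloc != NULL);

    /* Guard the size computation against wrap-around. */
    if (length > SIZE_MAX - sizeof(ExampleLevelBuffer)) {
        return NULL;
    }

    buffer = alloc(sizeof(ExampleLevelBuffer) + length);
    if (buffer == NULL) {
        return NULL;   /* report failure instead of using the pointer */
    }

    buffer->length = length;
    memset(buffer->levels, 0, length);
    return buffer;
}
```

Passing malloc as the allocator gives the normal path, while example_failing_alloc lets a test confirm that the failure is propagated as NULL.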
Parameters
encoding = SBStringEncodingUTF16
string = "abcdابجد\r\n"
paragraphOffset = 0
suggestedLength = 9
baseLevel = 0
Input Function
SBAlgorithmGetParagraphBoundary(&actualLength)
Expected Value = 10
Actual Value = 8
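The expected values in this report can be reproduced with a small standalone sketch over UTF-16 code units. It is not SheenBidi's implementation: example_paragraph_boundary and its parameters are hypothetical, surrogate pairs are not considered, and the only behaviour modelled is that a paragraph ends after its first separator, that CR LF counts as one separator even if the suggested length would split it, and that the separator length is always written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code units whose Bidi_Class is B (paragraph separator). */
static int example_is_paragraph_separator(uint16_t unit)
{
    return unit == 0x000A || unit == 0x000D
        || (unit >= 0x001C && unit <= 0x001E)
        || unit == 0x0085 || unit == 0x2029;
}

/* Hypothetical boundary search within [offset, offset + suggestedLength). */
static void example_paragraph_boundary(const uint16_t *text, size_t textLength,
                                       size_t offset, size_t suggestedLength,
                                       size_t *actualLength,
                                       size_t *separatorLength)
{
    size_t limit;
    size_t i;

    assert(text != NULL && actualLength != NULL && separatorLength != NULL);
    assert(offset <= textLength);

    /* Clamp the requested range to the text, guarding against overflow. */
    limit = offset + suggestedLength;
    if (limit > textLength || limit < offset) {
        limit = textLength;
    }

    /* Defaults for "no separator found": whole range, separator length 0. */
    *actualLength = limit - offset;
    *separatorLength = 0;

    for (i = offset; i < limit; i++) {
        if (example_is_paragraph_separator(text[i])) {
            size_t length = 1;

            /* Keep CR LF together, even when LF lies past the limit. */
            if (text[i] == 0x000D && i + 1 < textLength
                    && text[i + 1] == 0x000A) {
                length = 2;
            }

            *actualLength = (i - offset) + length;
            *separatorLength = length;
            return;
        }
    }
}
```

With the ten code units of "abcdابجد\r\n" and a suggested length of 9, the search stops at the CR, pulls in the following LF, and reports a length of 10 with a separator length of 2.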
Currently, the script runs are identified iteratively. But sometimes an array turns out to be a better option for finding the overlapping regions of bidi runs and script runs. Technically, it is not a big effort to write a utility function that fills a script array by iterating over the script runs. However, it would be really helpful if such a utility function were provided by the library.
For example, it was a simpler approach for Java interoperability in Tehreer-Android. The relevant piece of code is available in ScriptClassifier.cpp
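A minimal sketch of such a utility is shown below. ExampleScriptRun and example_fill_script_array are hypothetical names, and the runs are assumed to have been collected already by iterating a script locator; the function simply expands them into one script value per code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical script run: a range of code units sharing one script. */
typedef struct {
    size_t offset;
    size_t length;
    uint8_t script;
} ExampleScriptRun;

/* Writes the script of every run into scripts[offset .. offset + length).
 * Positions past `scriptCount` are skipped. Returns the number of
 * entries written. */
static size_t example_fill_script_array(const ExampleScriptRun *runs,
                                        size_t runCount,
                                        uint8_t *scripts, size_t scriptCount)
{
    size_t written = 0;
    size_t r;

    assert(scripts != NULL || scriptCount == 0);
    assert(runs != NULL || runCount == 0);

    for (r = 0; r < runCount; r++) {
        size_t i;

        for (i = 0; i < runs[r].length; i++) {
            size_t index = runs[r].offset + i;

            if (index >= scriptCount) {
                break;   /* run extends past the output array */
            }

            scripts[index] = runs[r].script;
            written++;
        }
    }

    return written;
}
```

Once the array exists, the script at any position inside a bidi run can be read with a plain index, which is what makes finding the overlaps straightforward.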
Hi!
Thank you for this library, it is exactly what I was looking for, as I don't want to include ICU in my project. I'm using it from C++ code, so I needed extern "C", but it would be nicer if this were in the library's public headers.
I'm not sure which option you would prefer: adding it to every public header or only to the ones that declare functions. So I decided not to open a PR about this, as it's a trivial thing anyway :)
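For reference, the usual guard looks like the sketch below. The header name, include guard macro, and example_count_level_runs are all hypothetical; the function exists only so that there is something to declare. The same file compiles as C, and when a C++ compiler sees the header the declaration gets C linkage.

```c
/* --- example_bidi.h (hypothetical public header) --- */
#ifndef EXAMPLE_BIDI_H
#define EXAMPLE_BIDI_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Counts maximal runs of equal embedding levels. */
size_t example_count_level_runs(const unsigned char *levels, size_t length);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_BIDI_H */

/* --- example_bidi.c (implementation, compiled as C) --- */
#include <assert.h>

size_t example_count_level_runs(const unsigned char *levels, size_t length)
{
    size_t runs;
    size_t i;

    assert(levels != NULL || length == 0);

    if (length == 0) {
        return 0;
    }

    runs = 1;
    for (i = 1; i < length; i++) {
        if (levels[i] != levels[i - 1]) {
            runs++;   /* a level change starts a new run */
        }
    }

    return runs;
}
```

Headers that only define types or macros do not strictly need the guard, since linkage only affects functions and objects, but adding it everywhere is harmless and keeps the headers uniform.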
When running SBAlgorithmGetParagraphBoundary() on a string that contains no separators (e.g. "This is a single line string."), separatorLength is left unchanged. For example, if you pass the address of an uninitialized variable, its value is still uninitialized after the call. If there is no separator in the paragraph, I would expect it to be explicitly set to 0.
As Unicode 11 has been released, bidi type, general category, script, mirror and bracket lookups should be updated accordingly.
I checked with valgrind, and on every run it reports
"==2247== Warning: unimplemented fcntl command: 1033
==2247== Thread 32:
==2247== Conditional jump or move depends on uninitialised value(s)
==2247== at 0x14B4107: SBAlgorithmCreateParagraph"
I tried to narrow down which part of memory is uninitialized according to valgrind, and it turned out that if I add the following code to the CreateParagraphContext function, just after BidiChainInitialize,
for (int i = 0; i < length; ++i) fixedTypes[i] = SBBidiTypeNil;
valgrind no longer reports any problem.
I followed the PopulateBidiChain code, and it looks like in some cases it can skip the initialization of some links in the chain. I do not understand SheenBidi well enough to say whether this means there is a data error in such cases. In any case, the application does not crash because of it; the only visible problem so far is the valgrind report.