Comments (11)
Looks like python-bidi will need to be modified to fix the original issue anyway, as it removes Boundary Neutral characters unconditionally.
I tried to play with the code, but this seems to be more involved than I thought it would be. So, here is how I’d do the segmentation:
- Resolve the script for characters with the Inherited and Common script properties (see https://unicode.org/reports/tr24/#Common):
  - Inherited characters (usually combining marks) take the script of the preceding character.
  - Common characters also take the script of the preceding character. Paired characters like brackets might additionally be better off taking the same script: in "latin (ARABIC) latin", both brackets should take the Latin script, instead of the first taking Latin and the second taking Arabic.
  - Any remaining unresolved characters at the start of the string take the script of the next resolved character.
- Run the bidi algorithm, segment into runs based on bidi level, and reorder the runs (but not the characters inside each run). Run direction depends on its bidi level (odd is RTL, even is LTR).
- Split the text into runs of the same script and direction, and pass these to HarfBuzz. Text passed to HarfBuzz should always be in logical order.
from fontgoggles.
Yes, your observation is correct! ( ͡° ͜ʖ ͡°)
from fontgoggles.
I've implemented @khaledhosny's scheme, more or less.
Regarding paired characters: I made "opening" chars (mirrored chars with category Ps) look back, and "closing" chars (mirrored chars with category Pe) look forward.
I considered doing pair matching, but it felt weird to do that level of text parsing to get script info. I ignored Pi and Pf as it's less clear which is opening and which is closing.
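For illustration, the look-back / look-forward heuristic might look something like the sketch below. It is not the actual FontGoggles code: the per-character script list is assumed to come from elsewhere (as ISO 15924 codes, with `Zyyy` for Common and `Zinh` for Inherited), and the names `pair_role` and `resolve_with_pairs` are made up for this example. Only `unicodedata.mirrored` and `unicodedata.category` from the standard library are used.

```python
import unicodedata

UNRESOLVED = {"Zyyy", "Zinh"}  # Common and Inherited


def pair_role(ch):
    """Classify a character as "open", "close" or None (hypothetical helper)."""
    if not unicodedata.mirrored(ch):
        return None
    category = unicodedata.category(ch)
    if category == "Ps":
        return "open"
    if category == "Pe":
        return "close"
    return None  # Pi and Pf are deliberately ignored


def resolve_with_pairs(text, scripts):
    """Resolve scripts; closing chars look forward, everything else looks back."""
    resolved = []
    for ch, script in zip(text, scripts):
        if script in UNRESOLVED:
            if pair_role(ch) == "close" or not resolved:
                script = None  # to be filled in from the following character
            else:
                script = resolved[-1]  # may itself still be None
        resolved.append(script)

    # Backward pass: pending entries take the script of the next resolved char.
    following = None
    for i in range(len(resolved) - 1, -1, -1):
        if resolved[i] is None:
            resolved[i] = following
        else:
            following = resolved[i]
    return resolved
```

With the input `"ab (\u0639) cd"`, both brackets end up as Latin here, whereas a plain look-back rule would assign Arabic to the closing bracket.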
I ended up with a relatively simple scheme, that seems to match CoreText, at least in the examples that I tried. I bet it is far from perfect, but wow, BiDi is complex, and segmenting, too :)
I still use python-bidi to get bidi levels, but I don't use its reordering anymore.
from fontgoggles.
BiDi code shouldn't reorder the characters (or do mirroring). It should resolve the bidi levels, which can then be used to split the text into runs and pass each run to HarfBuzz with the respective direction.
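A minimal sketch of what "split into runs by level" can mean in practice, assuming the resolved per-character bidi levels are already available as a list of integers (how to get them out of a bidi implementation is not shown). The function names are hypothetical. `reorder_runs` applies the usual reverse-from-the-highest-level rule to whole runs, so the characters inside each run stay in logical order for the shaper.

```python
def level_runs(levels):
    """Split per-character bidi levels into (start, end, level) runs."""
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i, levels[start]))
            start = i
    return runs


def direction(level):
    """Odd levels are right-to-left, even levels are left-to-right."""
    return "rtl" if level % 2 else "ltr"


def reorder_runs(runs):
    """Return the runs in visual order; characters inside runs are untouched."""
    runs = list(runs)
    odd_levels = [level for _, _, level in runs if level % 2]
    if not odd_levels:
        return runs
    max_level = max(level for _, _, level in runs)
    # From the highest level down to the lowest odd level, reverse every
    # maximal sequence of runs whose level is at least the current level.
    for level in range(max_level, min(odd_levels) - 1, -1):
        i = 0
        while i < len(runs):
            if runs[i][2] >= level:
                j = i
                while j < len(runs) and runs[j][2] >= level:
                    j += 1
                runs[i:j] = runs[i:j][::-1]
                i = j
            else:
                i += 1
    return runs
```

Each run's text, `text[start:end]`, would then be shaped separately with `direction(level)`, and the shaped runs laid out in the order returned by `reorder_runs`.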
from fontgoggles.
Thanks for your feedback. These are my first steps in BiDi land, so please bear with me as I learn :)
(I'm using the python-bidi package, which does reordering, but also gives embedding information.)
I had some sort of segmenting working (still with reordering), but I felt I didn't know what I was doing so I gave up on that for now.
What I perceived as a problem with my approach to segmenting is that I lost HB's automatic script detection for the whole string. IBM-Plex-Sans-Arabic, for example, has a separate set of numerals to match the Arabic script, but a segment with numerals only won't be recognized as Arabic.
On the other hand, mixing Arabic and Latin currently doesn't work properly either, and one has to manually override the Script as Arabic.
Taking the Plex example: would it be acceptable if we let HB do the script detection per segment, and therefore by default show default numerals even though the overall script is Arabic? The user would have to set the script explicitly to Arabic to see the correct numerals.
I'm afraid of the added layer of complexity of having that work automatically out of the box. Or is script detection for the whole string doable?
Perhaps I'm framing the question wrong, but either way, I'd very much appreciate your input.
from fontgoggles.
Text should be segmented into runs that have the same direction, script and language. The first two can (and should) be done automatically; the language needs to be set by the user.
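As a rough sketch of that segmentation, assuming the resolved script and the bidi level of every character are already known: the helper below (a hypothetical name, not an existing API) groups consecutive characters that share both values. Language is left out because it has to come from the user.

```python
from itertools import groupby


def segment(text, scripts, levels):
    """Split text into (substring, script, direction) segments.

    `scripts` and `levels` hold one entry per character of `text`,
    in logical order. Both are assumed inputs for this sketch.
    """
    segments = []
    pos = 0
    for (script, level), group in groupby(zip(scripts, levels)):
        length = len(list(group))
        direction = "rtl" if level % 2 else "ltr"
        segments.append((text[pos:pos + length], script, direction))
        pos += length
    return segments
```

Each segment keeps its characters in logical order, so it can be handed to the shaper as-is together with its script and direction.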
The current bidi option reorders the text and then asks HarfBuzz to shape it in LTR direction, but this will certainly give wrong output in certain cases. I’m looking into that code now to see if it can be improved.
from fontgoggles.
Ok, thanks. Probably best to try to approach this from the Tests angle, and not how it's used in the application, as that is complex and messy.
python-bidi helps with the direction detection. I'm curious about how to add script detection to the mix.
from fontgoggles.
Thanks so much, I'll see how far I can get with this.
from fontgoggles.
I considered doing pair matching, but it felt weird to do that level of text parsing to get script info. I ignored Pi and Pf as it's less clear which is opening and which is closing.
That is OK I guess; not many applications/layout libraries do paired character matching, and the few I know of have a small hard-coded subset of paired characters to handle.
from fontgoggles.
I still use python-bidi to get bidi levels, but I don't use its reordering anymore.
Maybe something that could be contributed/go into a library?
from fontgoggles.
Maybe something that could be contributed/go into a library?
An API like that would be a good addition for python-bidi itself, but it seems to be barely maintained, and only super trivial PRs seem to get merged. Forking may be a good idea, or offering the author help, but I don't have enough interest and time at the moment to do that myself.
python-bidi contains a lot of complex but useful code, but its public API is too primitive: it's focused on reordering an input string, and doesn't nicely expose its internals. I had to copy and alter some code before it was useful for FontGoggles.
from fontgoggles.