Comments (11)

khaledhosny commented on June 4, 2024 4

Looks like python-bidi will need to be modified to fix the original issue anyway, as it removes Boundary Neutral characters unconditionally.

I tried to play with the code, but this seems to be more involved than I thought it would be. So, here is how I’d do the segmentation:

  1. Resolve the script of characters whose script property is Inherited or Common (see https://unicode.org/reports/tr24/#Common):
    • Inherited characters (usually combining marks) take the script of the preceding character.
    • Common characters also take the script of the preceding character. Paired characters such as brackets are probably better off sharing one script: in latin (ARABIC) latin, both brackets should take Latin, instead of the first taking Latin and the second taking Arabic.
    • Any characters still unresolved at the start of the string take the script of the next resolved character.
  2. Run the bidi algorithm, segment the text into runs based on bidi level, and reorder the runs (but not the characters inside each run). A run's direction depends on its bidi level (odd is RTL, even is LTR).
  3. Split the text into runs of the same script and direction, and pass each one to HarfBuzz. Text passed to HarfBuzz should always be in logical order.
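
Step 2 could be sketched roughly as follows. This is a minimal illustration, not FontGoggles' or python-bidi's code: it assumes the bidi levels have already been resolved elsewhere and are passed in as a list of integers, and the names `level_runs`, `run_direction` and `visual_order` are made up for the example. Whole runs are reordered following rule L2 of UAX #9, while the characters inside each run stay in logical order.

```python
from itertools import groupby


def level_runs(levels):
    """Return (start, end, level) for each maximal run of equal bidi level."""
    runs = []
    start = 0
    for level, group in groupby(levels):
        end = start + len(list(group))
        runs.append((start, end, level))
        start = end
    return runs


def run_direction(level):
    """Odd levels are right-to-left, even levels are left-to-right."""
    return "RTL" if level % 2 else "LTR"


def visual_order(runs):
    """Reorder whole runs (never the characters inside them) per rule L2."""
    runs = list(runs)
    odd_levels = [level for _, _, level in runs if level % 2]
    if not odd_levels:
        return runs  # nothing is right-to-left, logical order is visual order
    highest = max(level for _, _, level in runs)
    # From the highest level down to the lowest odd level, reverse every
    # contiguous sequence of runs that is at that level or higher.
    for level in range(highest, min(odd_levels) - 1, -1):
        i = 0
        while i < len(runs):
            if runs[i][2] < level:
                i += 1
                continue
            j = i
            while j < len(runs) and runs[j][2] >= level:
                j += 1
            runs[i:j] = reversed(runs[i:j])
            i = j
    return runs


# Levels are supplied by hand here; a real caller would get them from a
# bidi implementation.
runs = level_runs([0, 0, 0, 1, 1, 1, 2, 2, 1, 0])
for start, end, level in visual_order(runs):
    print(start, end, run_direction(level))
```

Each `(start, end)` slice would then be shaped separately, with the slice itself still in logical order and the direction taken from `run_direction`.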

from fontgoggles.

typoman commented on June 4, 2024 1

Yes, your observation is correct! ( ͡° ͜ʖ ͡°)

justvanrossum commented on June 4, 2024 1

I've implemented @khaledhosny's scheme, more or less.

Regarding paired characters: I made "opening" chars (mirrored chars with category Ps) look back, and "closing" chars (mirrored chars with category Pe) look forward.

I considered doing pair matching, but it felt weird to do that level of text parsing to get script info. I ignored Pi and Pf as it's less clear which is opening and which is closing.
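
The look-back/look-forward rule described above could look something like this. It is a sketch under assumptions, not the actual FontGoggles implementation: Python's `unicodedata` module has no script property, so the per-character script codes (`Zyyy` for Common, `Zinh` for Inherited) are supplied by the caller, and `resolve_scripts` and its helpers are hypothetical names. Opening brackets look back just like any other Common character, so only closing brackets need special handling.

```python
import unicodedata

UNRESOLVED = {"Zyyy", "Zinh"}  # Common and Inherited


def is_closing_bracket(char):
    """Mirrored character with general category Pe (close punctuation)."""
    return bool(unicodedata.mirrored(char)) and unicodedata.category(char) == "Pe"


def first_resolved(candidates):
    """Return the first real script code in candidates, or None."""
    return next((s for s in candidates if s not in UNRESOLVED), None)


def resolve_scripts(text, scripts):
    """Give every Common/Inherited character the script of a neighbour."""
    resolved = list(scripts)
    for i, char in enumerate(text):
        if resolved[i] not in UNRESOLVED:
            continue
        backward = first_resolved(reversed(resolved[:i]))
        forward = first_resolved(scripts[i + 1:])
        if is_closing_bracket(char):
            # Closing brackets look forward first.
            resolved[i] = forward or backward or resolved[i]
        else:
            # Everything else looks back; leading characters with nothing
            # before them fall through to the next resolved character.
            resolved[i] = backward or forward or resolved[i]
    return resolved


# "a (<ARABIC LETTER BEH>) c", with script codes entered by hand.
text = "a (\u0628) c"
scripts = ["Latn", "Zyyy", "Zyyy", "Arab", "Zyyy", "Zyyy", "Latn"]
print(resolve_scripts(text, scripts))
# ['Latn', 'Latn', 'Latn', 'Arab', 'Latn', 'Latn', 'Latn']
```

Both brackets end up as Latin here, which is the outcome asked for in the latin (ARABIC) latin example, without doing any actual pair matching.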

I ended up with a relatively simple scheme, that seems to match CoreText, at least in the examples that I tried. I bet it is far from perfect, but wow, BiDi is complex, and segmenting, too :)

I still use python-bidi to get bidi levels, but I don't use its reordering anymore.

khaledhosny commented on June 4, 2024

BiDi code shouldn't reorder the characters (or do mirroring). It should resolve the bidi levels, which can then be used to split the text into runs and pass each run to HarfBuzz with the respective direction.

justvanrossum commented on June 4, 2024

Thanks for your feedback. These are my first steps in BiDi land, so please bear with me as I learn :)

(I'm using the python-bidi package, which does reordering, but also gives embedding information.)

I had some sort of segmenting working (still with reordering), but I felt I didn't know what I was doing so I gave up on that for now.

What I perceived as a problem with my approach to segmenting is that I lost HB's automatic script detection for the whole string. IBM-Plex-Sans-Arabic, for example, has a separate set of numerals to match the Arabic script, but a segment containing only numerals won't be recognized as Arabic.

On the other hand, mixing Arabic and Latin currently doesn't work properly either, and one has to manually override the Script as Arabic.

Taking the Plex example: would it be acceptable to let HB do the script detection per segment, and therefore show the default numerals by default, even though the overall script is Arabic? The user would have to set the script explicitly to Arabic to see the correct numerals.

I'm afraid of the added layer of complexity of having that work automatically out of the box. Or is script detection for the whole string doable?

Perhaps I'm framing the question wrong, but either way, I'd very much appreciate your input.

khaledhosny commented on June 4, 2024

Text should be segmented into runs that have the same direction, script and language. The first two can (and should) be detected automatically; the language needs to be set by the user.
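
A rough sketch of the automatic part of that segmentation, under some assumptions: the scripts are already resolved (no Common or Inherited left), the bidi levels come from a bidi implementation, and `segment` is a hypothetical helper, not an existing API. It groups on the bidi level rather than on the bare direction, so that the result stays aligned with the bidi runs; language is left out because it has to come from the user.

```python
from itertools import groupby


def segment(text, scripts, levels):
    """Split text into (substring, script, direction) runs.

    A new run starts whenever the script or the bidi level changes.
    Each substring stays in logical order.
    """
    segments = []
    start = 0
    for (script, level), group in groupby(zip(scripts, levels)):
        end = start + len(list(group))
        direction = "RTL" if level % 2 else "LTR"
        segments.append((text[start:end], script, direction))
        start = end
    return segments


# Three Latin letters, a space, then three Arabic letters; scripts and
# levels are entered by hand for the example.
text = "abc \u0628\u062a\u062b"
scripts = ["Latn"] * 4 + ["Arab"] * 3
levels = [0] * 4 + [1] * 3
print(segment(text, scripts, levels))
# [('abc ', 'Latn', 'LTR'), ('بتث', 'Arab', 'RTL')]
```

Each tuple corresponds to one shaping call, with the script and direction set explicitly instead of being guessed per segment.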

The current bidi option reorders the text and then asks HarfBuzz to shape it in LTR direction, but this will certainly give wrong output in certain cases. I’m looking into that code now to see if it can be improved.

justvanrossum commented on June 4, 2024

OK, thanks. It's probably best to approach this from the Tests angle rather than from how it's used in the application, as that is complex and messy.

python-bidi helps with the direction detection. I'm curious about how to add script detection to the mix.

justvanrossum commented on June 4, 2024

Thanks so much, I'll see how far I can get with this.

khaledhosny commented on June 4, 2024

> I considered doing pair matching, but it felt weird to do that level of text parsing to get script info. I ignored Pi and Pf as it's less clear which is opening and which is closing.

That is OK I guess. Not many applications or layout libraries do paired-character matching, and the few I know of handle only a small hard-coded subset of paired characters.

adrientetar commented on June 4, 2024

> I still use python-bidi to get bidi levels, but I don't use its reordering anymore.

Maybe something that could be contributed/go into a library?

justvanrossum commented on June 4, 2024

> Maybe something that could be contributed/go into a library?

An API like that would be a good addition for python-bidi itself, but it seems to be barely maintained, and only super trivial PRs seem to get merged. Forking may be a good idea, or offering the author help, but I don't have enough interest and time at the moment to do that myself.

python-bidi contains a lot of complex but useful code, but its public API is too primitive: it's focused on reordering an input string, and doesn't nicely expose its internals. I had to copy and alter some code before it was useful for FontGoggles.
