
astiob commented on September 23, 2024

Discovery of the century incoming…
You know what, I feel silly. This is so simple and stupid; how did we spend years not realizing this?

GDI doesn’t do inline font fallback. At all.

GDI uses font linking—and nothing more.

The Uniscribe-based code path does inline font fallback, and its fallback choice can differ from GDI’s font linking.
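To make the claim concrete, here is a minimal sketch of the GDI side as described above, in Python purely for illustration. A code point is looked up in the requested font, then in each of that font's linked fonts in order, and if none of them has it, the result is tofu. The function name, the cmaps and the link table are hypothetical stand-ins (real font-link lists come from the system configuration), and the Uniscribe fallback path is not modelled at all.

```python
from typing import Dict, List, Optional, Set


def resolve_gdi(
    cp: int,
    font: str,
    links: Dict[str, List[str]],
    cmaps: Dict[str, Set[int]],
) -> Optional[str]:
    """Return the font that supplies `cp` on the GDI path, or None for tofu.

    Simplified model: try the requested font, then its linked fonts in order.
    There is deliberately no further fallback step after the link chain.
    """
    for candidate in [font, *links.get(font, [])]:
        if cp in cmaps.get(candidate, set()):
            return candidate
    # Nothing in the link chain has the glyph: GDI shows .notdef (tofu).
    return None


# Hypothetical toy coverage loosely mirroring the Arial/Tahoma test in this comment.
ASCII = set(range(0x20, 0x7F))
CMAPS = {
    "Arial": ASCII,                      # toy coverage: ASCII only
    "Tahoma": ASCII | {0x2025},          # has U+2025, lacks U+2196
    "MS UI Gothic": ASCII | {0x2196},    # has U+2196
}
LINKS = {"Tahoma": ["MS UI Gothic"]}    # Arial has no linked fonts

print(resolve_gdi(0x2025, "Arial", LINKS, CMAPS))   # None, i.e. tofu
print(resolve_gdi(0x2196, "Tahoma", LINKS, CMAPS))  # MS UI Gothic
```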

My proof:

  • Microsoft’s docs (which gave me the idea):

    • This mentions:

      ExtTextOut will use Uniscribe when necessary resulting in font fallback. The ETO_IGNORELANGUAGE flag will inhibit this behavior and should not be passed.

      implying that without Uniscribe, font fallback doesn’t happen.

    • This mentions font linking together with GDI, whereas for font fallback, it mentions .NET and Uniscribe but not GDI.

      It also mentions:

      Font linking […] can be used [to] prevent […] text from being displayed as a default glyph (called tofu).

      implying that without font linking, undefined glyphs will indeed be displayed as tofu, not fall back to other fonts.

      (It does talk of how font linking “takes priority over font fallback”, but the way it is described in the next sentence makes me think that perhaps this is meant to say “font substitution”, which is GDI’s mechanism for defining font aliases, described further down the page, or perhaps this means something else completely.)

  • Targeted test (after reading those docs):

    • Arial has no font linking defined on my machine. Arial has surprisingly wide glyph coverage, but it lacks a glyph for U+2025 TWO DOT LEADER in the General Punctuation block and it lacks Japanese glyphs. Arial has GPOS kerning, so including the heavily kerned string “WAT.” in the test makes it easy to tell when Uniscribe is active. As this test reaffirms, General Punctuation isn’t treated as a “complex script”, but Japanese kana is.

    • Tahoma has a long list of linked fonts on my machine, the first of them being MS UI Gothic. Tahoma also has surprisingly wide glyph coverage, and it does have a glyph for U+2025 TWO DOT LEADER, but it lacks a glyph for U+2196 NORTH WEST ARROW in the Arrows block. MS UI Gothic, on the other hand, does have a glyph for that arrow. The Arrows block isn’t treated as a “complex script”. Tahoma doesn’t have kerning outside of Arabic.

    This ASS:

    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnArial}Arial アリアル ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnArial}Arial ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma アリアル ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma ‥↖WAT.
    Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma/MS UI Gothic ‥{\fs79.53398\fnMS UI Gothic}↖{\fs96\fnTahoma}WAT.
    

    (where the \fs for MS UI Gothic is calculated from Tahoma’s \fs and both fonts’ metrics in such a manner that the em size in pixels stays constant, as documented for font linking)

    displays:

    [Screenshot of the above ASS]

    As we can see, kerning is applied in the bottommost line (with the Japanese) but not in the one above it (without the Japanese), so one is rendered by Uniscribe and the other by GDI itself. The two glyphs absent from Arial are shown in the Uniscribe rendering but replaced by tofu in the GDI rendering. In the Tahoma lines, all glyphs are visible, but the arrow uses different glyphs in Uniscribe and in GDI. GDI’s glyph exactly matches the explicitly requested MS UI Gothic glyph, so GDI must be applying font linking, whereas Uniscribe is applying font fallback that isn’t based on font linking.
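For reference, here is one way to reproduce the \fs79.53398 used for MS UI Gothic in the last test line. It is a sketch under stated assumptions: that the renderer hands \fs to GDI as a positive lfHeight, which GDI matches against the font's cell height (usWinAscent + usWinDescent in font units), so the em size in pixels is size × unitsPerEm / cell height. The Tahoma and MS UI Gothic metric values below are assumed, not quoted from this thread, and the helper names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontMetrics:
    units_per_em: int
    win_ascent: int
    win_descent: int

    @property
    def cell_height(self) -> int:
        # Cell height in font units (assumed to be what a positive lfHeight selects).
        return self.win_ascent + self.win_descent


def em_px(size: float, m: FontMetrics) -> float:
    """Em size in pixels when `size` is interpreted as the cell height."""
    return size * m.units_per_em / m.cell_height


def linked_size(size: float, primary: FontMetrics, linked: FontMetrics) -> float:
    """Size to request for `linked` so its em matches `primary` at `size`."""
    return em_px(size, primary) * linked.cell_height / linked.units_per_em


# ASSUMED metrics, for illustration only; verify them with a font inspector.
TAHOMA = FontMetrics(units_per_em=2048, win_ascent=2049, win_descent=423)
MS_UI_GOTHIC = FontMetrics(units_per_em=256, win_ascent=220, win_descent=36)

print(round(linked_size(96, TAHOMA, MS_UI_GOTHIC), 5))  # 79.53398
```

With these assumed values, MS UI Gothic's cell height equals its em, so the result reduces to 96 × 2048 / 2472.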

from libass.

astiob commented on September 23, 2024

Fallback works normally (for \h, Cyrillic, hiragana, and code points in the Latin Extended-A block that don’t have dedicated glyphs in the font) in runs rendered by Uniscribe (e.g. if hiragana is included), but absolutely everything (\h, Cyrillic and Latin alike) is displayed as .notdef in runs rendered by GDI.
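A rough model of the split seen in these tests, as a sketch only: a run is handed to Uniscribe when it contains a character GDI treats as a complex script, and otherwise GDI renders it itself with no fallback beyond font linking. Here only hiragana and katakana count as “complex”; GDI’s real classification is internal and much broader, and the U+0022 cmap-probe trigger discussed in this thread is ignored. The function names are made up for this example.

```python
# Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) stand in for GDI's
# internal notion of "complex script"; the real set is much larger.
COMPLEX_RANGES = ((0x3040, 0x309F), (0x30A0, 0x30FF))


def is_complex(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in COMPLEX_RANGES)


def renderer_for(text: str) -> str:
    """Which path renders `text` under this simplified rule."""
    return "Uniscribe" if any(is_complex(ord(ch)) for ch in text) else "GDI"


SAMPLES = (
    ("Arial with kana", "Arial アリアル ‥↖WAT."),
    ("Arial without kana", "Arial ‥↖WAT."),
)
for label, line in SAMPLES:
    # Uniscribe: kerning plus inline fallback; GDI: tofu unless font linking helps.
    print(label, renderer_for(line))
```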

astiob commented on September 23, 2024

> Uniscribe proper actually allows enabling and disabling both font linking and font fallback

Well, further tests suggest that while the API allows this, SSA_LINK actually does nothing. So we’re back to what I said earlier: modern Uniscribe doesn’t do font linking. I’m not sure whether it’s me or Microsoft* that is doing something wrong, but this is the effective behaviour I’m seeing.

* Wouldn’t be a huge surprise, seeing as e.g. GetCharacterPlacement completely ignores GCP_USEKERNING and always applies Uniscribe’s kerning, except on ancient Windows versions without an extended language support pack**. This thread suggests SSA_LINK did do something 18 years ago, so it could easily be that Microsoft has since changed something and rendered it ineffective.

** Yes, this whole GDI/Uniscribe business depends on whether the language support pack is installed. That is, the delegation to Uniscribe only happens when the pack is installed. Fun. I haven’t been able to find conclusively what this pack is, but my guess is it’s one of (or perhaps a DLL included in both of) the optional installs available via checkboxes in old Regional Settings (for those who don’t know: [1], [2], [3 with pictures]). My hopeful understanding is that it’s been included unconditionally since Vista, but it was still optional in XP. And XP was still popular in the early softsub era. Welp.

frozenpandaman commented on September 23, 2024

Following since this is my screenshot + the error I ran into. Thanks for making the issue!

TheOneric commented on September 23, 2024

Do the events’ Styles have a restrictive Encoding value set, or something similar, that would block the usual Arial fallback other releases depend on (#42)? (Ideally, just provide a fully functioning file with all the headers and Styles needed for the relevant Events.)

Also, as this might be related to Encoding: which *VSFilter does the screenshot come from? At least MPC-HC ISR’s Encoding handling diverges from guliverkli(2)- and xy-VSFilter (and from libass too).

astiob commented on September 23, 2024

The Encoding is 1 in the sample archive. A space is displayed when the font is not installed, but indeed “aaa” is shown in XySubFilter when it is. I haven’t found an explanation yet.

TheOneric commented on September 23, 2024

Apologies, I somehow missed that the full script and not just the font is attached.

astiob commented on September 23, 2024

Except that in another anomaly, I wanted to double-check that it’s using Uniscribe by verifying that I see kerning, but it doesn’t seem to be applying any kerning at all! But it should, per #237… According to fontTools’ TTX, the font has a version 0 kern table with a single format 0 subtable with coverage 1 (“horizontal”), which Microsoft says is supported on Windows 🤯
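For checking this without TTX, here is a minimal Python sketch (the function name is hypothetical) that decodes the headers of a version 0, i.e. Microsoft-style, kern table as laid out in the OpenType spec. Each subtable starts with a 16-bit version, a 16-bit length and a 16-bit coverage field whose high byte is the subtable format and whose low byte holds flags, bit 0 meaning horizontal. Reading the raw table bytes out of a font file is left out, and declared subtable lengths are trusted as-is.

```python
import struct
from typing import List, NamedTuple, Tuple


class KernSubtableHeader(NamedTuple):
    version: int
    fmt: int       # high byte of the coverage field
    flags: int     # low byte: bit 0 horizontal, 1 minimum, 2 cross-stream, 3 override
    n_pairs: int   # only read for format 0 subtables


def parse_ms_kern(data: bytes) -> Tuple[int, List[KernSubtableHeader]]:
    """Decode the table and subtable headers of a version 0 'kern' table."""
    version, n_tables = struct.unpack_from(">HH", data, 0)
    if version != 0:
        raise ValueError("not a version 0 (Microsoft) kern table")
    offset = 4
    subtables: List[KernSubtableHeader] = []
    for _ in range(n_tables):
        sub_version, length, coverage = struct.unpack_from(">HHH", data, offset)
        fmt, flags = coverage >> 8, coverage & 0xFF
        n_pairs = struct.unpack_from(">H", data, offset + 6)[0] if fmt == 0 else 0
        subtables.append(KernSubtableHeader(sub_version, fmt, flags, n_pairs))
        offset += length  # trust the declared subtable length
    return version, subtables


# Hand-built table: one horizontal format 0 subtable with a single pair (1, 2) = -50.
pairs = struct.pack(">HHh", 1, 2, -50)
body = struct.pack(">HHHH", 1, 6, 0, 0) + pairs  # nPairs, searchRange, entrySelector, rangeShift
sub = struct.pack(">HHH", 0, 6 + len(body), 0x0001) + body
table = struct.pack(">HH", 0, 1) + sub
print(parse_ms_kern(table))
```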

astiob commented on September 23, 2024

Resaving from FontForge with the kerning additionally saved to GPOS confirms that kerning works and thus those are, indeed, Uniscribe runs. I’m surprised that kern wasn’t enough, but it’s probably irrelevant to this issue; I’ll just make note of this observation in #237.

astiob commented on September 23, 2024

I’ve spent this hour transplanting bits and pieces from the font in #42 to the font here in hopes of finding which particular piece is preventing fallback/substitution, but nothing has helped so far :-(

astiob commented on September 23, 2024

Finally managed to get the font to let \h fall back to another font:

It happens when I remove U+0022 QUOTATION MARK (and not any other particular code point, although I haven’t exhaustively tried each) from the font’s Windows cmap 🤯

I truly have no idea why this is. At first glance it might seem related to the quotation marks surrounding the \h in the ASS, but the behaviour (fallback or no fallback) stays the same even if I remove the quotation marks from the ASS.

astiob commented on September 23, 2024

My gut feeling is that perhaps GDI probes some hardcoded list of code points (which includes U+0022) and if it sees glyphs for all of them, it blindly assumes the font supports a whole subset of Unicode (which includes U+00A0) and skips font fallback for any code points in that subset. And it’s weird, because like, don’t ASCII-only fonts exist? Why would you assume that the non-ASCII NBSP is supported?

In fact, I’m able to produce similar behaviour by removing all code points except the quotation mark and the Latin letters:

  • quotation mark, all uppercase & lowercase letters present: GDI uses this font and renders everything else (spaces and apostrophes) as .notdef
  • quotation mark, all uppercase letters present (but not lowercase): GDI uses this font, renders the lowercase letters as .notdef, but uses a substitute font for \h
  • further remove Y: same as above
  • remove X instead (even if Y is kept): GDI rejects this font entirely and renders everything in Arial

astiob commented on September 23, 2024
>   • quotation mark, all uppercase letters present (but not lowercase): GDI uses this font, renders the lowercase letters as .notdef, but uses a substitute font for \h
>   • further remove Y: same as above

And now I can’t reproduce this! It worked just a few minutes ago! Now it’s rejecting the font and using Arial instead. I’ve been restarting MPC-HC between tests to clear any possible per-process caches, but maybe that’s not enough? Or did I make a mistake somewhere?

Meanwhile, a font with full sets of both uppercase and lowercase letters but without the quotation mark (or any other code points) is accepted, and \h uses a substitute font.

astiob commented on September 23, 2024

> Finally managed to get the font to let \h fall back to another font:
>
> It happens when I remove U+0022 QUOTATION MARK (and not any other particular code point, although I haven’t exhaustively tried each) from the font’s Windows cmap 🤯

As noted in #237 (comment), U+0022 is actually among the four magical-constant code points that cause GDI to switch to Uniscribe if the font lacks glyphs for them. So the fallback works in this case because the string is rendered by Uniscribe, just as in the other Uniscribe cases in the earlier test above.
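The trigger can be summarised as a cmap check. This is a simplified sketch, not GDI’s actual code: if the font has no glyph for one of the probe code points, the string goes through Uniscribe, which then does inline fallback; otherwise GDI renders it with font linking only. Only U+0022 is named in this thread, so the probe set below deliberately contains just that one (the other three from #237 would have to be added), and the delegation also depends on the language support pack, as noted in this thread. The function name is hypothetical.

```python
from typing import AbstractSet, Iterable

# U+0022 QUOTATION MARK is the only probe code point named in this thread;
# the other three (see #237) are intentionally not filled in here.
KNOWN_PROBE_POINTS = frozenset({0x0022})


def delegates_to_uniscribe(
    cmap: AbstractSet[int], probes: Iterable[int] = KNOWN_PROBE_POINTS
) -> bool:
    """Simplified model: True if the font's cmap lacks any probe code point."""
    return any(cp not in cmap for cp in probes)


# Hypothetical ASCII-only font: no U+00A0, which is what \h produces.
font_cmap = set(range(0x20, 0x7F))
print(delegates_to_uniscribe(font_cmap))             # GDI path: \h becomes .notdef
print(delegates_to_uniscribe(font_cmap - {0x0022}))  # Uniscribe path: \h can fall back
```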

astiob commented on September 23, 2024

> It does talk of how font linking “takes priority over font fallback”, but the way it is described in the next sentence makes me think that perhaps this is meant to say “font substitution”, which is GDI’s mechanism for defining font aliases, described further down the page, or perhaps this means something else completely.

Turns out Uniscribe proper actually allows enabling and disabling both font linking and font fallback, but GDI’s various entry points configure it differently. TextOut (which VSFilter uses) enables font fallback, whereas GetCharacterPlacement (which VSFilter doesn’t use) doesn’t. My first thought was that perhaps this text means that if Uniscribe is called with both flags set, then linking indeed takes precedence over fallback, but some reverse engineering suggests that all of these paths enable linking, and yet it isn’t happening.
