Comments (15)
Discovery of the century incoming…
You know what, I feel silly. This is so simple and stupid; how did we spend years not realizing this?
GDI doesn’t do inline font fallback. At all.
GDI uses font linking—and nothing more.
The Uniscribe-based code path does inline font fallback, and its fallback choice can differ from GDI’s font linking.
My proof:
- Microsoft’s docs (which gave me the idea):
  - This mentions:
    “ExtTextOut will use Uniscribe when necessary resulting in font fallback. The ETO_IGNORELANGUAGE flag will inhibit this behavior and should not be passed.”
    implying that without Uniscribe, font fallback doesn’t happen.
  - This mentions font linking together with GDI, whereas for font fallback, it mentions .NET and Uniscribe but not GDI.
    It also mentions:
    “Font linking […] can be used [to] prevent […] text from being displayed as a default glyph (called tofu).”
    implying that without font linking, undefined glyphs will indeed be displayed as tofu, not fall back to other fonts.
    (It does talk of how font linking “takes priority over font fallback”, but the way it is described in the next sentence makes me think that perhaps this is meant to say “font substitution”, which is GDI’s mechanism for defining font aliases, described further down the page, or perhaps this means something else completely.)
- Targeted test (after reading those docs):
  - Arial has no font linking defined on my machine. Arial has a surprisingly wide glyph coverage, but it lacks a glyph for U+2025 TWO DOT LEADER in the General Punctuation block and it lacks Japanese glyphs. Arial has GPOS kerning, which makes it easy to tell when Uniscribe is activated by including the heavily-kerned string “WAT.” in the test. As this test reaffirms, General Punctuation isn’t treated as a “complex script”, but Japanese kana is.
  - Tahoma has a long list of linked fonts on my machine, the first of them being MS UI Gothic. Tahoma also has a surprisingly wide glyph coverage, and it does have a glyph for U+2025 TWO DOT LEADER, but it lacks a glyph for U+2196 NORTH WEST ARROW in the Arrows block. On the other hand, MS UI Gothic does have a glyph for that arrow. The Arrows block isn’t treated as a “complex script”. Tahoma doesn’t have kerning outside of Arabic.
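Coverage claims like the ones above can be checked mechanically. The following is a minimal sketch, not part of the original test: `missing_codepoints` and `load_cmap` are hypothetical helper names, and `load_cmap` assumes the third-party fontTools package is installed.

```python
def missing_codepoints(text, cmap):
    """Return the code points in `text` that `cmap` has no entry for.

    `cmap` is any container of integer code points (a set, or the dict
    returned by fontTools). Order of first appearance is preserved and
    duplicates are dropped.
    """
    missing = []
    for ch in text:
        cp = ord(ch)
        if cp not in cmap and cp not in missing:
            missing.append(cp)
    return missing


def load_cmap(path, font_number=-1):
    """Read a font's preferred Unicode cmap (code point -> glyph name).

    Requires fontTools (pip install fonttools). For a .ttc collection,
    pass the index of the face you want as `font_number`.
    """
    from fontTools.ttLib import TTFont
    return TTFont(path, fontNumber=font_number).getBestCmap()
```

For example, `missing_codepoints("アリアル ‥↖WAT.", load_cmap(path_to_font))` lists what a given font file cannot render on its own; the path is whatever your system uses for that font.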
This ASS:
Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnArial}Arial アリアル ‥↖WAT.
Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnArial}Arial ‥↖WAT.
Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma アリアル ‥↖WAT.
Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma ‥↖WAT.
Dialogue: 0,0:00:00.00,0:00:10.00,Default,,0,0,0,,{\an3\fs96\fnTahoma}Tahoma/MS UI Gothic ‥{\fs79.53398\fnMS UI Gothic}↖{\fs96\fnTahoma}WAT.
(where the \fs for MS UI Gothic is calculated from Tahoma’s \fs and both fonts’ metrics in such a manner that the em size in pixels stays constant, as documented for font linking) displays:
As we can see, kerning is applied in the bottommost line (with the Japanese) but not the one above it (without the Japanese), so one is rendered by Uniscribe and the other by GDI itself. The two glyphs absent from Arial are shown in the Uniscribe rendering but replaced by tofu in the GDI rendering. In the Tahoma lines, all glyphs are visible but the arrow uses different glyphs in Uniscribe and in GDI. GDI’s glyph exactly matches the explicitly-requested MS UI Gothic glyph, so it must be that GDI is applying font linking whereas Uniscribe is applying font fallback that isn’t based on font linking.
from libass.
Fallback works normally (for \h, Cyrillic, hiragana, and code points in the Latin Extended-A block that don’t have dedicated glyphs in the font) in runs rendered by Uniscribe (e.g. if hiragana is included), but absolutely everything (\h, Cyrillic and Latin alike) is displayed as .notdef in runs rendered by GDI.
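The behaviour observed in these tests can be summarised as a toy model. It is only a sketch of the hypothesis in this thread (GDI runs consult font linking only, Uniscribe runs consult font fallback only), not a description of the real Windows internals; `pick_font`, the coverage sets, and the name "SomeFallbackFont" are all hypothetical.

```python
NOTDEF = ".notdef"


def pick_font(cp, font, coverage, links, fallback, uniscribe):
    """Return the name of the font that supplies code point `cp`, or NOTDEF.

    coverage:  dict of font name -> set of covered code points
    links:     dict of font name -> list of linked fonts (font linking)
    fallback:  ordered list of fonts the Uniscribe path may fall back to
    uniscribe: True if the run is rendered by Uniscribe, False if by GDI
    """
    if cp in coverage.get(font, ()):
        return font
    # Hypothesis: Uniscribe ignores font linking, GDI ignores font fallback.
    candidates = fallback if uniscribe else links.get(font, [])
    for name in candidates:
        if cp in coverage.get(name, ()):
            return name
    return NOTDEF


# Tiny made-up coverage data mirroring the targeted test.
coverage = {
    "Arial": {ord("W")},
    "Tahoma": {ord("W"), 0x2025},
    "MS UI Gothic": {0x2196},
    "SomeFallbackFont": {0x2025, 0x2196},
}
links = {"Tahoma": ["MS UI Gothic"]}  # Arial has no linked fonts
fallback = ["SomeFallbackFont"]
```

Under this model, U+2196 in a GDI run is tofu for Arial and comes from MS UI Gothic for Tahoma, while in a Uniscribe run both get it from the fallback font, which matches what the screenshots showed.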
from libass.
Uniscribe proper actually allows enabling and disabling both font linking and font fallback
Well, further tests suggest that while the API allows this, SSA_LINK actually does nothing. So we’re back to what I said earlier: modern Uniscribe doesn’t do font linking. I’m not sure whether it’s me or Microsoft* that is doing something wrong, but this is the effective behaviour I’m seeing.
* Wouldn’t be a huge surprise, seeing as e.g. GetCharacterPlacement actually completely ignores GCP_USEKERNING and always applies Uniscribe’s kerning except in ancient Windows without an extended language support pack**. This thread suggests SSA_LINK did do something 18 years ago, so it could easily be that Microsoft has changed something and rendered it ineffective since then.
** Yes, this whole GDI/Uniscribe business depends on whether the language support pack is installed. That is, the delegation to Uniscribe only happens when the pack is installed. Fun. I haven’t been able to find conclusively what this pack is, but my guess is it’s one of (or perhaps a DLL included in both of) the optional installs available via checkboxes in old Regional Settings (for those who don’t know: [1], [2], [3 with pictures]). My hopeful understanding is that it’s been included unconditionally since Vista, but it was still optional in XP. And XP was still popular in the early softsub era. Welp.
from libass.
Following since this is my screenshot + the error I ran into. Thanks for making the issue!
from libass.
Do the events’ Styles have any restrictive Encoding value set or so, which would block the usual Arial fallback other releases depend on (#42)? (Ideally, just provide a fully functioning file with all headers and Styles needed for the relevant Events.)
Also, as this might be related to Encoding: which *VSFilter does the screenshot come from? At least MPC-HC ISR’s Encoding handling diverges from guliverkli(2)- and xy-VSFilter (and from libass too).
from libass.
The Encoding is 1 in the sample archive. A space is displayed when the font is not installed, but indeed “aaa” is shown in XySubFilter when it is. I haven’t found an explanation yet.
from libass.
Apologies, I somehow missed that the full script and not just the font is attached.
from libass.
Except that in another anomaly, I wanted to double-check that it’s using Uniscribe by verifying that I see kerning, but it doesn’t seem to be applying any kerning at all! But it should, per #237… According to fontTools’ TTX, the font has a version 0 kern table with a single format 0 subtable with coverage 1 (“horizontal”), which Microsoft says is supported on Windows 🤯
from libass.
Resaving from FontForge with the kerning additionally saved to GPOS confirms that kerning works and thus those are, indeed, Uniscribe runs. I’m surprised that kern wasn’t enough, but it’s probably irrelevant to this issue; I’ll just make note of this observation in #237.
from libass.
I’ve spent this hour transplanting bits and pieces from the font in #42 to the font here in hopes of finding which particular piece is preventing fallback/substitution, but nothing has helped so far :-(
from libass.
Finally managed to get the font to let \h fall back to another font:
It happens when I remove U+0022 QUOTATION MARK (and not any other particular code point, although I haven’t exhaustively tried each) from the font’s Windows cmap 🤯
I truly have no idea why this is. At first glance it might seem related to the quotation marks surrounding the \h in the ASS, but the behaviour (fallback or no fallback) stays the same even if I remove the quotation marks from the ASS.
from libass.
My gut feeling is that perhaps GDI probes some hardcoded list of code points (which includes U+0022) and if it sees glyphs for all of them, it blindly assumes the font supports a whole subset of Unicode (which includes U+00A0) and skips font fallback for any code points in that subset. And it’s weird, because like, don’t ASCII-only fonts exist? Why would you assume that the non-ASCII NBSP is supported?
In fact, I’m able to produce similar behaviour by removing all code points except the quotation mark and the Latin letters:
- quotation mark, all uppercase & lowercase letters present: GDI uses this font and renders everything else (spaces and apostrophes) as .notdef
- quotation mark, all uppercase letters present (but not lowercase): GDI uses this font, renders the lowercase letters as .notdef, but uses a substitute font for \h
- further remove Y: same as above
- remove X instead (even if Y is kept): GDI rejects this font entirely and renders everything in Arial
from libass.
- quotation mark, all uppercase letters present (but not lowercase): GDI uses this font, renders the lowercase letters as .notdef, but uses a substitute font for \h
- further remove Y: same as above
And now I can’t reproduce this! It worked just a few minutes ago! Now it’s rejecting the font and using Arial instead. I’ve been restarting MPC-HC between tests to clear any possible per-process caches, but maybe that’s not enough? Or did I make a mistake somewhere?
Meanwhile, a font with full sets of both uppercase and lowercase letters but without the quotation mark (or any other code points) is accepted, and \h uses a substitute font.
from libass.
Finally managed to get the font to let \h fall back to another font:
It happens when I remove U+0022 QUOTATION MARK (and not any other particular code point, although I haven’t exhaustively tried each) from the font’s Windows cmap 🤯
As noted in #237 (comment), U+0022 is actually among the four magical-constant code points that cause GDI to switch to Uniscribe if the font lacks glyphs for them. So the fallback works in this case because the string is rendered by Uniscribe, just as in the other Uniscribe cases in the earlier test above.
from libass.
It does talk of how font linking “takes priority over font fallback”, but the way it is described in the next sentence makes me think that perhaps this is meant to say “font substitution”, which is GDI’s mechanism for defining font aliases, described further down the page, or perhaps this means something else completely.
Turns out Uniscribe proper actually allows enabling and disabling both font linking and font fallback, but GDI’s various entry points configure it differently. TextOut (which VSFilter uses) enables font fallback, whereas GetCharacterPlacement (which VSFilter doesn’t use) doesn’t. My first thought was that perhaps this text means that if Uniscribe is called with both flags set, then linking indeed takes precedence over fallback, but some reverse engineering suggests that all of these paths enable linking, and yet it isn’t happening.
from libass.