Comments (92)

kenlunde avatar kenlunde commented on June 29, 2024 1

Please continue to be patient with regard to proper HK support in Source Han Sans / Noto Sans CJK. Several factors are contributing to the delay, one of which is adhering to the forthcoming HKSCS-2015 standard.

from source-han-sans.

ShikiSuen avatar ShikiSuen commented on June 29, 2024

Could anybody tell me whether Korean hanja glyphs follow Kangxi Dictionary–style glyphs or not?

DerkZech avatar DerkZech commented on June 29, 2024

@ShikiSuen The standardized glyphs in South Korea are not identical to the glyphs used in the Kangxi Dictionary, but they can be considered the "closest" relatives when compared to the standards in China, Japan, and Taiwan.

A. Comparison of modern glyph standards:
[image]

Note: some of the variations above are not solely due to differences in "glyph standard" (字形標準), instead they can be considered different variant characters (異體字).

B. Kangxi Dictionary form vs. South Korean standard
[image]

As you can see, the character forms in the Kangxi Dictionary (KD) are often inconsistent, for example in 像 versus 象 and in the four short horizontal strokes of the 雨 top. The South Korean hanja standard, on the other hand, has eliminated many of these inconsistencies along with the taboo characters (避諱字) in KD. Another minor but significant change in the Korean standard is the consistent and more apparent use of "broken strokes" (斷筆) for "bent" strokes (折筆). A print-specific broken stroke is a stroke that is written as one stroke but appears as two strokes in print. For example, if you compare 巛, 幺, and 鼠 in the SK standard and the KD forms, you'll notice that the 斷筆 in the KD forms are either less obvious (reduced) or simply non-existent. The differences between the two 瓜 above, however, are not due to a print-specific 斷筆, because the two strokes of 瓜 in the Korean and Japanese standards are not only printed but also written as two strokes. A more appropriate comparison of a printed 斷筆 in 瓜 would look something like this instead:

[image]

(left: w/o 斷筆; right: w/ 斷筆)

In summary, while the Korean hanja standard is closest to the character forms in the Kangxi Dictionary, the two are still not identical. The Kangxi Dictionary forms, moreover, are rather inconsistent and less predictable due to the taboo characters. In fact, in terms of scale and purpose, the KD forms per se can hardly be compared to the national standards in use today. In this sense, the KD forms cannot really be considered a "standard."

By the way, I don't think this conversation is relevant to the intended subject here, so you might want to discuss this elsewhere.

kenlunde avatar kenlunde commented on June 29, 2024

The lack of use in Korea, at least over the past 100 years, no doubt explains the more conservative nature of their standard shapes. Greater usage of a script tends to result in greater innovations, which translate into changes.

tamcy avatar tamcy commented on June 29, 2024

Just want to add that in the above screenshot, "并/研" on the "H" row are the reference styles released by the Office of the Government Chief Information Officer (OGCIO). Their forms are the same as J/C/T if the Education Bureau's "List of Graphemes of Commonly-used Chinese Characters" is considered. I believe this is one of the challenges if an HK variant of the font is to be developed.

orangeparanoid avatar orangeparanoid commented on June 29, 2024

I agree that a Hong Kong font should be different.

The only publicly available authoritative source I can find for Hong Kong Chinese characters is here:
(The Chinese words are images. So, they won't be affected by the font installed on my computer.)

http://lcprichi.hkbu.edu.hk/

(You can copy and paste a Chinese character in the 輸入單字 box to get the result.)

適合小學生學習的中文科常用字,共3000個。本字表內的所有單字採用由華康科技(香港)有限公司贊助的「華康香港標準宋體」,符合香港教育學院《常用字字形表》所列的標準字形,使用者可參考本字表內單字的寫法作為教學用途。

According to this site run by Hong Kong Baptist University, this list uses a commercial font from 華康科技(香港). The font follows 常用字字形表 (Common Word List). The site says that, for teaching purposes, users may refer to the written forms of the characters in this list.

This site is "A Study of the Chinese Characters Recommended for the Subject of Chinese Language in Primary Schools" sponsored by the 優質教育基金 (Quality Education Fund).

About Quality Education Fund:
In October 1997, the Chief Executive announced in his Policy Address the establishment of the Quality Education Fund (QEF) to finance projects for the promotion of quality education in Hong Kong.
http://www.qef.org.hk/english/aboutus/objective_scope.html

That means, in my opinion, this site provides a "somewhat standard" supported by the HKSAR Government since the Government supports this site financially. (although the HKSAR Government did not announce a standard and force people to follow it.)

This site provides 小學中文科常用字表 (Common Word List for Primary School Chinese).

寺: The second horizontal stroke is longer than the first horizontal stroke on the top. This is the Hong Kong way of writing. 待侍特等 are affected.

Other characters that differ between Hong Kong and Taiwan include:
黃統充溫令戶直

I agree that 常用字字形表 (Common Word List) is the only viable option in terms of "Hong Kong standard" as there are no other Hong Kong standards currently implemented or suggested by the government.

Teachers have a headache when they see the current Taiwan font commonly used on Windows. The Taiwan font is just different from how teachers in Hong Kong teach the kids to write the Chinese characters.

The Chinese teachers suffer from the pervasive Taiwan font. The teachers who use Chinese to teach Social Studies and Mathematics are also affected.

Can we agree on using this "somewhat standard" indirectly financially supported by the HKSAR Government?

http://lcprichi.hkbu.edu.hk/

kenlunde avatar kenlunde commented on June 29, 2024

I just modified the fifth paragraph in the initial post for this Issue to clarify the plan for supporting HK.

Also, all of the discussions in this and other Issues have been very helpful.

extc avatar extc commented on June 29, 2024

The 小學中文科常用字研究 (http://lcprichi.hkbu.edu.hk/) was intended to revise the 2,600-character 《小學常用字表》, as stated in the "研究方法" section of the website. Published in 2003, it suggested which characters should be included in (that is, the scope of) primary school Chinese language learning, whereas the HK standard we are talking about is 《常用字字形表》, which is a standard for written forms. The English name of this book is "List of Graphemes of Commonly-used Chinese Characters".

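The 小學學習字詞表 published in 2007 was the result of updating the 1990 小學常用字表, as stated in the '前言' (preface) of the book:
Built on large-scale language surveys and with reference to the latest research findings from various regions, the learning vocabulary has been updated by scientific methods to suit the development of language in society and the learning needs of primary school students. Compared with the 1990 《小學常用字表》, this list adds 605 characters and removes 34; compared with the 1996 《小學教學參考詞語表 (試用)》, it adds 3,911 words and removes 909.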

The book included 常用字字形表 as an appendix. 常用字字形表 was re-arranged in 2012 with 'Cantonese and Putonghua Pronunciation and Simple English Explanations'. You can find the description on the website of the Education Bureau, HK:
http://www.edb.gov.hk/tc/curriculum-development/kla/chi-edu/resources/primary/lang/curriculum-materials.html

Here is the content of the description in the Microsoft Word document:
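《常用字字形表》 was compiled in 1986 by the Institute of Language in Education of the former Education Department and contained 4,721 characters; after several revisions, it had grown to 4,759 characters by 1993. This edition is re-typeset from the 1993 revised version. The listed variant characters were reviewed under the strict criterion that sound and meaning must be exactly the same, and forms whose meaning or usage differed slightly in the original are no longer treated as variants; the re-typeset list has grown to 4,762 characters.
The list was originally handwritten; for ease of lookup, this edition is re-typeset with computer-generated characters. While respecting the original research, only individual adjustments and additions were made, following the conventions and system of the original.
Finally, following the radicals of the original and the stroke counts as corrected over successive revisions, the character order and character codes were rearranged, and the list is printed in two colors with the original 《異體字表》 appended for readers' reference.
In the twenty years since its publication, 《常用字字形表》 has become a reference glyph norm shared by teachers, parents, and editors in Hong Kong. It is re-typeset here as an appendix to accompany the compilation of 《香港小學學習字詞表(試用)》, in the hope that it can serve this purpose even better.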

Therefore, one should consider getting a copy of this book or having a look at the official lexlist_ch website:
http://www.edbchinese.hk/lexlist_ch/

kenlunde avatar kenlunde commented on June 29, 2024

One consequence of splitting TW and HK is that the Version 1.000 fonts that include "TWHK" in their names (PostScript and Family) will be changed to instead use "TW" in Version 1.001. This change will affect existing documents. But, making such a change earlier in a font's lifetime is less painful than doing so at a later time.

hfhchan avatar hfhchan commented on June 29, 2024

FYI, the rendering at http://lcprichi.hkbu.edu.hk/ is based on a previous version of the 《常用字字形表》 "List of Graphemes of Commonly-used Chinese Characters", namely the 2000 version.

In 2007, the officially suggested strokes were changed to conform more to the commonly accepted forms, and the new list was merged into 小學學習字詞表 which can be found in http://www.edbchinese.hk/lexlist_ch/.

Numerous seminars for teachers were held in Hong Kong detailing the reasons for the change, e.g. the revival of the bottom tick for the word 求 which had a straight stroke downward in the 2000 versions and before. Practically no one wrote the word without the bottom tick, so the tick was added back. Ditto for 潛 where practically no one wrote the word with the top right hand component having the vertical stroke cross the first line, so the more common form (vertical stroke meets first-line, i.e. = Taiwanese version) was adopted as "suggested form" instead.

小學學習字詞表 is provided by the Education Bureau directly, so it is also more relevant than the list provided by HKBU (http://lcprichi.hkbu.edu.hk/) and funded by the QEF.

justinrleung avatar justinrleung commented on June 29, 2024

Other than the List of Graphemes of Commonly-used Chinese Characters, Hong Kong has other guidelines, which are specifically for computer fonts: the Reference Guide on Song Style (Print Style) Character Glyphs for Chinese Computer Systems in Hong Kong and the Reference Guide on Kai Style Character Glyphs for Chinese Computer Systems in Hong Kong (http://www.ogcio.gov.hk/en/business/tech_promotion/ccli/cliac/glyphs_guidelines.htm). The glyphs in these two guidelines differ slightly from the List of Graphemes, but they may be more relevant.

ShikiSuen avatar ShikiSuen commented on June 29, 2024

@justinrleung
Could you please also post the Traditional Chinese version of that guideline webpage?

justinrleung avatar justinrleung commented on June 29, 2024

Traditional Chinese: http://www.ogcio.gov.hk/tc/business/tech_promotion/ccli/cliac/glyphs_guidelines.htm
Simplified Chinese: http://www.ogcio.gov.hk/sc/business/tech_promotion/ccli/cliac/glyphs_guidelines.htm

ShikiSuen avatar ShikiSuen commented on June 29, 2024

@justinrleung
Thanks for the links (which let me learn more about the Hong Kong glyph standard), even though I still prefer CNS11643.

hfhchan avatar hfhchan commented on June 29, 2024

There are notable discrepancies between the List and the Guideline. Approved primary and secondary school textbooks for the Chinese Language subject require strict adherence to the forms in the List rather than the Guideline. The Guideline is merely for reference and adds a few technical requirements (for example, 說 is rendered as 説 to work around a side effect of the source separation rule). Should there be a discrepancy, the form in the List prevails.

Note the version of the List: the 2007 version was not published separately but is integrated into 香港小學學習字詞表 (Lexical Lists for Chinese Learning in Hong Kong), available online via http://www.edbchinese.hk/lexlist_ch/.

As an added note, in the file https://github.com/adobe-fonts/source-han-sans/blob/master/SourceHanSans_TWHK_sequences.txt, under the section #HK
9F9C FE00; Standardized_Variants; CID+47471
is actually for Taiwan.
Hong Kong should actually use
9F9C FE01.

Source [a]: http://www.edbchinese.hk/lexlist_ch/
Source [b]: http://www.edbchinese.hk/lexlist_ch/charform.swf?cidx=4762
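For readers who want to inspect entries like the one quoted above, here is a minimal parsing sketch in Python. It assumes every entry has the three semicolon-separated fields seen in the quoted line (sequence, type, `CID+` number); the names `SequenceEntry` and `parse_sequence_line` are made up for this illustration and are not part of the Source Han Sans tooling.

```python
from typing import NamedTuple


class SequenceEntry(NamedTuple):
    base: int      # base character code point, e.g. 0x9F9C
    selector: int  # variation selector code point, e.g. 0xFE00
    kind: str      # sequence type, e.g. "Standardized_Variants"
    cid: int       # glyph CID that the sequence maps to


def parse_sequence_line(line: str) -> SequenceEntry:
    """Parse one 'BASE SELECTOR; KIND; CID+N' entry (assumed format)."""
    seq, kind, cid = (field.strip() for field in line.split(";"))
    base_hex, selector_hex = seq.split()
    if not cid.startswith("CID+"):
        raise ValueError(f"unexpected glyph field: {cid!r}")
    return SequenceEntry(int(base_hex, 16), int(selector_hex, 16), kind, int(cid[4:]))


entry = parse_sequence_line("9F9C FE00; Standardized_Variants; CID+47471")
print(f"U+{entry.base:04X} + U+{entry.selector:04X} -> CID {entry.cid} ({entry.kind})")
```

Parsing the file this way only shows which glyph a sequence selects; it says nothing about which region the glyph is intended for.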

hfhchan avatar hfhchan commented on June 29, 2024

Same for 6148 FE00
Should use something similar to 6148 FE01 (top part radical)
Source: http://www.edbchinese.hk/lexlist_ch/charform.swf?cidx=1401

tamcy avatar tamcy commented on June 29, 2024

Some of the forms supplied by EDB are really surprising. I have never seen the "standard" forms of "於" and "慈" used in real life. In this case I would oppose using such glyphs in the font for practical reasons. After all, the commonly written forms of "於" and "慈" are still accepted as variant forms.

hfhchan avatar hfhchan commented on June 29, 2024

@tamcy well the 茲 variant of 慈/滋 is used in all approved Chinese textbooks for primary and secondary school, though I clearly remember the teacher remarking she has been writing the 兹 version for her whole life while marking our dictation.

First, a bit of history on etymology:

According to Shuowenjiezi, (authored in the Han dynasty),

慈,…从心茲聲… (composed of 心, 茲 is the sound)
滋,…从水茲聲… (composed of 水, 茲 is the sound)

However, the Kangxi Dictionary (AD 1716) entry for 玆 [3] explicitly states that 玆 and 茲 have the same sound but different meanings. It also quotes sources where the characters have been mixed up, and the dictionary tries to distinguish them. Note that the entry for 玆 lacks the dots for 玄, because the word 玄 was part of Kangxi's personal name. It quotes Yupian saying 玆 is also written as 滋 (dotless 玄).

On the other hand, the main entry for 滋 [1] writes it with a dot. Including the dot likely implies a recognition that 滋 is not composed of two 玄, but explicitly of the form 兹. As the Kangxi Dictionary was press printed, and the dot is consistently printed in the 滋 entry, it is clear that there was a deliberate effort to create two different word casts (字粒, the carvings used in letterpress printing). This would imply a recognition that there exist two characters, one as 氵 + 兹, and one as 氵 + 玆.

Furthermore, 兹 does not exist as another character. However, the right-hand side of the seal script version of 滋 and the seal script version of 玆 are exactly the same. This would imply that 玆 and 兹 are the same word.

This suggests that the Kangxi Dictionary contradicts itself as to what exactly the right-hand side of 滋 is supposed to be.

To note, 茲 is often handwritten as 兹 in Regular script, on which the Ming/Song script is based. This may suggest that the Kangxi Dictionary was, at some point in the editorial process, misled by the more common form and used the incorrect seal script character.

Shuowenjiezi (Duan Yucai notated) (AD1907) (《說文解字注》段玉裁, Qing dynasty) writes 慈,从心茲聲. A more explicit text exists at the 滋 entry, saying that 滋,从水茲聲 (composed of 水, 茲 is the sound), and pointing out that various dictionaries have got the seal script form wrong and are thus using the wrong form "玆" [2].

Before the MOE standardization, most characters with the ⺿ radical were often written with 丷 by ordinary people and calligraphers alike. During the MOE standardization process, characters whose etymology related to grass were assigned a standard form using the ⺿ radical, and those that were not (e.g. 夢) were assigned ⻀ or 䒑. However, the MOE notes from the standardization process explicitly said that characters containing 茲 should be written with 兹, and only the single character itself with the ⺿ radical. This was likely seen as an anomaly or unnecessary inconsistency by the people who drafted the local list, as 滋 and 慈 were written with a ⺿ radical from the first version.

Meanwhile for 於, not many other commonly used characters use 人 on the top right-hand side. Personally, I have always written it with 丿一 instead of 人. All characters that have 方 on the left have the 人 part written as 丿一, such as 旗 / 族. Although the roots of the character 於 have nothing to do with 方 and 人, many other characters such as 差 and 羞 have completely different structures in seal script, but have been assigned the same clerical script component simply for consistency and ease of writing. Using 人 at the top right of 於 would be inconsistent with all other characters composed of 方 and 人, and differentiating it from characters of different origin would add little to the understanding of the character. My guess is that it was an explicit change to ease learning.

Teachers in Hong Kong in general adhered to the MOE standard until the promulgation of the local standard, but even so teachers are not mandated to correct minor differing forms. Also, unlike in the PRC, no law requires the media to use appropriate fonts. Given that the fonts used nowadays are primarily based on the forms standardized by Taiwan, it is possible that ordinary people would not see these forms printed outside of the education context. Proper fonts strictly adhering to the local standard were never popularized, and an open-source font adhering to the standard is lacking.

It is debatable whether certain characters should stick to the standard or not; I think it would be better to adhere to the standard for the first revision, then change certain characters to other forms based on actual use (with statistical justifications).

[1] http://www.kangxizidian.com/kangxi/0642.gif
[2] http://www.gg-art.com/imgbook/view.php?word=%D7%CC&bookid=53&book_name=%CB%B5%CE%C4%BD%E2%D7%D6%D7%A2
[3] http://www.kangxizidian.com/kangxi/0725.gif

(Note that I have used the scanned forms intentionally; many online versions mix up the characters and use them interchangeably, which makes no sense.)

kenlunde avatar kenlunde commented on June 29, 2024

@hfhchan You are seriously misunderstanding the scope and purpose of the Standardized Variants.

<9F9C FE00> corresponds to U+F907, which corresponds to Hong Kong SCS 0x8BF8. See:

9F9C FE00; CJK COMPATIBILITY IDEOGRAPH-F907;
9F9C FE01; CJK COMPATIBILITY IDEOGRAPH-F908;

U+9F9C (as a bare character, and not as part of a Standardized Variant) corresponds to Big Five 0xC074.

Both glyphs should be available for Hong Kong use: U+9F9C is included by virtue of being included in Big Five, which forms the foundation for Hong Kong SCS; and <9F9C FE00> (U+F907) is included because it is part of Hong Kong SCS proper.

About <6148 FE01>, such a Standardized Variant does not exist, and one is not necessary. There is a one-to-one correspondence between a CJK Compatibility Ideograph and a Standardized Variant, meaning that there are exactly 1,002 of each. If a single CJK Compatibility Ideograph has multiple sources, and if the representative glyphs for the sources differ, it is up to the implementation to use the appropriate glyph. In this way, the situation is no different than CJK Unified Ideographs.

<6148 FE00> corresponds to U+2F8A6, which corresponds to Hong Kong SCS 0xFC77, and U+6148 corresponds to Big Five 0xB74F. In other words, both glyphs are expected to be in Hong Kong fonts, but at different code points.
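The relationship between the CJK Compatibility Ideographs mentioned above and their unified ideographs can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: `unified_form` is a helper name invented here, and the results depend on the Unicode data bundled with the Python build in use.

```python
import unicodedata


def unified_form(ch: str) -> str:
    """Return what a CJK Compatibility Ideograph becomes under NFC normalization."""
    return unicodedata.normalize("NFC", ch)


# Compatibility ideographs named in this thread.
for compat in ("\uF907", "\uF908", "\U0002F8A6"):
    unified = unified_form(compat)
    print(f"U+{ord(compat):04X} normalizes to U+{ord(unified):04X}")

# A base character followed by a variation selector is left intact by NFC.
sequence = "\u9F9C\uFE00"
print(unicodedata.normalize("NFC", sequence) == sequence)
```

Normalization folds the compatibility code point into the unified ideograph, while the base-plus-selector sequence passes through unchanged, so the distinction is preserved in normalized text.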

kenlunde avatar kenlunde commented on June 29, 2024

Thanks to Dr. Lu (IRG Rapporteur), I am now in possession of the printed version of 香港小學學習字詞表. It seems that pp 432 through 579 will be most helpful.

[image]

justinrleung avatar justinrleung commented on June 29, 2024

I have a print copy of 常用字字形表 (2012), which is the same list as in the appendix of 香港小學學習字詞表. I haven't compared it with the print copy of 香港小學學習字詞表, but I've found minor discrepancies between the glyphs in the online version of 香港小學學習字詞表 and the print copy of 常用字字形表 (2012).

(1) The middle dot in 必 is different.

香港小學學習字詞表:
image

常用字字形表 (2012):
image

This difference also appears in some characters that have the 必 component, like 蜜 and 泌:

[images]

It is interesting to note that there is no difference in other characters with the 必 component, like 密 and 祕:

[images]

(2) The left dot/slash in 示 is different.

image
image

This difference also appears in some characters that have the 示 component, like 票, 祭, 漂, 剽, 瞟, 驃, 鏢, 襟:

[images]

It is interesting to note that there is no difference in other characters with the 示 component, like 禁, 禦, 飄, 嫖, 標, 鰾:

[images]

It is also interesting to note that the 2012 常用字字形表 still has a left dot in 瓢:

image
image

kenlunde avatar kenlunde commented on June 29, 2024

I just checked all of the examples that you posted, and when compared to those on pp 432 through 579 in the printed version of 香港小學學習字詞表 that I now have (dated 2007), they are the same.

beachmat avatar beachmat commented on June 29, 2024

So are the Hong Kong issues solely to do with the characters in Hong Kong SCS? Or are there characters in Big 5 that also need different glyphs for Taiwan and Hong Kong?

hfhchan avatar hfhchan commented on June 29, 2024

In both Big5 and HKSCS. The CLIAC is currently reviewing the industry guidelines. I think the IRG documents mention this.

beachmat avatar beachmat commented on June 29, 2024

So basically you should use a HK font only for Hong Kong traditional text, and not for Taiwan traditional? I know that characters outside Big 5 come up often in Taiwanese text. In the past we used to use font vendor extensions. When it happened the other day I used a HK font which I guess was a mistake, because it would change some Big 5 glyphs as well, yes?

hfhchan avatar hfhchan commented on June 29, 2024

yes, you are right. however the glyph differences are somewhat minor. usually the differences are just distracting.

hfhchan avatar hfhchan commented on June 29, 2024

Note that the Taiwan glyph standard is merely a standard; it is not that prevalent in daily use. In Taiwan and Hong Kong, the Monotype and Dynacomware fonts are usually used, which resemble (but do not exactly match) the glyph standard for Hong Kong and the glyphs used in traditional movable-type printing in non-simplified Chinese areas.

beachmat avatar beachmat commented on June 29, 2024

So if I understand you correctly, the Big 5 glyphs in Monotype and Dyna fonts are what are used in both Hong Kong and Taiwan, and Taiwanese people don't worry too much if they follow a Hong Kong style and don't match the Taiwan MOE standard? If so, then the issues Ken mentions are basically to do with the HKSCS characters?

hfhchan avatar hfhchan commented on June 29, 2024

The Big5 glyphs in Monotype and Dyna fonts are what are used in both Hong Kong and Taiwan

Yes. Though the monotype and dynacomware fonts for both markets usually include both Big-5 and HKSCS characters.

Taiwanese people don't worry too much if they don't match the Taiwan MOE standard?

Yes, at least the fonts used in normal contexts, i.e. newspapers, press and advertisements, do not adhere to the Taiwan MOE standard (and will more closely resemble the Hong Kong glyph standards if they have used the Monotype fonts).

If so, then the issues Ken mentions are basically to do with the HKSCS characters?

No. Hong Kong standardized glyphs for many characters encoded by Big5 are also different than the MOE standardized glyphs from Taiwan. Such differences range from stroke differences to (systematic) use of different radicals/components in characters.

beachmat avatar beachmat commented on June 29, 2024

Right, thanks. So basically there are differences in Big 5 characters for HK and TW, but Taiwanese people tolerate HK forms out of necessity? In that case, they should like Source Han Sans/Noto Sans, I guess. But just to make it more complicated, there is also the issue that Taiwan MOE forms are unpopular in Taiwan, yes?

hfhchan avatar hfhchan commented on June 29, 2024

Put it this way: the Taiwan MOE standardized forms, established circa the 1980s, favor certain character forms to better reflect their etymology. This has gone to the absurd level of importing a component that originally existed only in the Running script into the Regular script and the Ming/Song script. It was seen as a minor but distracting attempt to re-engineer written Chinese. Therefore, the uptake of the MOE standardized forms has been low due to their deviation from customs and norms.

It was not until recently that these forms came to be used in casual contexts, as they are used by the system default font (Microsoft JhengHei) for Windows Vista and up. However, these standardized forms are rarely seen in advertisements and newspapers due to the criticism that they are neither natural nor aesthetically suitable. Source Han Sans effectively brings the MOE standardized forms to other platforms. (To note, it was pointed out in an older thread that the traditional Kang-xi resembling fonts are preferred by Taiwanese people, while another commenter argued that the MOE forms are better. Nevertheless, the Kang-xi resembling fonts (provided by Dynacomware) are more popular in Taiwan. Therefore, which is more beautiful is likely a matter of preference.)

Meanwhile, the HK government took a much softer approach to standardization in the 90s. Glyph differences from norms are generally at the stroke level. The HKSARG has refrained from swapping in components that differ from normal writing conventions. Thus, the HK forms are very likely to be more similar to the actual forms used by Taiwanese people, especially those who were educated before the introduction of the MOE standard into the primary and secondary curriculum.

The situation is never that clear cut, as the huge number of Chinese users means a huge variation in writing habits and preference of forms. The perceived majority preference could also be affected by confirmation bias. It would be infeasible to make everyone happy. Therefore I guess the approach Source Han Sans can take is to follow the national standards as strictly as possible where it matters most.

kcwu avatar kcwu commented on June 29, 2024

From the other point of view:
The Taiwan MOE standardized form has been taught in schools for more than twenty years. Whether the design is good or bad, reasonable or not, middle-aged and younger people are familiar with the TW MOE form, and some of them feel this form is better or more standard.

It's true that it is not so widely used. But to some extent it's a chicken-and-egg problem: some people want TW MOE form fonts, but very few choices are available. Since the late 90s, fewer font foundries in Taiwan have created new fonts. People are (kind of) forced to use non-TW MOE form fonts. And needless to say, users' first font choice, the OS default font (Ming in Windows at that time), is in a non-TW MOE form.
BTW, the second most popular choice, Standard-Kai 標楷體 in Windows, uses the TW MOE form.
(I understand that the TW MOE form has some drawbacks, but I'm not here to argue that.)

My main point is that the TW MOE form is not so rarely used, and it may not be fair to judge only by existing usage.


Sometimes there is an opinion that the HK form should be used for TW, or vice versa. No matter whether HK is softer, and no matter whether the TW MOE form, Kang-xi, or Dyna is better, there is a small subset of components/glyphs that conflict between TW and HK. In that subset, both sides are very unhappy with the writing style of the other side. There is no simple solution (a single font) that fits all. That's the problem that this issue (#48) is trying to solve.

hfhchan avatar hfhchan commented on June 29, 2024

Even in Hong Kong, the chicken-and-egg problem has occurred: some parts of the Hong Kong standard have not found mainstream use, partially because of a lack of fonts that use these forms.

A good thing for Taiwan is that the new "Ming font", PMingLiU, in Windows Vista and up is in the TW MOE form. The bad thing is that no font for Hong Kong is provided with any operating system. As also mentioned, the mainstream font used for Regular Script in Hong Kong is Standard-Kai 標楷體 as well, which has systematic differences from local norms.

So all in all, it is a good thing that Adobe is now experimenting with producing a font that respects the HK form separately from the TW version, so both sides can exist peacefully. :)

kenlunde avatar kenlunde commented on June 29, 2024

I can confirm that the scope of the Hong Kong glyph issue covers both Big Five and Hong Kong SCS proper.

The first stage will be to build an experimental Hong Kong font that repurposes existing glyphs to the extent possible. This will help to gauge the effectiveness of having a separate Hong Kong instance.

One particular difficulty in handling two types of Traditional Chinese in a Pan-CJK font is that the few applications that support the 'locl' GSUB feature have only one notion of Traditional Chinese, meaning that language-tagging cannot support both. In OpenType, both can be supported because there are separate language tags for Traditional Chinese in Taiwan (ZHT) and Hong Kong (ZHH). So, at a practical level, language-tagging cannot be relied upon to differentiate the two types of Traditional Chinese. Instead, selecting separate fonts, each of which includes a different default 'cmap' table, must be used.
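The decision described above can be sketched roughly as follows. This is an assumption-laden illustration, not how any particular application works: the family names in `REGION_FONTS` and the function `select_rendering` are hypothetical, while ZHT and ZHH come from the comment above and ZHS, JAN, and KOR are the corresponding registered OpenType language system tags.

```python
# OpenType language system tags for the languages a Pan-CJK font serves.
OT_LANGSYS = {
    "ja": "JAN",
    "ko": "KOR",
    "zh-CN": "ZHS",
    "zh-TW": "ZHT",
    "zh-HK": "ZHH",
}

# Hypothetical region-specific family names, used for illustration only.
REGION_FONTS = {
    "zh-TW": "Source Han Sans TW",
    "zh-HK": "Source Han Sans HK",
}


def select_rendering(locale, app_distinguishes_tw_hk):
    """Return (font family, language system tag or None) for a locale.

    If the application has a single notion of Traditional Chinese, the
    'locl' feature cannot tell TW from HK, so a separate font whose default
    'cmap' already points at the regional glyphs is selected instead.
    """
    if locale in REGION_FONTS and not app_distinguishes_tw_hk:
        return (REGION_FONTS[locale], None)
    return ("Source Han Sans", OT_LANGSYS[locale])


print(select_rendering("zh-HK", app_distinguishes_tw_hk=False))
print(select_rendering("zh-HK", app_distinguishes_tw_hk=True))
```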

hfhchan avatar hfhchan commented on June 29, 2024

Suppose I specify the font as "Source Han Sans": will 'locl'-aware applications always render the Taiwanese version, unless I explicitly set the font to "Source Han Sans HK" and have the corresponding Subset OTF installed?

In OpenType, both can be supported because there are separate language tags for Traditional Chinese in Taiwan (ZHT) and Hong Kong (ZHH).

Meanwhile, the Language-specific OpenType/CFF, OpenType/CFF Collection (OTC) and Super OpenType/CFF Collection (Super OTC) can contain the HK glyphs, but if so, they will not contain the TW glyphs?

beachmat avatar beachmat commented on June 29, 2024

I believe the default language for "Source Han Sans" (ie not language-specific) is Japanese. Yes it would be good if InDesign distinguished between ZH TW and HK. But I guess there's other software that doesn't make the distinction either.
Something else that still confuses me a little. When I got a missing glyph in a Big5 font with some Taiwanese text the other day, how come HK fonts had that glyph? I thought HKSCS was mostly for Cantonese. Is there overlap between HKSCS and other big 5 extensions, and font vendor extensions, eg Monotype?

kenlunde avatar kenlunde commented on June 29, 2024

@hfhchan: The experimental HK fonts will include the full glyph set, but the 'cmap' table will prefer HK forms when available, or the closest glyph for HK use. For experimental purposes, it is premature to define an HK subset. Also, my intention is to eventually support both TW and HK in the same glyph set, which means separate language-specific HK fonts and separate HK font instances in the OTCs.
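A 'cmap' that prefers HK forms when available and otherwise falls back to another glyph could be built along these lines. Everything in this sketch is an assumption made for illustration: the glyph names, the regional coverage in `GLYPHS`, and the fallback order in `PREFERENCE` are invented, and what counts as the "closest" glyph is a per-character design decision rather than a fixed order.

```python
# Hypothetical glyph inventory: code point -> {region: glyph name}.
# Glyph names and regional coverage are made up for this example.
GLYPHS = {
    0x9F9C: {"JP": "uni9F9C-JP", "TW": "uni9F9C-TW", "HK": "uni9F9C-HK"},
    0x6148: {"JP": "uni6148-JP", "TW": "uni6148-TW"},  # no HK-specific glyph
    0x4E00: {"JP": "uni4E00-JP"},                      # one glyph shared by all
}

# Assumed preference order per target region.
PREFERENCE = {
    "HK": ("HK", "TW", "JP"),
    "TW": ("TW", "JP"),
}


def build_cmap(region):
    """Map each code point to the most preferred glyph that exists for a region."""
    cmap = {}
    for code_point, by_region in GLYPHS.items():
        for candidate in PREFERENCE[region]:
            if candidate in by_region:
                cmap[code_point] = by_region[candidate]
                break
    return cmap


for code_point, glyph in build_cmap("HK").items():
    print(f"U+{code_point:04X} -> {glyph}")
```

Because the glyph set is shared, the TW and HK instances differ only in this mapping, which is why they can live side by side as font instances in one collection.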

I plan to file a bug against InDesign in order for it to distinguish the two types of Traditional Chinese via the distinct OpenType language tags: ZHT and ZHH.

@beachmat: Source Han Sans has no default language, at least starting from the Version 1.001 release. The default language depends on which language-specific OTF you use, or which OTC font instance you choose. About your second paragraph above, a screenshot would be helpful, along with details about the other HK fonts and the text you are trying to render. I strongly suspect that PUA code points may be involved. As of Unicode Version 5.2, PUA code points are not required for Hong Kong SCS, though there may be some lingering font implementations out there.
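One quick way to test the PUA suspicion is to scan the text for characters whose Unicode general category is "Co" (private use). The helper name `find_pua` is made up for this sketch; it relies only on Python's standard `unicodedata` module.

```python
import unicodedata


def find_pua(text):
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" is the private-use category
    ]


sample = "龜\uE000慈"  # U+E000 is the first code point of the BMP private use area
print(find_pua(sample))
```

An empty result means the text uses no private-use code points, in which case the missing glyph has some other cause.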

hfhchan avatar hfhchan commented on June 29, 2024

@beachmat

Something else that still confuses me a little. When I got a missing glyph in a Big5 font with some Taiwanese text the other day, how come HK fonts had that glyph?

A font that claims to be for Big5 doesn't necessarily include glyphs for all characters in Big5. Also, the Taiwanese text could have actually contained characters in Unicode that are not in Big5. Since commercial HK fonts usually have full coverage of Big5 and HKSCS, it's possible that your word processor or browser chose an HK font as the fallback. The vast number of similar words and cognates separately encoded in Unicode means that a user could have easily typed an unintended character that doesn't exist in Big5 (or the "big5 supporting font").

I thought HKSCS was mostly for Cantonese. Is there overlap between HKSCS and other big 5 extensions, and font vendor extensions, eg Monotype?

HKSCS contains certain characters that are in use in Hong Kong, which happen to include some very common simplified Chinese characters, obscure characters for names, and also invented colloquial characters, which represent sounds in Cantonese.

There are of course overlaps between HKSCS and other Big5 extensions, such as ETen. The extensions assign some of the same code points to different characters, so text encoded in Big5-HKSCS may show a different character if it is treated as Big5-ETen. Conversely, the character 恒 in Big5-ETen has a different code point in Big5-HKSCS. HKSCS proper contains more characters than ETen proper, but the code points of the characters are different.

Monotype has its own Big5 extension encompassing 471 user defined characters. However, modern operating systems use Unicode as the underlying basis for text processing, and browsers will convert your text into Unicode (if it is not already) before passing the text to render. Therefore, it is highly unlikely your issue has anything to do with extensions of Big5.


beachmat avatar beachmat commented on June 29, 2024

Thanks for the informative replies. A couple of characters that have come up in Taiwanese text in the last couple of days are U+7740 and U+7ED2.


hfhchan avatar hfhchan commented on June 29, 2024

U+7ED2 (绒) is a simplified Chinese character. The traditional Chinese character should be 絨.

U+7740 (着) is used in both simplified and traditional Chinese. It originated as a calligraphic variant of 著. The Taiwan MOE regards it as a variant of 著 and discourages its use, so it is not included in the Big5 character set. Hong Kong and the PRC disagree, and 着 is included in HKSCS and also in the GB standards.

In the PRC and in HK, 著 and 着 are used for distinct meanings:
著 is used in the context of written works, e.g. "author 著者", "work(s) 著作", "famous 著名"
着 is used as a preposition, e.g. "looking at 看着", or as a verb, e.g. "catch fire 着火", "apply color 着色", "lay hands on (meaning to start) 着手"

In HK, 著 is occasionally used instead of 着, especially in older digital/digitized text, likely due to the widespread use of Big5 proper. However, the opposite is regarded as a mistake. In the PRC, any swapping is regarded as a mistake.
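One quick way to check whether a character falls outside Big5 is to try encoding it with Python's built-in `big5` and `big5hkscs` codecs. This is only a sketch: the helper name `can_encode` is made up for this example, and Python's codecs follow particular mapping tables that may differ in detail from a given font's or vendor's notion of Big5.

```python
def can_encode(char: str, encoding: str) -> bool:
    """Return True if `char` can be represented in `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+7D68 (traditional), U+7ED2 (simplified) and U+7740, as discussed above.
for ch in ("\u7d68", "\u7ed2", "\u7740"):
    print(
        f"U+{ord(ch):04X}",
        "big5:", can_encode(ch, "big5"),
        "big5hkscs:", can_encode(ch, "big5hkscs"),
    )
```

A character that fails the `big5` check is a likely candidate for font fallback when the primary font covers Big5 only.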


beachmat avatar beachmat commented on June 29, 2024

Thanks. Yes I came across another one which was simplified, so typing errors I guess.


beachmat avatar beachmat commented on June 29, 2024

Is there a reliable way to tell if a font is designed for Hong Kong or Taiwan? Would someone be kind enough to indicate a few characters with the relevant differences?


kenlunde avatar kenlunde commented on June 29, 2024

Because the extent to which TW and HK glyph standards are covered by fonts is all over the map, the easiest way to ascertain whether a font was intended for use in TW or HK is to check the Unicode coverage. If there are few or no Extension A (in the BMP) or Extension B (in Plane 2) code points, then the font is likely to be designed for use in TW, because most TW fonts adhere to Big Five, whose hanzi are all within the URO (except for two that are CJK Compatibility Ideographs). Fonts for HK, which support Hong Kong SCS, include over 500 Extension A characters, along with nearly 2,000 Extension B ones.
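The coverage check described above can be sketched in a few lines. The block ranges come from the Unicode Standard; the function names and the thresholds (`min_ext_a`, `min_ext_b`) are assumptions chosen for illustration, and the heuristic only makes sense for fonts already known to target Traditional Chinese.

```python
# Block ranges from the Unicode Standard.
EXT_A = range(0x3400, 0x4DC0)    # CJK Unified Ideographs Extension A (BMP)
EXT_B = range(0x20000, 0x2A6E0)  # CJK Unified Ideographs Extension B (Plane 2)


def count_extension_coverage(code_points):
    """Return (extension_a_count, extension_b_count) for a set of code points."""
    unique = set(code_points)
    ext_a = sum(1 for cp in unique if cp in EXT_A)
    ext_b = sum(1 for cp in unique if cp in EXT_B)
    return ext_a, ext_b


def guess_region(code_points, min_ext_a=100, min_ext_b=100):
    """Guess "HK" or "TW" from Extension A/B coverage.

    The thresholds are illustrative assumptions, not values from any standard.
    """
    ext_a, ext_b = count_extension_coverage(code_points)
    return "HK" if ext_a >= min_ext_a and ext_b >= min_ext_b else "TW"


# Synthetic code-point sets standing in for real fonts (not real font data).
big5_like = range(0x4E00, 0x4E00 + 13000)
hkscs_like = [*big5_like, *range(0x3400, 0x3400 + 500), *range(0x20000, 0x20000 + 1900)]
print(guess_region(big5_like), guess_region(hkscs_like))
```

For a real font, the code points could be read from its 'cmap' table, for instance with the fontTools library (`TTFont(path).getBestCmap().keys()`), which is not used here so that the sketch stays dependency-free.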


orangeparanoid avatar orangeparanoid commented on June 29, 2024

[Image: hong_kong_fonts]


orangeparanoid avatar orangeparanoid commented on June 29, 2024

I made a chart to show the differences between some of the characters. (Taiwan versus Hong Kong)


hfhchan avatar hfhchan commented on June 29, 2024

@beachmat unfortunately no.

Foreword: I will refer to Taiwan and Hong Kong as specific "regions", and call the combination of the two regions as the "Traditional Chinese (font) market". I will also refer to the glyph shape in the Hong Kong Education Bureau's official reference materials for schools as the Hong Kong standard due to its de-facto nature.

First off, "designed for a specific region (Taiwan, Hong Kong)" can consist of very different criteria: coverage of regionally commonly used characters, and adherence to regional standards and/or regional norms. Note: I myself consider that strict adherence to a Taiwan standard entails non-suitability for Hong Kong. The folks on the Noto issue tracker seem to disagree.

Second, most commercial fonts for the Traditional Chinese market do not distinguish between the two regions, and thus do not adhere specifically to any regional standard. The regional norms may deviate slightly due to education, but the commonly used forms are always recognizable in both regions. Popular commercial fonts usually choose the glyph shape that "just fits", balancing aesthetics, readability, etymology, and tradition at their own discretion.

Codepoint Coverage
As mentioned by @kenlunde, you can check the code-point coverage. If the code-point coverage covers only Big5 or (especially) Big5E, one can conclude it was designed primarily for the Taiwan market. If the code-point coverage includes characters in HKSCS, one could argue it is designed for the Hong Kong market as well.

Note: Microsoft JhengHei and Source Han Sans (TW) were designed to adhere to the MOE standard, and they also cover characters in HKSCS. My opinion is that these fonts are only as suitable for Hong Kong as PRC / Japanese / Korean fonts would be, that is, not suitable.

Glyph Shapes
As @orangeparanoid listed, there are numerous differences in the glyph shapes. However, some of these examples may not be suitable to draw conclusions.

If the font follows the glyphs tagged "Taiwan" in all of the first 7 rows on the left, the last 3 rows on the left, and the first row on the right, the font is very likely to be targeting the Taiwan market only. These are glyph features that had virtually never existed in printed material over the last thousand years, until they were engineered into the standardized glyphs by Taiwan. These glyph forms are not in widespread use in Hong Kong.

The other rows, however, have little value for drawing conclusions.

First, the glyphs tagged as "Taiwan" in the other rows reflect glyph differences that have existed in fonts for centuries. The appearance of such forms does not suggest that a font targets any particular region.

Second, the glyph forms tagged as "Hong Kong" are indeed the forms specified by the Hong Kong standard, but are also used by the PRC and other regions. Most of these forms tagged as "Hong Kong" are strongly similar to the forms that have been used in mainstream print for centuries. These glyph forms are nearly universally used in popular commercial fonts targeting the Traditional Chinese market.

Although popular commercial fonts are likely to resemble the Hong Kong forms more closely, their deviation from the Taiwan glyph forms stems from the principles on which the regional standards were derived, and does not indicate a preference for, or strict adherence to, the Hong Kong standard. Font vendors choose norms and tradition over standards or etymology to varying degrees. Song (serif) fonts typically follow handwriting norms for the radical 壬: it is rendered with a slanting top stroke, similar to the PRC standard, while the Taiwan and HK standards show a horizontal stroke. A popular font, MSungHK, follows traditional printed Song for the radical 呈. It usually draws the bottom component of the character 呈 with a middle stroke longer than the bottom stroke, but when this character is a component of 鐵, the bottom component is changed to 王.

Fonts strictly adhering to the Hong Kong standard are extremely rare. Currently the only commercially available fonts adhering to the Hong Kong standard are those from DynaComware that have "香港標準" in their font name. To find fonts that specifically target the Hong Kong standard, one can examine the characters 畢, 於 and 潛:

For 畢:
There is one long downward stroke in the Hong Kong standard, as in the PRC standard, while the Taiwanese representative glyph separates the downward stroke of 田 from that of the bottom component. The Hong Kong standard also breaks the component just under 田 into two separate "十" instead of using one horizontal stroke across, which is not found in any other existing regional standard.

For 於:
The top right-hand side should be similar to the top of 旗, instead of consisting of 人 as in the PRC / Taiwan standardized glyphs.

For 潛:
You can check for a protrusion on the top right-hand component. In the pre-2007 version of the Hong Kong standard, the downward left stroke overlaps the top horizontal stroke, instead of touching it as in every other region. This feature is not present in the current industry guideline[*], and has been removed from the Hong Kong standard since the 2007 version. However, the commercially available fonts haven't been updated (yet).

The last stroke of the top right hand component of 潛 can also be compared:
HK Version: http://pic.zdic.net/song/hk/100/1d/6F5B.gif
TW Version: http://pic.zdic.net/song/tw/100/1d/6F5B.gif

However, it is unknown whether all of these distinguishing features are here to stay: they diverge from traditional/modern print and/or handwriting to varying degrees. The Hong Kong government is currently revising the standard glyph forms with the two main font vendors, DynaComware and Monotype. The rarer and more awkward glyph features could potentially be purged.

[*] The HKSAR OGCIO also provides a set of guidelines on glyph shapes for the font industry. However, various glyph shapes depart from the Hong Kong Education Bureau's standard, and the specification does not set out any criteria for conformance. As such, the guideline has only ever been used by the government, when it compiled a Unicode font covering the characters in HKSCS for reference.


orangeparanoid avatar orangeparanoid commented on June 29, 2024

I have checked both publicly available websites:

國字標準字體楷書母稿
http://www.edu.tw/files/site_content/M0001/mu/c5.htm?open
http://www.edu.tw/files/site_content/M0001/mu/mua.htm?open

香港小學學習字詞表
http://www.edbchinese.hk/lexlist_ch/

My chart showing the differences between the characters is correct. I know that it is hard to find a commercial font that follows a Hong Kong standard. If Source Han Sans HK follows the Education Bureau's 香港小學學習字詞表, I know that primary school teachers and students will benefit.

For Source Han Sans, I do not know which "standard" or "convention" will ultimately be adopted. As far as I know, no information is available.


beachmat avatar beachmat commented on June 29, 2024

Thanks both for your informative replies. Very helpful. Orangeparanoid, if you're able to provide those characters as live text, that would be useful, but obviously don't spend too much time on it!


kenlunde avatar kenlunde commented on June 29, 2024

Issue #23 is consolidated here.


kenlunde avatar kenlunde commented on June 29, 2024

At this point, we're targeting the next update to be Version 2.000, which will make somewhat extensive adjustments to the glyph set, such as a greater degree of glyph-sharing across languages (particularly between JP and CN), which is intended to free up CIDs with which to address issues with existing TW glyphs that require glyphs to be added, along with addressing the HK issue that also requires glyphs to be added.

In other words, I am no longer planning to deploy experimental HK fonts, but instead to target the Version 2.000 update deploying the actual HK fonts and font instances, thus adding a fifth language.


RyanChng avatar RyanChng commented on June 29, 2024

That would be good! By the way @kenlunde when is the target release date for 2.000?


kenlunde avatar kenlunde commented on June 29, 2024

The earliest would be closer to the end of this year. A lot of planning is involved.


kenlunde avatar kenlunde commented on June 29, 2024

In order to guard against stale URLs, I am making the Hong Kong Glyph Guidelines PDF file available here.


orangeparanoid avatar orangeparanoid commented on June 29, 2024

@kenlunde As in http://blogs.adobe.com/CCJKType/2015/08/irg44.html, is there any plan to follow IRG N2074 (http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg44/IRGN2074.pdf)?

I reviewed some HKSCS characters and found Simplified Chinese characters that are rarely or never used in Hong Kong newspapers, e.g. 见 (HKSCS: 8BE5); 見 is the usual form in Hong Kong. Will such Simplified Chinese characters be included in Source Han Sans HK? Will both 见 and 見 be in the same HK font?


kenlunde avatar kenlunde commented on June 29, 2024

@orangeparanoid: As stated in that blog article, one reason for my interest is that I plan to support HKCS 2015 in the Source Han Sans Version 2.000 glyph set, which effectively means that Version 2.000 development is on hold until HKCS 2015 has been finalized.

One of the problems with HKSCS is that it implicitly included Big Five as part of its scope. The main problem with that was that there was no way to specify the HK form of a character that is included in Big Five. That will change with HKCS. Also, the fact that 见 is included in HKSCS (and likely in HKCS) doesn't mean that 見 will be excluded, because it is included in Big Five.


RyanChng avatar RyanChng commented on June 29, 2024

Just wondering, to what extent would PingFang HK be useful as a design reference? (Given that we have no other Heiti-style reference for HK, as far as I know.)


kenlunde avatar kenlunde commented on June 29, 2024

@RyanChng: I barely trust standards (because I have found plenty of errors in both regional and international standards), which necessarily serve as primary references, and I have even less trust in fonts. With that being said, PingFang HK may serve as a point of comparison, but will not be used as a reference per se.


RyanChng avatar RyanChng commented on June 29, 2024

@kenlunde: I see. What do you intend to use as a reference? The Songti and Kaiti standards, perhaps?


kenlunde avatar kenlunde commented on June 29, 2024

I am planning to use the forthcoming HKCS 2015 standard as the primary reference.


RyanChng avatar RyanChng commented on June 29, 2024

I see. Do keep us posted if possible! Thank you =)


kenlunde avatar kenlunde commented on June 29, 2024

That is the plan. 👍


beachmat avatar beachmat commented on June 29, 2024

But will there still be an interim release before version 2 for the HKSCS as it stands?


kenlunde avatar kenlunde commented on June 29, 2024

@beachmat: No, because doing so represents a huge amount of work that would effectively need to be redone after all of the HKCS 2015 details are available.


beachmat avatar beachmat commented on June 29, 2024

Thanks, and sorry, you made that clear earlier in the thread.


orangeparanoid avatar orangeparanoid commented on June 29, 2024

@kenlunde,

I appreciate your work on designing the HK font. (hopefully to be released in 2016 or 2017)

I recommend that someone / some people from Hong Kong should be hired to test if the HK font really works as expected, by actually viewing traditional Chinese characters on the screen.

As Adobe works with a Mainland China font company, Mainland Chinese citizens don't use the HK font / traditional Chinese characters. Traditional Chinese characters are banned, except in Hong Kong and Macau.

Mainland Chinese citizens literally are not able to tell if traditional Chinese characters are odd, strange or natural. These people simply use the simplified Chinese characters instead.

When Adobe asks Mainland Chinese citizens to test the HK font, chances are that some strange results can happen. Odd problems are just undetected.

Ideally, revising the same font less frequently can bring some convenience to the users. (even if the first HK font release is postponed)


extc avatar extc commented on June 29, 2024

Agree. Once the font is incorporated into Android N, it is difficult to change, as the system font folder is marked read-only in ROM. One needs to root the system and overwrite it.


orangeparanoid avatar orangeparanoid commented on June 29, 2024

I just thought of more horizontal extensions for consideration.

[Image: horizontal_extensions]


kenlunde avatar kenlunde commented on June 29, 2024

@orangeparanoid: You may not completely understand the meaning of horizontal extension: the only characters in your list that would qualify, because they do not yet have an H-source reference, are U+7CA4 粤 and U+865A 虚.

For the ones that share the same code point but have different glyphs for HK and CN, it would require disunification, meaning a separate code point for one of the characters in the glyph pair, to simultaneously support both glyphs for HK.


hfhchan avatar hfhchan commented on June 29, 2024

@orangeparanoid rest assured, the "HK preferred" glyphs and code-points you have quoted are included in HKSCS 2016.


hfhchan avatar hfhchan commented on June 29, 2024

@kenlunde As an alternative to supporting the whole of HKSCS: I have recently compiled a corpus of Hanzi in use in Hong Kong, based on a one-year scrape of news articles (local news, sports, entertainment & commentary) from on.cc, the most popular news site in Hong Kong. I wonder if you are interested in it?

The complete corpus covers 8,648 distinct characters (including Latin, kana, zhuyin and emoji). 447 characters are identified as typos (盬鹽, 恴意), simplified characters (铜銅), uncommon variants (俢修), or (true) source-separated equivalents (况況).

On random sampling, I deem the quality more representative and practical than the one Google compiled for their Google Fonts.


orangeparanoid avatar orangeparanoid commented on June 29, 2024

@kenlunde
My correction: The HK preferred 黃 (U+9EC3):
[Image: horizontal_extensions_corr]

For U+9EC4, if the user changes the font from CN to HK, the user still sees U+9EC4, instead of U+9EC3. I guess, this is the expected behaviour of the font and the expected odd situation. Since these characters do not belong to the horizontal extension, there is no way to get back the HK preferred 黃 (U+9EC3).

Should (or could or will) the font be responsible for handling this situation? Should (or could or will) Unicode handle it? I am confused.


hfhchan avatar hfhchan commented on June 29, 2024

@orangeparanoid The font is generally not expected to handle this situation.

For U+9EC4, if the user changes the font from CN to HK, the user still sees U+9EC4, instead of U+9EC3. I guess, this is the expected behaviour of the font and the expected odd situation.

Yes, you are correct. The user / system should run a simp <=> trad conversion routine to change U+9EC4 into U+9EC3 before rendering with the fonts. (Even though U+9EC4 and U+9EC3 are not strictly simplified vs traditional characters according to mainland Chinese standards, they are often treated as such in Taiwan, Hong Kong and other parts of the world.)

Since these characters do not belong to the horizontal extension, there is no way to get back the HK preferred 黃 (U+9EC3).

You misunderstand what horizontal extension means. Horizontal Extension means that the regional standards determine that an existing glyph or character in Unicode is deemed useful / used in that locale. Thus, they add that existing glyph or character in Unicode into their own regional standard. Horizontal Extension only cues font implementors that a certain character / glyph is useful in a particular locale. Horizontal Extension does not change the rendering of the character in any way.

HKSCS already includes U+9EC3. If HKSCS horizontally extends to U+9EC4, U+9EC4 will still be displayed with the missing stroke. U+9EC4 will not be converted or display as U+9EC3 or vice versa.
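A conversion step of the kind mentioned above, applied to the text before it reaches the font, might look like the following minimal sketch. The two-entry table and the name `to_hk_preferred` are assumptions for illustration only; a real simplified/traditional converter needs a much larger table and context-sensitive handling of one-to-many mappings.

```python
# Hypothetical two-entry table, keyed by code point as str.translate expects.
# Both pairs are taken from this thread.
HK_PREFERRED = {
    0x9EC4: 0x9EC3,  # 黄 -> 黃
    0x7ED2: 0x7D68,  # 绒 -> 絨
}


def to_hk_preferred(text: str) -> str:
    """Replace code points with their HK-preferred counterparts.

    Only the code points change; the font itself is left untouched and
    still renders U+9EC3 and U+9EC4 with distinct glyphs.
    """
    return text.translate(HK_PREFERRED)


converted = to_hk_preferred("\u9ec4")
print(f"U+{ord(converted):04X}")
```

Doing the substitution at the text level keeps the stored code points and the displayed glyphs consistent across fonts and operating systems.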


hfhchan avatar hfhchan commented on June 29, 2024

@orangeparanoid in reality, some fonts may choose to display both U+9EC3 and U+9EC4 as 黃. These fonts are built to target printing in Taiwan/Hong Kong, where the presence of U+9EC4 instead of U+9EC3 is not desired and is regarded as an error. If 黄 is required, the typesetter simply changes the font.

However, Source Han Sans is a general font for screen viewing and for use as a system default font. Users cannot easily change the system font. Discerning such differences may be useful or required in certain circumstances. Therefore, Source Han Sans should and will render U+9EC3 and U+9EC4 differently, the same as SimSun (default font for Windows, zh-CN) and PMingLiU (default font for Windows, zh-TW/zh-HK).


kenlunde avatar kenlunde commented on June 29, 2024

@hfhchan: Actually, a horizontal extension can change the rendering of a character, at least for the region for which the horizontal extension is applied. Of course, the representative glyph that is associated with the horizontal extension must be unifiable with the character itself, and supplying representative glyphs is one of the requirements for horizontal extensions.


orangeparanoid avatar orangeparanoid commented on June 29, 2024

The life of initially happy users of the Source Han Sans HK font:

Use case 1:
A nine-year-old kid said, "Miss Wong, I saw 黄 (with the top of 共) instead of 黃 (with the top 廿一) on the mobile phone. You marked my 黄 as the wrong word in the dictation. It is unfair. I shall tell my Dad."
  Miss Wong said, "Don't learn Chinese from the computer guys."
  The kid's Dad wrote a letter to the school principal the following day to complain about the poor teaching standard of the Chinese teacher.
  Two months later, the teacher got fired.

Use case 2:
  An elderly woman applied for a job. She wrote 黄 (with the top of 共) according to the printed information in the job ad she saw on the mobile phone. The boss of the company saw this word and commented, "Our company won't hire careless people." This boss clearly expected 黃 (with the top 廿一).
  This elderly woman could not get the job and remained unemployed.

Sadly, the users relied on the font so much that using the character it displayed had real consequences. In both cases the users came to hate the font and the mobile phone.


hfhchan avatar hfhchan commented on June 29, 2024

@orangeparanoid well, these hypothetical problems would also occur for Taiwanese or Japanese users. Both locales prefer 黃.

As a Hong Kong user I can tell you this is not a problem. All popular traditional Chinese keyboards on Android and iOS output 黃 (U+9EC3) instead of 黄 (U+9EC4).

How U+9EC4 displays is not a concern of the average Hong Kong, Taiwanese or Japanese user.


hfhchan avatar hfhchan commented on June 29, 2024

@orangeparanoid

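The hypothetical situations you raise basically will not happen. First, Hong Kong, Taiwan and Japan all use 黃 rather than 黄, and the other places have had no trouble, so why would Hong Kong users have these problems?

Second, as a Hongkonger, I can tell you for certain that the input methods used in Hong Kong all output U+9EC3 黃, not U+9EC4 黄.

People in Hong Kong, Taiwan and Japan do not care what U+9EC4 looks like.

Furthermore, in the teaching materials published by the Hong Kong Education Bureau, 黄 is a simplified character and cannot be counted as a wrong character. A teacher who gets a complaint over this deserves it. As for the teacher being fired, that story is far too exaggerated.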


orangeparanoid avatar orangeparanoid commented on June 29, 2024

My thoughts are:

(1) Responsibility sharing
Should the font / Unicode / horizontal extension handle the situation when one glyph is preferred? If not, the users (including adults and hundreds of thousands of primary / secondary school kids) should be solely responsible for distinguishing the characters. Okay, don't learn Chinese from computer guys. That's fine. (Never blame any font developer.)

(2a) Consider possible/ potential consequences in theory and in reality
Should Information Technology help to reduce or solve the problem? What is the purpose of the font Source Han Sans HK? For adults only? Not for school kids? School kids use mobile phones too (sadly). You cannot prevent Use Case 1 from happening. (regardless of whether you think it is possible or not)

The mobile phone's font seemingly becomes the authoritative source of reference. What will happen? Less educated people use mobile phones too. You cannot prevent Use Case 2 from happening. (regardless of whether you think it is possible or not)

(2b) How users input the characters
In practice, sadly, it is possible. Here's how:

Not everybody uses Cangjie. Some people use Pinyin for typing Chinese.

"Teaching Chinese in Putonghua (Pinyin)" is the somewhat mandatory practice in both primary and secondary schools, as reported in the news. (Please check with individual schools.)

When using "Chinese (Simplified) - Microsoft Pinyin New Experience Input Style" on Windows, typing the Pinyin 'huang' gives the Mainland China preferred character U+9EC4.

Now, with the voice input, it is also possible to get U+9EC4.

It is theoretically possible for individual schools to accept the Mainland China preferred character U+9EC4. In practice, please check with individual primary/ secondary schools.

(3) Confusion in official documents I can find
The Mainland China preferred character U+9EC4 does exist in files in Hong Kong Legco: (An example to show that confusion can happen.)
Example:
立法會交通事務委員會 (Legco Panel on Transport)
檢討沒有遵從交通燈指示的罪行
http://www.legco.gov.hk/yr06-07/chinese/panels/tp/papers/tp0323cb1-1190-1-c.pdf
[Screenshot: legco_doc]

(4) New Shape of the Character (not just simplified characters)
I know that U+9EC4 is the 新字形, invented by some Mainland Chinese citizens (Mmmmm, with no obvious speed improvement when a person reads the character) .

(I am sorry that I don't have the names of the inventors available).

U+9EC4 is not a "simplified character" in the official China sense but a "China'ized" traditional Chinese character (新字形 or New Shape of the Character). U+9EC4 is, however, Mainland China preferred.

To add more confusion, not all characters in 新字形 or New Shape of the Character are different from the HK convention. Some are even the same as characters in the HK convention. (That's why I once said hiring Mainland China citizens to make the HK font can create odd problems, I am afraid. 新字形 is another source of confusion.)

(5) How to check
To see the differences in practice, please refer to Xian Dai Han Yu Ci Dian (现代汉语词典 ISBN 7-100-03477-9 which is a dictionary partly or fully edited by the ruling Communist Party) for more information.

Then, get a dictionary in Hong Kong, e.g. 朗文中學生中文新詞典 ISBN 962 00 0022 6 (The shapes of characters are printed based on 香港語文教育學院's works). Compare and contrast. (I don't know if it is illegal to bring 朗文中學生中文新詞典 to Mainland China.)

In Source Han Sans HK, add the horizontal extension of U+9EC4 and U+9EC3? (Is it justifiable enough for this to happen?) The examples I quoted also include 澳、奧.

(6) So much work
I think, with so much work, the release date of the HK font should possibly be in 2018 or later, to preserve the quality of the HK font, in my opinion.

Thanks for understanding.


hfhchan avatar hfhchan commented on June 29, 2024

(1) Absolutely not. The person inputting the character should ensure he/she is inputting the right character. It would create more confusion if Source Han Sans displayed U+9EC4 as U+9EC3 in the Hong Kong context. If I open the same document on Microsoft Windows, U+9EC4 would change to use the mainland-preferred glyph because Microsoft Windows does not come with Source Han Sans! The integrity of the shape of a particular "code-point" across operating systems and regions is more important than the "correctness" from a "character" standpoint. There is no point in Source Han Sans changing the glyph if other fonts show something different.

Nor can the user check whether he/she is inputting the right character if the two code points display the same.

(2) Source Han Sans is a general purpose font. It should not attempt to "correct" any variants. Such "corrective" fonts should be, and have always been, reserved for digital printing or amateur "抗眼殘" use only.

If the font corrects 黄 to 黃, should it also correct 裡 to 裏? It quickly becomes a slippery slope once you start "correcting" characters.

(3) That's the problem of our government. Most people don't care anyway.

(4) 黄 with a missing stroke is not a new invention. In calligraphy, 黄 is more common than 黃. The actual terminology for the character shape is not important; in any case, the teacher should explain that it is not INCORRECT. From the education system's perspective, it is a simplified character. Furthermore, HKEAA accepts simplified characters in its exams. If the teacher bans simplified characters (out of some political ideology), that is his/her problem.

(5) There is no need. These differences have been fully reflected in HKSCS. The CLIAC has already decided that no two code points should display the same in HKSCS, including all horizontal extensions. Therefore, even if 澳、奧 were horizontally extended by the CLIAC, the mainland-preferred glyph should be kept.

Another example is 悅 (U+6085) and 悦 (U+60A6). The CLIAC has already decided that the difference between 悅 and 悦 should be preserved in the upcoming HKSCS-2015. Previously, the Guidelines on Character Glyphs for Chinese Computer Systems had requested otherwise, but only for characters including 兌/兑. It was deemed an oversight and has since been corrected.
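The rule that no two code points should display the same can be checked mechanically against a font's character-to-glyph mapping. The sketch below works on a plain dictionary of code point to glyph name; the function name and the glyph names (`cid1`, `cid2`) are hypothetical, and the check only catches code points mapped to the very same glyph, not separate glyphs that happen to be drawn identically.

```python
from collections import defaultdict


def code_points_sharing_glyphs(cmap):
    """Return {glyph_name: [code points]} for glyphs mapped from 2+ code points.

    `cmap` is a dict of code point -> glyph name, such as the mapping held
    in a font's 'cmap' table.
    """
    by_glyph = defaultdict(list)
    for code_point, glyph_name in cmap.items():
        by_glyph[glyph_name].append(code_point)
    return {g: sorted(cps) for g, cps in by_glyph.items() if len(cps) > 1}


# Hypothetical mappings: distinct glyphs versus a "corrective" font that
# points both code points at one glyph.
distinct = {0x6085: "cid1", 0x60A6: "cid2"}
merged = {0x6085: "cid1", 0x60A6: "cid1"}
print(code_points_sharing_glyphs(distinct))
print(code_points_sharing_glyphs(merged))
```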

It is the responsibility for the person inputting to check if he/she is inputting the right character.

When using "Chinese (Simplified) - Microsoft Pinyin New Experience Input Style" on Windows, typing the Pinyin 'huang' gives the Mainland China preferred character U+9EC4.

He/she should use "Chinese (Traditional, Hong Kong) - Microsoft Bopomofo" and choose Hanyu Pinyin instead of Zhuyin Fuhao in the settings. Or, he/she can use pinyinput.com which is made by Hong Kong people.

Now, with the voice input, it is also possible to get U+9EC4.

That is a problem with the speech recognition software. You should rightfully file a complaint about the product. That is gross negligence of the customs and preferences of Hong Kong.

"Teaching Chinese in Putonghua (Pinyin)" is the somewhat mandatory practice in both primary and secondary schools, as reported in the news. (Please check with individual schools.)

According to the EDB, schools are "strongly recommended" to use Traditional Chinese characters. As said, on the part of the EDB, 黄 is a simplified character. The EDB officially refuses to examine (審核) any education material that is written in simplified characters, which means such material will not be included in the Recommended Textbook List (適用書目表), and schools may be held responsible if they use textbooks with improper material. In any case, 黃 and 黄 are both accepted in the public exams, so this is not even a problem.

I am now proposing, through other channels, an OpenType flag that, when activated as needed, unifies different character forms by replacing glyphs with their preferred variants. This is similar to the solution used in Japan to preserve the different glyph forms specified in different versions of JIS.

orangeparanoid avatar orangeparanoid commented on June 29, 2024

@hfhchan, actually not everybody understands my points.

I am writing from a user-experience perspective, not necessarily a political one. Whenever people think of differences between HK and Mainland China, they immediately think of political issues. (That is not my point. I am not saying every Communist Party product is bad.)

I would like to talk about the user experience and the HK conventions. (No particular political implication. Users just don't use some characters in some ways.)

I am not insisting on doing/ changing/ correcting some things. Not that something must happen to suit the taste of users. Yes, users can be ignored entirely. No problem.

Just make the font developers happy. It is entirely okay. Nothing wrong of the developers.

(I am not starting a flame war. I don't start one.)

I am pointing out the social cost involved. (If the HK font could possibly save some social cost...)

There were 337,558 primary school students in 2015/2016. (Source: http://www.edb.gov.hk/en/about-edb/publications-stat/figures/pri.html)

Let's assume (without any questionnaire evidence) that 10 per cent of these students asked their teachers whether it is legitimate to (hand)write characters the way the mobile phone's font shows them. That would be about 33755 students asking this type of question. If each of them spent 1 minute asking, 33755 minutes would be spent on asking. If the teacher spent 2 minutes explaining the issue to each of them individually, the total time spent would be 33755 + (33755 * 2) minutes. Let's be optimistic and count by class instead. With 30 students in one class, 33755 / 30 = about 1126 classes. These classes are just 10 per cent of the entire primary school enrollment. Let's say each class asked only once.

The total time spent on asking and answering would be 1126 + (1126 * 2) = 3378 minutes.

Then, assume the teacher had to respond to allegations of poor teaching standards and spent 15 minutes responding to each parent. Out of 1126 classes, let's say 500 parents complained that the form shown in the mobile phone's font was not accepted as the correct answer, and each complaint took three minutes. The time spent complaining would be 500 * 3 = 1500 minutes. The time spent responding to the complaints would be 500 * 15 = 7500 minutes. The entire complaint process would require 1500 + 7500 = 9000 minutes.

If the initial process of teaching, asking and answering was done pretty well, the complaint time would be saved. The total time spent would be hopefully only 3378 minutes. (Just >56 hours)

If the initial process was not done well (this attracted complaints), the total time spent would be 3378 + 9000 = 12378 minutes. (>206 hours)
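As a rough sketch, the class-based estimate above can be reproduced in a few lines of Python. The variable names are made up for illustration, and every input is an assumption stated in this comment (not measured data), apart from the EDB enrollment figure.

```python
import math

# Inputs: only the enrollment figure comes from the EDB page cited above;
# everything else is an assumption made in the comment, not survey data.
STUDENTS = 337_558        # primary school students, 2015/2016
ASKING_SHARE_PCT = 10     # assumed share of students who ask, in per cent
CLASS_SIZE = 30           # assumed students per class
ASK_MIN = 1               # assumed minutes spent asking, per class
EXPLAIN_MIN = 2           # assumed minutes spent explaining, per class
COMPLAINTS = 500          # assumed number of complaining parents
COMPLAIN_MIN = 3          # assumed minutes per complaint
RESPOND_MIN = 15          # assumed minutes per response

asking_students = STUDENTS * ASKING_SHARE_PCT // 100    # rounded down, as in the comment
classes = math.ceil(asking_students / CLASS_SIZE)       # rounded up, as in the comment
qa_minutes = classes * (ASK_MIN + EXPLAIN_MIN)
complaint_minutes = COMPLAINTS * (COMPLAIN_MIN + RESPOND_MIN)
total_minutes = qa_minutes + complaint_minutes

print(f"asking and answering: {qa_minutes} min ({qa_minutes / 60:.1f} h)")
print(f"with complaints:      {total_minutes} min ({total_minutes / 60:.1f} h)")
```

Changing any single assumption (for example the 10 per cent share) scales the result directly, so the totals are only as good as the guesses that go in.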

If you think this time spent was unavoidable...

In practice, will HK primary school teachers accept the Mainland China preferred U+9EC4? You could conduct a survey.

As I mentioned earlier, mobile phone users can be entirely ignored, and nobody has to care about wasting people's time. The developers have done nothing wrong. Just for your information, the font can incur some social cost.

I am not an expert in computing, but I can do the maths. In the calculation, I did not count the time spent in sectors other than primary schools. Counting those, the time spent (wasted, if you like) would be much larger than 206 hours.

If you think my calculation is faulty, I would love to learn more and improve myself. I like learning and improving.

Thanks for your time if you read this.

tamcy avatar tamcy commented on June 29, 2024

As I am really no expert on this issue, I am not sure if I understand @orangeparanoid correctly.

IIUC, "horizontal extension" is used to tag a codepoint in Unicode that is deemed useful in a particular region but, for some reason, lacks the corresponding source reference (kIRG_HSource for HK).

For example, "兌" (U+514C) is the preferred form in Taiwan, while "兑" (U+5151) is the preferred form in Hong Kong. As "兌" is encoded in Big5 (codepoint: A749) but "兑" isn't, "兌" (U+514C) is supplied with a corresponding kIRG_HSource, while "兑" has none, despite being the actual preferred form in HK. This can lead to the misunderstanding that "兑" is not used in HK. With the release of the upcoming Hong Kong Character Set 201x, "兑" will come with a kIRG_HSource to identify that this character is in fact used in HK, so font products targeting HK should support it.

In this sense,
a. "Horizontal extension" is merely a mechanism to add source information to an existing codepoint. It doesn't define any relationship with another codepoint whatsoever. It also doesn't mean to map the actual presentation (glyph) of codepoint B to codepoint A.
b. As 黃 (U+9EC3) is already the preferred form in HK with a kIRG_HSource, I don't understand why @orangeparanoid would want 黄 (U+9EC4) to be added via "horizontal extension". As said, "horizontal extension" is used to mark a codepoint as useful in a particular region, and what he suggested ("黃" should be shown even when the user enters "黄", because "黃" is the preferred form) clearly contradicts it.
c. "Horizontal extension" is not established by the font product. This means that his "horizontal extension" suggestion should actually be raised with the CLIAC, not here.
d. As mentioned in this document, modifying the glyph form of a character mapped in Big5 (e.g. requiring 兌 U+514C to be rendered the same as 兑 U+5151) would violate the encoding principles of ISO/IEC 10646 and cause inconsistency between documents produced at different times. I believe the same would certainly happen if Source Han Sans HK changed the glyph of 黄 U+9EC4 to be the same as 黃 U+9EC3, as you suggested. For instance, users would no longer be able to view @orangeparanoid's (and my) comments concerning 黃 and 黄 properly with this hypothetical Source Han Sans HK.
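To make the "separate codepoints" point concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration. It shows that each pair discussed above consists of two distinct encoded characters, and that Unicode normalization does not merge them, so any "same look" would have to come from the font alone.

```python
import unicodedata

# The two pairs discussed in this thread, written as escapes so that
# the codepoints are unambiguous regardless of the display font.
PAIRS = [
    ("\u514c", "\u5151"),  # 兌 / 兑
    ("\u9ec3", "\u9ec4"),  # 黃 / 黄
]

def describe(ch: str) -> str:
    """Return the codepoint and the Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for a, b in PAIRS:
    print(describe(a), "|", describe(b))
    # Distinct codepoints never compare equal as strings.
    print("  equal as strings:", a == b)
    # Compatibility normalization leaves both characters unchanged.
    print("  merged by NFKC:  ", unicodedata.normalize("NFKC", a)
          == unicodedata.normalize("NFKC", b))
```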

(And no, students are not supposed to learn Chinese writing from a mobile phone, whose UI probably isn't presented in Regular Script 楷書 anyway.)

orangeparanoid avatar orangeparanoid commented on June 29, 2024

I am not sure if it is appropriate to provide the opinion here.

The MingLiU font on Windows has indeed imposed some social costs on users. Primary school teachers print notes, exercises, and test/exam papers. Some teachers try their best to avoid MingLiU. If it is unavoidable, teachers print the page and correct the character by pen before photocopying.

MingLiU font:
https://www.microsoft.com/typography/fonts/family.aspx?FID=245

Example: 等, with the part directly underneath 竹 being 士, not 土.

I am sure that Source Han Sans HK will make it 土, according to the HK conventions. I don't think MingLiU is the font users hope for when handling documents.

Let's assume it takes 1 minute to change the character by pen. There are 10 exercises per subject per month. There are 4 subjects. The time spent is 1 * 10 * 4 = 40 minutes per month. There are 6 levels from Primary 1 to Primary 6. For one school, the time spent is 40 * 6 = 240 minutes per month.

There are 572 primary schools in Hong Kong. When all teachers do the same, the time spent is 240 * 572 = 137280 minutes per month. Do this for eight months. The time spent is 137280 * 8 = 1098240 minutes. (This is 18304 hours. Amazing.)
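The same chain of multiplications can be written out as a short Python sketch. The constant names are made up for illustration, and every figure is an assumption taken from this comment rather than measured data.

```python
# All figures below are the assumptions stated in the comment above.
MINUTES_PER_FIX = 1              # minutes to correct the character by pen
EXERCISES_PER_SUBJECT = 10       # exercises per subject per month
SUBJECTS = 4
LEVELS = 6                       # Primary 1 to Primary 6
SCHOOLS = 572                    # primary schools in Hong Kong, per the comment
MONTHS = 8

per_level = MINUTES_PER_FIX * EXERCISES_PER_SUBJECT * SUBJECTS   # minutes per month
per_school = per_level * LEVELS                                  # minutes per month
all_schools = per_school * SCHOOLS                               # minutes per month
total_minutes = all_schools * MONTHS
total_hours = total_minutes / 60

print(f"{total_minutes} minutes = {total_hours:.0f} hours")
```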

Back to the horizontal extensions issue, if this feature does not help in the Use Cases I suggested, I have no more comments.

I can be wrong. I can be misunderstanding the usefulness of horizontal extensions in the Use Cases.

Thanks for any helpful advice and for your time.

hfhchan avatar hfhchan commented on June 29, 2024

I am not insisting on doing/ changing/ correcting some things. Not that something must happen to suit the taste of users. Yes, users can be ignored entirely. No problem.

Just make the font developers happy. It is entirely okay. Nothing wrong of the developers.

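@orangeparanoid You seem to have reduced my objection regarding the glyph distinction between the two code points to laziness on the part of font vendors. I am sorry; my tone earlier may have been poor.

I think the scenario you hypothesize will not happen and has never happened. Even if it did, it would be a problem with the education system, not a fault of the font. That may be what misled you into thinking that font vendors are lazy and unwilling to go one step further to solve the problem.

But I would like to clarify a few points.

As I understand it, you are proposing that both U+9EC3 and U+9EC4 be displayed as 黃?

My reasons for objecting are as follows:
(1) When ISO/IEC 10646 was being produced (1999), China, Japan, Korea, and Taiwan jointly decided on the present 20292 characters. In other words, it was felt at the time that the distinction was needed. At least at the Unicode level, the two are in a traditional/simplified relationship (the international code treats differences between the characters used in Big-5 and GBK directly as traditional/simplified relationships). That U+9EC3 and U+9EC4 are treated as different characters and can be input separately is an issue at the encoding level, not the font level. That China, Japan, Korea, and Taiwan encoded the characters of the four regions in a single block is historical fact. Some argued that all the different written forms should share one code point; others argued that the characters of the four regions should get separate code points no matter how alike they look. The end result is that some characters are encoded separately and some are unified. This is now settled; the only remedy is to pay more attention when inputting.

(2) Taiwan and Japan both take 黃 as the standard. Nobody there has asked for both U+9EC3 and U+9EC4 to be displayed as 黃. The problem you describe would, in theory, occur in Taiwan and Japan as well.

(3) Suppose only Source Han Sans HK displayed both U+9EC3 and U+9EC4 as 黃. If you mistakenly input U+9EC4 and sent your document to another computer without the Source Han Sans HK font, the recipient would see 黄, not 黃. Designing the two characters to look the same would leave you unable to make sure you typed the right one. Inconsistent display of characters causes confusion and greatly increases the time and expertise needed for proofreading.

(4) The search function in Word operates on encoded characters (字符), not on characters as such (字). If U+9EC4 is entered in the search box, U+9EC3 cannot be found. If U+9EC3 and U+9EC4 looked exactly the same, I am afraid it would cause confusion.

(5) @tamcy has already said this for me: Horizontal Extension is not something Adobe can do as it pleases. The delegation responsible for proposing Horizontal Extensions to ISO/IEC 10646 on Hong Kong's behalf is the Chinese Language Interface Advisory Committee (CLIAC) of the Office of the Government Chief Information Officer of the HKSAR Government. If you have any suggestion regarding Horizontal Extension, it would be more appropriate to raise it with the Hong Kong government.

(6) Designing two encoded characters to look the same would violate the encoding principles of ISO/IEC 10646. Even if the CLIAC decided to horizontally extend U+9EC4 for use in Hong Kong, the glyph of U+9EC4, with its three-stroke grass-radical top (三筆草花頭), would be preserved.

(7) If the CLIAC marked U+9EC4 as used in Hong Kong, people might think that U+9EC4 is acceptable in general writing in Hong Kong.

(8) The role of a general-purpose font is to distinguish different encoded characters (字符). The role of a font for print is to distinguish different characters (字). If you want the "correct" glyph displayed regardless of encoding, you should use a font made for print.

(9) If it really must be explained to students, explain that 黄 is a variant character and also a simplified character. That is the responsibility of educators. Bear in mind that Chinese computing systems are not built for Hong Kong people alone, but for the needs of China, Japan, Korea, Taiwan, Hong Kong, Macau, and Vietnam; trade-offs are inevitable.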
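Point (4) can be illustrated with a small Python sketch. Plain substring search compares code points, so a query typed with U+9EC4 does not match text typed with U+9EC3. The `FOLD` table and the `fold` helper are hypothetical names made up for this example; they only show one way an application could treat the two as equivalent for searching while leaving the stored text and the font untouched, and they are not something Word or any font is known to do.

```python
# Text typed with U+9EC3 (黃大仙) and a query typed with U+9EC4 (黄).
text = "\u9ec3\u5927\u4ed9"
query = "\u9ec4"

# A plain substring search works on code points, so the query is not found.
found_raw = query in text

# Hypothetical search-only folding: map U+9EC4 onto U+9EC3 before comparing.
# The stored text itself is never modified.
FOLD = str.maketrans({"\u9ec4": "\u9ec3"})

def fold(s: str) -> str:
    """Return a copy of s with the listed variants folded for comparison."""
    return s.translate(FOLD)

found_folded = fold(query) in fold(text)

print("raw search finds it:   ", found_raw)
print("folded search finds it:", found_folded)
```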

hfhchan avatar hfhchan commented on June 29, 2024

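@orangeparanoid I would instead suggest paying attention to the fact that the CLIAC reference glyphs and the glyphs in the EDB's primary school character and word list (小學生字詞表) differ in many ways, large and small, from the printed Song typefaces in common use in Hong Kong (including Monotype's 蒙納宋 and DynaComware's 華康儷宋). Basically, both the CLIAC reference glyphs and the glyphs in the EDB list were made behind closed doors, out of touch with actual practice. The 蒙納宋 and 華康儷宋 glyphs were made by Hong Kong designers in the 1980s to match the forms commonly written in Hong Kong. Hong Kong people habitually use 丢 rather than 丟, use 兹 rather than 茲 (the right side of 滋), write the right side of 插 with 千 rather than 干, write 叟 with one stroke running straight through, and write 老 with a hook. All of these are fully reflected in those typefaces, yet the two official references label such forms as "variants" (異體).

Chinese writing has its own rules of variation. One cannot flippantly claim that the 匕 at the bottom of 老 was the shape of a walking stick in oracle bone script and therefore takes no hook. That is simply far-fetched; do walking sticks bend?
The lower right of 流 is said to be a variant of 川 and therefore to take no hook. Does having a hook or not affect the meaning? As far back as the Sui dynasty, it was written with a hook.

The only standard with the same forms is the Taiwan Ministry of Education standard glyphs. There are not that many coincidences in the world; in fact, the 常用字字形表 was produced on the basis of the Taiwan Ministry of Education standard glyphs. The CLIAC reference glyphs were likewise made by fine-tuning DynaComware's MingLiU (細明體).

Writing 求, 也, the 比 under 鹿, the 旡 atop 蠶, the 巳 atop 選, the 匕 atop 旨, and so on without a hook in Song and Kai typefaces is likewise a baffling practice conjured out of nothing.

Hong Kong children's handwriting is getting uglier and uglier; this is basically an acknowledged fact. I do not want Kai and Song typefaces that have lost all balance and beauty to ruin the aesthetic sense of Hong Kong's next generation, so I think more people need to submit suggestions to the government to restore the stroke forms that Kai and Song typefaces ought to have.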

orangeparanoid avatar orangeparanoid commented on June 29, 2024

Based on what I understand here, I would say the Chinese character issues will be endless because of the laissez-faire policy adopted for many years (without much thought for education, etc.). Existing usage cannot be changed unless things break. Social costs will remain social costs: users themselves will always be solely responsible for how they use Chinese characters, Unicode, fonts, and Chinese input methods.

Sorry for raising the issues in the wrong place. Social costs will remain social costs, and the costs will simply exist. It's better not to break things.

(If only a genius in Chinese characters + Unicode + fonts + Chinese input methods + human language education + not breaking things ever proposed a brilliant solution...)

As for me, I had better do something else to save the characters I want to use.

tamcy avatar tamcy commented on June 29, 2024

@orangeparanoid I can't speak for the admin, but I think it is completely OK to propose 黄 U+9EC4 to look the same as 黃 U+9EC3 (whether it will get accepted is another question). But "horizontal extension" is probably not what you want, and this leads to confusion.

We are not living in a perfect world. Variants do exist. Just pay a visit to Wong Tai Sin and you will find different variants of 黃:

(image: variants of 黃 seen at Wong Tai Sin)

So even if your "social cost math" stands, the cost isn't solely attributable to the unreleased Source Han Sans HK. Students will probably ask why the character they saw in newspapers / publications / television programmes / road signs / buildings isn't exactly the same as what they learnt anyway. And I am quite sure that this won't be "fixed" by merely mapping the glyph of 黃 to 黄 in SHS-HK.

kenlunde avatar kenlunde commented on June 29, 2024

Thanks to all for the recent discussions. One of the reasons for the delay in deploying HK versions of this typeface design is to make sure that it is done right. I considered remapping existing glyphs to get part of the way there, but I concluded that deploying a full solution, though at a later date, was an overall better solution and less confusing.

kenlunde avatar kenlunde commented on June 29, 2024

Hong Kong SCS-2016 was published this month, and while it includes representative glyph code charts for HKSCS-2016 proper, representative glyph code charts for the Big Five subset are forthcoming. These materials will serve as the basis for the HK fonts and font instances that will be deployed as part of the Version 2.000 update.

hfhchan avatar hfhchan commented on June 29, 2024

Note: The government has promised to clarify that the new glyphs are "reference glyphs" and not intended to be "prescriptive glyphs" via written correspondence. Factually speaking, there is nothing "representative" about the glyphs in the charts, but anyways.

kenlunde avatar kenlunde commented on June 29, 2024

@hfhchan: Of course, but they certainly serve as a guide for font developers. As I have stated before, if one follows representative glyphs too closely, the end result will be a clone of the typeface that was used for the representative glyphs.

c933103 avatar c933103 commented on June 29, 2024

https://www.ogcio.gov.hk/en/business/tech_promotion/ccli/cliac/reference_glyphs.htm The information linked from this web page, Reference Glyphs for Chinese Computer Systems in Hong Kong, has also been updated.

kenlunde avatar kenlunde commented on June 29, 2024

@c933103: Yes, I am aware of this, and used it last month to go through the Source Han Sans glyphs to formulate a plan for proper HK support.
