hfhchan / irg-ws2015

irg-ws2015's Issues

00765 UTC-01217, 00777 UTC-01219

00765 UTC-01217

00777 UTC-01219

Henry’s Comment:
00765 UTC-01217 UNIFY WITH 00777 UTC-01219 (keep 00777).

UTC’s Comment:
IDSes are ⿰土⿱𥃭木 and ⿰土⿱直木. The UTC does not agree.

Henry’s Additional Comments:
The sources provided by UTC are as follows:

Given the evidence provided by the UTC, the context makes it clear that both forms refer to the same person, and are therefore the same character.

A Wikipedia article states that, according to 《明實錄》 (the Veritable Records of the Ming), the person's name is 志㙞. If this can be confirmed by an expert familiar with the members of the Ming dynasty imperial family, then the two corrupted forms (UTC-01217, UTC-01219) should be unified (via IVD) to 㙞 (U+365E).

《明實錄》 can be found in the 研究所歷史語言研究所明實錄、朝鮮王朝實錄、清實錄資料庫 (database of the Ming, Joseon and Qing Veritable Records, Institute of History and Philology).
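Unifying "via IVD" means each corrupted form would be represented as an ideographic variation sequence: the base ideograph 㙞 (U+365E) followed by one of the variation selectors VS17 to VS256 (U+E0100 to U+E01EF). The Python sketch below shows how such a sequence is composed and printed as space-separated hex code points. The selector index used is purely illustrative; real selectors are assigned only when a collection is registered in the IVD.

```python
# Sketch: build an ideographic variation sequence (IVS) = base ideograph + VS17..VS256.
# Assumption: the selector index passed in is hypothetical; actual selectors are
# assigned when a variant collection is registered in the Ideographic Variation Database.

VS17 = 0xE0100   # first variation selector usable in an IVS
VS256 = 0xE01EF  # last variation selector usable in an IVS


def make_ivs(base_codepoint: int, selector_index: int) -> str:
    """Return the base ideograph followed by variation selector VS(17 + selector_index)."""
    selector = VS17 + selector_index
    if not VS17 <= selector <= VS256:
        raise ValueError("selector_index must be in the range 0..239")
    return chr(base_codepoint) + chr(selector)


def describe(seq: str) -> str:
    """Format a sequence as space-separated uppercase hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in seq)


if __name__ == "__main__":
    ivs = make_ivs(0x365E, 0)  # 㙞 with a hypothetical first selector
    print(describe(ivs))
```

The sequence is two code points long; fonts without a matching IVS entry simply render the base ideograph, which is why IVD registration preserves the exact shape without encoding a new character.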

Action Item
Postpone or Withdraw

00069 UTC-02765

Henry's Comment
= 𠘧 (U+20627) (variant; protrusion of strokes)

UK's Comment
Disagree. Non-cognate, and stroke variation is significant.

Henry's Comments
Below is the UK's evidence for 00069:

Below is the HYDZD (Hanyu Da Zidian) evidence for 𠘧 (U+20627):

First, the pronunciations are the same.
Second, the second meaning of 00069 is the same as the strict meaning of U+20627.
Third, in its first meaning, 00069 is used as a grammatical suffix indicating extent; very likely this is a loan use, with the character borrowed for its sound.
Lastly, the shape of the top-left part of 00069 matches the Shuowen form of U+20627, so 00069 is likely just another transcription of the same character.

It is therefore likely that they are the same character.

Action Item
Unify or Postpone.

02315 T13-2D5B

The left-hand side is not Claw 爪 but Heart 心 (忄), and 悑 (U+6091) is already encoded.

According to the Kangxi Dictionary, 悑 is a variant of 怖, which is synonymous with 懼.

Action Item:
Unify/IVD to U+6091.

Unification of 夸 and 𡗢

𡗢 is a common historical variant form of 夸.

Existing Disunified Strict Semantic Variants:
夸 U+5938 = 𡗢 U+215E2
洿 U+6D3F = 𣴰 U+23D30
誇 U+8A87 = 𧧳 U+279F3
跨 U+8DE8 = 𨀗 U+28017
䠸 U+4838 = 𨉀 U+28240
鮬 U+9BAC = 𩶮 U+29DAE
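Because each disunified pair above differs only in 夸 versus 𡗢, the encoded target for a submitted character can be found by substituting the component in its IDS and looking the result up. The Python sketch below assumes a tiny hand-built IDS index and hypothetical IDS strings for the submissions (neither comes from an official data file); it also folds 肉 into 月, since 03426 differs from 胯 in both components.

```python
# Sketch: find the encoded counterpart of a submitted character by replacing
# variant components in its IDS and looking the result up in an IDS index.
# Assumption: IDS_INDEX and the submitted IDS strings are small hand-built,
# hypothetical examples, not data from an official IDS database.

KUA_VARIANT = chr(0x215E2)  # 𡗢, listed above as U+215E2

# Variant component -> normalized component.
COMPONENT_MAP = {
    KUA_VARIANT: "夸",
    "肉": "月",  # full 肉 vs the 月/⺼ form; applied everywhere here for brevity
}

# Normalized IDS -> encoded character (illustrative subset only).
IDS_INDEX = {
    "⿰夸刂": "刳",  # U+5233
    "⿰月夸": "胯",  # U+80EF
    "⿰骨夸": "骻",  # U+9ABB
}


def normalize_ids(ids: str) -> str:
    """Replace each variant component (one code point) with its normal form."""
    return "".join(COMPONENT_MAP.get(ch, ch) for ch in ids)


def find_counterpart(ids: str):
    """Return the encoded character for the normalized IDS, or None."""
    return IDS_INDEX.get(normalize_ids(ids))


if __name__ == "__main__":
    for submitted in ("⿰" + KUA_VARIANT + "刂", "⿰肉" + KUA_VARIANT):
        print(submitted, "->", find_counterpart(submitted))
```

A fuller tool would read the index from a complete IDS database and restrict the 肉 rule by position, as discussed in the section on 肉 and ⺼.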

Affected Characters:

00375 USAT09219

Action Item: Unify (IVD) to 刳 U+5233

01648 USAT06769

Action Item: Unify (IVD) to 胯 U+80EF

05026 USAT09817

Action Item: Unify (IVD) to 骻 U+9ABB

03426 USAT06768

Action Item: Unify (IVD) to 胯 U+80EF (ref: issue #23)

03798 UTC-01941

Henry's Comment
03798 UTC-01941 UNIFY to 褝 (U+891D)

UTC, HK Comment
Unify with U+891D 褝

UK's Comment
Disagree. We think that unifying the components 単 and 单 for U+7985 禅 was a mistake, and it causes problems for users when a default font shows an unacceptable glyph form. The G-source glyph for U+891D has 単 on the right, so font developers will follow this glyph form when designing fonts for the PRC, but the G-source glyph form for U+891D is unacceptable as the simplified form of U+891D 襌.
Therefore we strongly think it will serve users best to encode UTC-01941 (⿰衤单) as a separate character.

Henry's Comment
The same problem also occurs for U+20219 𠈙 and U+2548E 𥒎, even though they are in Extension B. However, in IRGN 2170 China agreed to correct multiple erroneous glyphs in the GE standard (involving U+8669 and U+3B9D). Therefore, the refusal to correct U+891D in line with China's normalized glyphs should not be accepted.

Action Item
Unify

Multiple Withdrawn Characters/Glyphs

Multiple characters/glyphs were withdrawn in WS2015 v2 (IRGN2155 UK Review), but these withdrawals were not reflected in the Working Set:

00123 UTC-01423: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00130 UTC-01318: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00138 UTC-01391: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00296 UTC-01326: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00345 UTC-01329: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00348 UTC-01330: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00475 UTC-01337: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00524 UTC-01421: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00542 UTC-01338: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00560 UTC-01339: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00561 UTC-01369: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00662 UTC-01342: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00814 UTC-01345: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00827 UTC-01372: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00857 UTC-01353: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00863 UTC-01355: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00866 UTC-01441: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00871 UTC-01354: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00874 UTC-01356: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00898 UTC-01358: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00933 UTC-01437: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00938 UTC-01362: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00953 UTC-01367: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00966 UTC-01368: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00969 UTC-01363: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
00970 UTC-01366: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01152 UTC-01428: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01186 UTC-01349: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01194 UTC-01371: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01319 UTC-01314: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01320 UTC-01373: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01331 UTC-01374: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01336 UTC-01387: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01493 UTC-01390: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01520 UTC-01377: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01533 UTC-01341: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01703 UTC-01442: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01721 UTC-01379: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
01850 UTC-01384: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02187 UTC-01378: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02193 UTC-01386: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02333 UTC-01388: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02387 UTC-01480: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02446 UTC-01400: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02541 UTC-01392: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02563 UTC-01457: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02641 UTC-01399: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02684 UTC-01393: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02843 UTC-01396: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
02936 UTC-01397: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03026 UTC-01398: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review; SC - 21.
03332 UTC-01401: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03349 UTC-01402: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03355 UTC-01403: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03384 UTC-01405: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03487 UTC-01406: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03670 UTC-01404: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03679 UTC-01411: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03774 UTC-01412: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03775 UTC-01413: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03835 UTC-01415: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03969 UTC-01416: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
03981 UTC-01418: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
04685 UTC-01424: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
05061 UTC-01425: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
05094 UTC-01408: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
05222 UTC-01350: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.

Unification of 耎 and 䎡

䎡 is a common variant of 耎.

Encoded Characters with IDS containing 䎡:
U+43A1 䎡 = U+800E 耎
U+24322 𤌢 = U+7157 煗
U+24B81 𤮁 = U+3F32 㼲
U+25C47 𥱇 = U+25BEC 𥯬
U+273E6 𧏦 = U+8761 蝡
U+28AB3 𨪳 = U+28A30 𨨰
U+28EE2 𨻢 = U+967E 陾
U+29C4A 𩱊 = U+29C44 𩱄
U+2C8BD 𬢽 (UNKNOWN ORIGIN - JK-65739)

Affected Characters:

01592 GHZR31640.06 IVD 㬉 (耎 ~ 䎡)
03004 GHZR52807.16 IVD 稬 (耎 ~ 䎡)
03447 GHZR42254.03 IVD 腝 (耎 ~ 䎡)
00208 USAT09305 IVD 偄 (耎 ~ 䎡)
01275 GHZR42502.05

Action: Withdraw / IVD to 愞. (The 䙳 on the right-hand side is an error form of 䎡; 䙳 = 票.)

03800 UTC-01942

UTC, HK Comments
Unify with U+2B304 𫌄

UK Comment
Agree.

Henry's Comment
Disagree.

On the Nom Foundation Nom Lookup Tool, the pronunciation of U+2B304 𫌄 is given as tươm, that of 叁 as tam, and that of 參 as tham/sam/sâm/khươm. It is highly probable that the phonetic of U+2B304 𫌄 is 叁 rather than 參.

Until the phonetic of U+2B304 𫌄 is confirmed, U+2B304 and 03800 should not be unified.

Action Item
Postpone or Disunify.

Unification of 幸, 㚔 and 羍

The UCS contains numerous examples of 幸, 㚔 and 羍 being disunified.
Characters that contain 幸 have three different etymologies.

(1) U+3694 㚔 (niè, handcuffs).
Example characters (usually with 㚔 as a semantic component):

U+57F7 執 / U+2163A 𡘺 / U+21655 𡙕 / U+2065C 𠙜 / U+26383 𦎃
U+5831 報 / U+21648 𡙈
U+776A 睪 / U+251E1 𥇡
U+25216 𥈖
U+23582 𣖂
U+20DBF 𠶿
U+2676F 𦝯
U+260A1 𦂡
U+26051 𦁑
(etc)

(2) U+21D18 𡴘 (xìng, fortune)
Example characters (usually with 𡴘 as a phonetic component):

U+200B7 𠂷 (alternative transcription of 𡴘)
U+5548 啈 / U+20D43 𠵃
U+6DAC 涬 / U+23DDF 𣷟
U+46ED 䛭 / U+27A2B 𧨫
U+7DC8 緈 / U+2609C 𦂜 / U+260C9 𦃉
(etc)

(3) U+7F8D 羍 (dá, small sheep)
Example characters (usually with 羍 as a phonetic component):

U+5548 啈 / U+20D43 𠵃
U+9054 達 / U+9039 逹

The examples above show that, in Kaishu (regular script), the shape difference between 幸, 㚔 and 羍 does not generally reflect a systematic semantic difference. In the case of 啈/𠵃/𠶿, the dictionary meaning and pronunciation of the characters actually run counter to the normative meaning of their phonetic/semantic component.

In fact, these three forms were never really distinguished in handwriting; they were told apart by context. There is no need to encode multiple variants of the same character. Trying to map every single variant in a dictionary into the UCS would only cause confusion about the exact semantic meaning. The IVD can be used to preserve the exact shape.

The following characters are semantically equivalent to the corresponding encoded ideographs, and thus should be unified in WS2015:

00682 UTC-01451
Semantic Origin: 㚔 - handcuffs
UNIFY TO U+5709 圉
00195 USAT09927
Semantic Origin: 𡴘 (phonetic)
UNIFY TO U+5016 倖
00882 USAT09928
Semantic Origin: 𡴘 (phonetic)
UNIFY TO U+5A5E 婞
03375 GHZR52968.20
Semantic Origin: 㚔 (semantic of phonetic)
UNIFY TO U+26525 𦔥
03469 GHZR42270.04
Semantic Origin: 㚔 (semantic of phonetic)
UNIFY TO U+443E 䐾
02677 T13-2E70
Semantic Origin: 㚔 (semantic)
UNIFY TO U+24FF9 𤿹
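The argument above treats 幸, 㚔 and 羍 as a single component for unification purposes. One way to apply that rule mechanically is to fold all three forms onto one canonical key and group IDS strings that collide; any group with two or more members is a unification candidate. In the Python sketch below, the sample IDS strings (including the "hypothetical variant" entries) are illustrative assumptions, not the real IDS of the submissions listed above.

```python
from collections import defaultdict

# Sketch: fold 㚔 and 羍 onto 幸, then group IDS strings that become identical.
# Each group with two or more members is a unification candidate.
# Assumption: the sample IDS strings are illustrative only; they are not
# the actual IDS data of the WS2015 submissions.

FOLD = {"\u3694": "\u5E78", "\u7F8D": "\u5E78"}  # 㚔 -> 幸, 羍 -> 幸


def canonical(ids: str) -> str:
    """Return the IDS with every 幸-type component replaced by 幸."""
    return "".join(FOLD.get(ch, ch) for ch in ids)


def unification_candidates(entries):
    """Group (label, ids) pairs by canonical IDS; keep groups of two or more."""
    groups = defaultdict(list)
    for label, ids in entries:
        groups[canonical(ids)].append(label)
    return {key: labels for key, labels in groups.items() if len(labels) > 1}


if __name__ == "__main__":
    sample = [
        ("倖 U+5016", "⿰亻幸"),
        ("hypothetical variant A", "⿰亻羍"),
        ("圉 U+5709", "⿴囗幸"),
        ("hypothetical variant B", "⿴囗㚔"),
        ("婞 U+5A5E", "⿰女幸"),
    ]
    for key, labels in unification_candidates(sample).items():
        print(key, labels)
```

Grouping, rather than pairwise comparison, also surfaces cases where more than two encoded or proposed characters collapse onto the same canonical form.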

Zhuang Character Normalization Issues

There are multiple normalization issues with the Zhuang characters submitted by Guangxi University. For example, 橫 (horizontal stroke) should always be changed to 提 (rising stroke) in left-side components, but this has not been done in the Zhuang characters. In many cases, the submitted evidence is in the correct normalized form, but the font provided by Guangxi University is not.
Once the characters are encoded, it is very troublesome to change the representative glyph. Therefore, it is suggested that Guangxi University normalize the Zhuang characters properly before submission.

01594 G_Z3561201: Left side 星 does not follow PRC conventions; it should use 提, not 橫.
01618 G_Z3551104: Left side 星 does not follow PRC conventions; it should use 提, not 橫.
04883 G_Z0721301: Does not match PRC conventions. Compare with 養.
02445 G_Z1402302: Does not match PRC conventions: the second stroke of 馬 should not be joined with the sixth stroke.
05239 G_Z0211201: Does not match PRC conventions: the last stroke of the left component should be 點.
03354 G_Z2231201: Does not match PRC conventions: the last stroke of the left component should be 點.
00315 G_Z3842301: Does not match PRC conventions: the last stroke of the left component should be 點.
04621 G_Z2382304: Does not match PRC conventions: the third stroke of 犬 should be 點, not 捺.
00065 G_Z1652501: Does not match PRC conventions: the last stroke of 及 should be 點, not 捺; alternatively, the structure should be changed to an enclosure.
00264 G_Z4291302: Does not match PRC conventions: the right-hand side should be 尨 (⿷尤彡).
00523 G_Z2042303: Does not match PRC conventions: the last stroke of the top-left component should be 點.
00629 G_Z2302202: Does not match PRC conventions: the last stroke of the top-left component should be 點.
00534 G_Z1592101: Does not match PRC conventions: the last stroke of the left component should be 點.
00536 G_Z0811201: Does not match PRC conventions: the last stroke of the left component should be 提.
00659 G_Z1831401: Does not match PRC conventions: the last stroke of the top-left component should be 點.
01147 G_PGLG2017: Does not match PRC conventions: the last stroke of the left component should be 點.
03527 G_Z2181407: Does not match PRC conventions: the last stroke of the left component should be 提.
01149 G_Z3112502: Does not match PRC conventions: the last stroke of the left component should be 點.
01150 G_Z0431401: Does not match PRC conventions: the last stroke of the left component should be 點.
03665 G_Z1501101: Does not match PRC conventions: the last stroke of the left component should be 提.
03805 G_Z1202503: Does not match PRC conventions: the last stroke of the left component should be 點.
03961 G_Z2582201: Does not match PRC conventions: the fourth stroke of the left component should be 點.
03974 G_Z1412404: Does not match PRC conventions: the last stroke of the left component should be 提.
05311 G_PGLG3052: Does not match PRC conventions: the fifth stroke of the left component should be 點.
02577 G_Z1432204: Does not match PRC conventions: the last stroke of the left component should be 點.
02581 G_Z2782104: Does not match PRC conventions: the last stroke of the left component should be 點.
02328 G_Z1602601: Does not match PRC conventions: the last stroke of the left component should be 點.
03144 G_Z3651201: Does not match PRC conventions: the last stroke of the left component should be 點.

02179 UTC-02651

Henry's Comment
02179 UTC-02651 = 𤇆 (U+241C6) / 烟 (因 ~ 囙 -- 第一批异体字整理表)

Japan's Comment
Unify with U+241C6 𤇆

UK's Comment
Disagree. We do not believe that 回 and 囙 are unifiable components.

Henry's Comment
In print, it is common for the middle of 回 to be written as 囙, for example:

The reverse, however, is rather uncommon.

The evidence provided states that 02179 is a variant of 烟:

Quoting also from the MOE Dictionary (http://dict2.variants.moe.edu.tw/variants/rbt/word_attribute.rbt?quote_code=QTAyNDIwLTAwMQ):

Given that 𤇆 is a variant of 烟, and that the pair 因 / 囙 is included in 《第一批异体字整理表》, the equivalence between 02179 and 𤇆 is beyond reasonable doubt.

Suggested Action Item
Unify/IVD

00063 UTC-01316

IRGN2155CommentsToIRGN2107 (Chen Zhixiang)’s Comment

Henry’s Comment

WITHDRAW, reference IRGN2155CommentsToIRGN2107, OR
unify to U+2B85C

UTC’s Comment:
Disagree. Character is attested in two separate sources, and the right-hand side components are not unifiable.

Henry’s Additional Comment:
The value of encoding erroneous transcriptions that Chinese experts have already identified should be justified. For example, a justification would be that “A Concordance to Fascicle Three of the Inscriptions from the Yin Ruins” is so academically significant that its error forms should be encoded as is, in the same way as those of the Kangxi Dictionary and/or Hanyu Dazidian.

"One-off Corruptions"

Under IRGN2211 Section B Item 3, “One-off corruptions found on tombstone carvings”, the following characters should be rejected (or unified):

SN / Source / Treatment / Reason
02854 / T13-2F48 / IVD 碑 / 碑別字新編
02286 / T13-2D55 / IVD 燦 / 碑別字新編
02270 / TE-6F6B / IVD 瞧 / 廣碑別字
02246 / T13-2D4B / IVD 照 / 偏類碑別字
02821 / T13-2F42 / IVD 穎 / 碑別字新編
02812 / T13-2F3F / IVD 智 / 碑別字新編
02804 / T13-2F3D / IVD 矢 / 碑別字新編
02734 / T13-2F29 / IVD 旹 / 廣碑別字
02750 / T13-2F2F / IVD 督 / 碑別字新編
02154 / T13-2D32 / IVD 灮 / 廣碑別字
02088 / T13-2D27 / IVD 澡 / 碑別字新編
02086 / T13-2D24 / IVD 㴱 / 碑別字新編
04289 / T13-3138 / IVD 溯 / 廣碑別字
04299 / T13-313C / IVD 逮 / 廣碑別字
02073 / T13-2D25 / IVD 淄 / 廣碑別字
02574 / T13-2E49 / IVD 暴 / 偏類碑別字
02061 / T13-2D21 / IVD 潰 / 偏類碑別字
01948 / T13-2C54 / IVD 步 / 碑別字新編
02564 / T13-2E48 / IVD 當 / 廣碑別字
02557 / T13-2E42 / IVD 星 / 碑別字新編
01931 / T13-2C4C / IVD 氤 / 碑別字新編
01933 / T13-2C4D / IVD 氤 / 碑別字新編
01934 / T13-2C4E / IVD 氤 / 碑別字新編
01929 / T13-2C4A / IVD 氣 / 偏類碑別字
02448 / T13-2D79 / IVD 玉 / 偏類碑別字
02453 / T13-2D7A / IVD 珍 / 偏類碑別字
02437 / T13-2D76 / IVD 敵 / 碑別字新編
04003 / T13-3128 / IVD 敗 / 偏類碑別字
03968 / T13-3124 / IVD 短 / 碑別字新編
03803 / T13-3075 / IVD 福 / 碑別字新編
03777 / T13-3074 / IVD 礼 / 碑別字新編
03490 / T13-304F / IVD 歸 / 偏類碑別字
03594 / T13-3063 / IVD 𤼲 / 廣碑別字
03593 / T13-3062 / IVD 𤼲 / 偏類碑別字
03367 / T13-3044 / IVD 孝 / 廣碑別字
03522 / T13-3054 / IVD 朝 / 偏類碑別字
01949 / T13-2C53 / IVD 沉 / 廣碑別字
02642 / T13-2E5F / IVD 癸 / 碑別字新編, missing stroke
02644 / T13-2E60 / IVD 癸 / 碑別字新編, protruding stroke

03610 / T13-3068 / IVD 發/彂 / 金石文字辨異
02036 / T13-2C71 / IVD 淑 / 金石文字辨異
04000 / T13-3129 / IVD 真 / 金石文字辨異
02845 / T13-2F47 / IVD 碑 / 金石文字辨異
02225 / T13-2D45 / IVD 婆 / 金石文字辨異
02097 / T13-2D2B / IVD 演 / 金石文字辨異
02598 / T13-2E51 / IVD 病 / 金石文字辨異

Unification of various forms of 丘

It is suggested that all historical variants be unified to the corresponding normalized form (closest variant form?).

The normalized form should be defined as follows:

丘 (丠/㐀)
虛/虚 (𧆳/虗)

Affected Characters:

04681 GHZR74469.18
IVD 𨼋
04932 GHZR84846.02
IVD 駈 / 𩢩
04704 GHZR74481.06
IVD 04703 GHZR74481.05

00597 UTC-02810

UTC's Comment
Unify with U+2D227 𭈧

UK's Comment
Agree.

Henry's Comment
Disagree. 叁 and 参 are cognate but have different semantics today. Before unification, it should be confirmed that the phonetic of U+2D227 is 参 rather than 叁.

Action Item
Disunify or Postpone.

Unification of 犮 and 叐

Henry’s Comment:
00698 UTC-01204 UNIFY TO 坺 U+577A
00861 USAT90292 UNIFY TO 妭 U+59AD

UTC’s Comment:
IDS is ⿰土叐. The UTC does not agree.

Henry’s Supplementary Information:
叐 and 犮 are variant forms of the same component*. In another edition of 弇山堂別集卷三十三, the character 坺 (U+577A) is used instead:

Source: http://ctext.org/library.pl?if=gb&file=58118&page=103, which is the 《欽定四庫全書》本·史部五·雜史類 edition.

*: See the variants of 拔 (U+62D4) in the MOE Dictionary:

The source quoted by MOE Dictionary has been omitted for brevity.

The encoded characters containing 叐 are as follows:

U+53D0 叐: Kangxi / Hanyu Dazidian: variant of 犮
U+39DE 㧞: variant of 拔 based on evidence of 拔 in MOE Dictionary
U+47E6 䟦: variant of 跋 based on evidence of 跋 in MOE Dictionary
U+2209B 𢂛: unverifiable; G4K-sourced character; original sources could not be found
U+2342A 𣐪: TF-sourced character; pronunciation same as 柭
U+24923 𤤣: TF-sourced character; pronunciation same as 𤤒
U+2595C 𥥜: variant of 突 according to Hanyu Dazidian
U+25FC8 𥿈: variant of 𥿈 @ MOE Dictionary, HYD and Koseki
U+26B5E 𦭞: variant of 𦭞 @ HYD
U+296BF 𩚿: variant of 飫 @ MOE Dictionary, HYD and Koseki
U+2989A 𩢚: variant of 䮂 @ KX, MOE, HYD, Koseki
U+2C4B2 : TC-sourced character; pronunciation same as 祓
U+2CAC6 : TD-sourced character; pronunciation same as 鈸
U+2CF74 : variant of 伏 according to SAT Database
U+2D71E : used in name of person in SAT database or as a phonetic transcription; meaning could not be verified.
U+2D805: variant of 戾 according to Mojikiban
U+2DF7D: variant of 鉢 according to SAT Database, which is variant of 盋 according to PRC 第一批异体字整理表, Koseki & HYD
U+2E0BA: variant of 秡 according to Mojikiban “読み・字形による類推” with remark “地名外字”
U+2E2DF: variant of 黻 according to footnote in SAT Database.
U+2E3DF: variant of 沷 according to footnote in SAT Database. The next character was 若, so this is possibly a case of “類化” (assimilation to a neighbouring character).

To summarize, in the majority of cases the 叐 component is a strict variant of 犮. In 2 cases it is equivalent to 犬, a property also shared by 犮 itself. In 2 cases the source of the character could not be verified, and in 1 case multiple sources show that it is a “corruption” of another similarly shaped component.

Therefore, I think there is enough evidence to regard 叐 as unifiable with 犮.

In fact, ROK treats this as a normalization rule in IRGN2154:

Conclusion:

00698 UTC-01204 unify/IVD to 坺 U+577A
00861 USAT90292 unify/IVD to 妭 U+59AD

Unification of 肉 and ⺼

There are a large number of characters in which U+2EBC ⺼ has been replaced with the full 肉 radical. Since more than 500 characters contain U+2EBC, IRG should consider allowing these characters to be encoded as IVD. Variants using 肉 on the left are sufficiently rare in modern usage.
It is suggested to use IVD only if 肉 is on the left, and to disunify if 肉 is at the bottom. Zhuang characters in particular should be studied case by case, because 肉 may play a phonetic rather than a semantic role. Those with 肉 at the bottom have been included for completeness.

00282 USAT08988 IVD 䏌: 肉 vs U+2EBC
02178 USAT08962 IVD 炙
02670 T13-2E6E IVD 䏢
03390 USAT06462 IVD 肊: text indicates 乙 as phonetic; 肉 vs U+2EBC
03392 USAT09914 IVD 肋: 肉 vs U+2EBC
03393 USAT06266 IVD 肙: 肉 vs U+2EBC
03395 USAT08303 IVD 𦘺: 肉 vs U+2EBC
03397 USAT10232 IVD 肧: 肉 vs U+2EBC
03403 USAT05646 IVD 肴: 肉 vs U+2EBC, top 爻 (ref: 希~𢁫 // 04335)
03405 USAT90295 IVD 股/肢: 肉 vs U+2EBC
03407 USAT08919 IVD 育: 肉 vs U+2EBC
03411 USAT06030 IVD 背: 肉 vs U+2EBC
03414 USAT08746 IVD 䏣: 肉 vs U+2EBC
03416 USAT90297 IVD 胜: 肉 vs U+2EBC
03419 USAT06375 IVD 肺: 肉 vs U+2EBC; IDS: ⿰肉巿
03420 USAT10231 IVD 胎: 肉 vs U+2EBC
03422 USAT06361 IVD 胷: 肉 vs U+2EBC
03423 USAT08968 IVD 脃: 肉 vs U+2EBC
03426 USAT06768 IVD 胯: 肉 vs U+2EBC; (𡗢 ~ 夸 // 誇 ~ 𧧳 // 跨 ~ 𨀗 // 䠸 ~ 𨉀 // 鮬 ~ 𩶮 // 洿 ~ 𣴰);SC - 6
03437 USAT07202 IVD 腴: 肉 vs U+2EBC; SC = 9
03463 GHZR10093.03 IVD 𦠒
03470 USAT90298 IVD 臗
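The placement rule proposed above can be stated as a small decision function. The Python sketch below is one reading of that rule, assuming a simple three-character IDS (an operator plus two single-character components); nested IDS and any other shape fall through to manual review, and Zhuang submissions are always flagged for case-by-case study.

```python
# Sketch of the proposed placement rule for full 肉 versus U+2EBC ⺼:
#   - 肉 on the left   -> handle as an IVD variant
#   - 肉 at the bottom -> disunify (encode separately)
#   - Zhuang sources   -> review case by case (肉 may be phonetic)
# Assumption: only simple two-component IDS strings are handled; anything
# else is returned as "manual review".

LEFT_RIGHT = "\u2FF0"   # ⿰
ABOVE_BELOW = "\u2FF1"  # ⿱
FULL_MEAT = "\u8089"    # 肉


def suggested_treatment(ids: str, zhuang: bool = False) -> str:
    """Return "IVD", "disunify", "case-by-case" or "manual review"."""
    if zhuang:
        return "case-by-case"
    if len(ids) != 3:  # operator + two single-character components
        return "manual review"
    operator, first, second = ids
    if operator == LEFT_RIGHT and first == FULL_MEAT:
        return "IVD"
    if operator == ABOVE_BELOW and second == FULL_MEAT:
        return "disunify"
    return "manual review"


if __name__ == "__main__":
    print(suggested_treatment("⿰肉巿"))  # IDS given above for 03419
    print(suggested_treatment("⿱爻肉"))  # hypothetical bottom-肉 IDS
```

Under this reading, entries that the list includes for completeness with 肉 at the bottom would come out as "disunify" rather than IVD.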

Zhuang Character Normalization Issue (Components)

The following submitted Zhuang characters use components that the PRC does not consider standard. The PRC may argue that Zhuang characters are not expected to be normalized. However, Han languages and dialects may use the same characters; if such evidence is discovered, those characters will likely be unified (or IVD'd) with the Zhuang characters. The existing Zhuang glyphs would then need to be modified to use the PRC normalized form, creating many problems for existing fonts.
Therefore, it is wise to normalize the glyphs to the PRC normalized form first. If the exact form in the dictionary needs to be preserved, an IVD sequence should be registered after the normalized form is encoded.

00980 G_Z3951603
Consider normalizing the right-hand side to 兒 (U+5152) instead of 𫤘 (U+2B918).
00224 G_Z2281101
Consider normalizing to 觉, as 覚 is not a normalized form.
02271 G_Z2302301
Consider normalizing to 觉, as 覚 is not a normalized form.
04111 G_Z1231301
The phonetic of the character is 忝 (tim1). It should be normalized to 忝.

The confusion of 氺 and 心 for 忝 is common:
00739 G_Z1191201
Same as above; then unify to 𡍞 (U+2135E).
03110 G_Z4442201
斉 is not the PRC normative form. Consider normalizing to 齐.
03187 G_Z0671501
埀 is not the PRC normative form. Consider normalizing to 垂.
05030 G_Z3621104
悪 is not the normative form. Consider normalizing to 惡.
悪 and 惡 are unifiable (IRG#47).
03008 G_Z3271501
宻 is not the PRC normative form. Consider normalizing to 密.

Please also refer to IRGN2154 ROK Normalization Rule 5-1:
03140 G_Z4491201
𪮫 is not the PRC normative form. Consider normalizing to 撒.

Unification of 冊 and 𠕋

冊 can be written in various ways that do not result in any etymological difference.

Existing UCV Rules

Proposed UCV Rule

The shape difference between 冊 and 𠕋 could be considered rather "large" for inter-locale unification. Thus, this unification could be restricted to IVD only.

Affected Character:
02460 T13-2D7C

03555 UTC-01950

WS2015v3.0 Discussion Record
IRGN2179PostponedV3.0
pending for solutions (not unified by U+82B2, G source of U+82B2 may be changed), irg47.
unified by U+82B2, irg46.

Henry's Comment

U+82B2 is used in 第一批异体字整理表 and 《通用規範漢字表》異體字 to mean 花;
Keep G-source of U+82B2 unchanged.
03555 should be separately encoded.

UK Response
Strongly agree!

Action Item
It is suggested that the Code Charts be annotated with the correct semantics.

00198 UTC-01573

Henry’s Comment:
00198 UTC-01573 = 𠈇/𠉦, corrupted form of 𠈇 U+20207 / 𠉦 U+20266

UK’s Comment:
Disagree. UTC-01573 is a variant form of U+5BBF 宿, but U+20207 𠈇 is a variant form of U+5919 夙, so they are different characters and cannot be unified.

Henry’s Further Comments:
夙 and 宿 are often interchanged in old Hanzi. Furthermore, the phonetic of 宿 is 𠈇 (stricter transcription: 𠉦). The fact that the source says 00198 should be read as “U+5BBF 宿” does not necessarily rule out unifying 00198 with 𠈇 U+20207 / 𠉦 U+20266.

Refer to the following source by Chinese University of Hong Kong (http://humanum.arts.cuhk.edu.hk/Lexis/lexi-mf/search.php?word=%E5%AE%BF):

Please note that the presence of 宀 does not affect its meaning; forms with and without it are found both in Oracle Bone evidence and in Jianbo Wenzi (bamboo and silk manuscript script).

Given that:
(1) 𠈇 U+20207 and 𠉦 U+20266 are variants of U+5BBF 宿 (per CUHK source)
(2) U+5BBF 宿 is a variant of U+5919 夙 (per CUHK source)
(3) U+5919 夙 is a variant of 𠈇 U+20207 and 𠉦 U+20266 (provided by UK)
(4) 00198 is a variant of U+5BBF 宿 (provided by UK)
(5) 00198 is virtually indistinguishable from U+20266 𠉦 (and very likely refers to the same Oracle Bone/Old Hanzi glyph)
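Steps (1) to (4) chain pairwise variant relations, and the conclusion follows by transitivity. A minimal union-find sketch in Python makes the chaining explicit. It deliberately treats "is a variant of" as transitive, which is the assumption this argument rests on and which does not hold for every variant pair in general. Step (5) is an observation about glyph shape rather than a relation, so it is left out; the label WS2015-00198 is a hypothetical key standing in for the unencoded submission.

```python
# Union-find over the variant relations (1) to (4) listed above.
# Assumption: "is a variant of" is treated as transitive.

parent = {}


def find(x):
    """Return the representative of x's group, adding x if it is new."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(a, b):
    """Merge the groups containing a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra


SU = "\u5BBF"                 # 宿
SU_EARLY = "\u5919"           # 夙
VAR_20207 = chr(0x20207)      # 𠈇
VAR_20266 = chr(0x20266)      # 𠉦
SUBMISSION = "WS2015-00198"   # hypothetical label for the unencoded submission

relations = [
    (VAR_20207, SU), (VAR_20266, SU),              # (1)
    (SU, SU_EARLY),                                # (2)
    (SU_EARLY, VAR_20207), (SU_EARLY, VAR_20266),  # (3)
    (SUBMISSION, SU),                              # (4)
]
for a, b in relations:
    union(a, b)

print(find(SUBMISSION) == find(VAR_20266))
```

Even without (5), 00198 lands in the same group as 𠉦 through (4) and (1); the shape evidence in (5) then supports choosing 𠉦 as the unification target.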

Action Item
Unify with U+20266 𠉦.

Character Stroke Count for 丽

丽 appears in the Kangxi Dictionary with a residual stroke count (SC) of 7 under the radical Dot (丶). Therefore, its total SC should be 8.

It is also counted as total SC = 8 when it sits on top of 鹿 (as in 麗).

However, U+4E3D in the Code Charts has a residual SC of 6 (total SC = 7).
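Written out, the stroke-count arithmetic above is as follows, where SC_rad is the stroke count of the radical and SC_res the residual stroke count; the 麗 line uses the standard count of 11 strokes for the radical 鹿.

```latex
\begin{aligned}
\text{Kangxi:}\quad \mathrm{SC}_{\text{total}}(\text{丽}) &= \mathrm{SC}_{\text{rad}}(\text{丶}) + \mathrm{SC}_{\text{res}} = 1 + 7 = 8\\
\text{Code Charts (U+4E3D):}\quad \mathrm{SC}_{\text{total}}(\text{丽}) &= 1 + 6 = 7\\
\text{丽 on top of 鹿:}\quad \mathrm{SC}_{\text{total}}(\text{麗}) &= \mathrm{SC}_{\text{rad}}(\text{鹿}) + 8 = 11 + 8 = 19
\end{aligned}
```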

Affected Characters as follows:

02746 UTC-01877
04094 UTC-02120
00376 UTC-01690
03824 UTC-02112
03793 UTC-01940
01398 UTC-01809

Action Item
Add 丽 to IRGN954AR with total SC of 8.

04924 T13-314F

The glyph image and the IDS/SC submitted by TCA are not an accurate transcription of the evidence:

艺 was invented as the simplified form of 藝. The right-hand side of this character is not 艺 but 𠃟.

Thus, it should be transcribed as ⿰馬𠃟.

𠃟 is an alternative transcription of 也:

Action Item

Correct glyph to ⿰馬𠃟.
Unify (IVD) to U+99B3 馳.

04035 UTC-01167

Henry’s Comment:
04035 UTC-01167: more evidence? (賔=賓) no ⿰貝賓 exists yet.

UTC’s Comment:
The UTC does not agree. The supplied evidence is clear.

Henry’s Additional Comments:
賔 is a strict transcription of the character that is more commonly written as “賓” today. The evidence from Grammata Serica Recensa shows only the transcribed form, not the original Oracle Bone, Bronze or Seal Script form:

This transcription caught my attention because no character composed of ⿰貝賓 exists yet. So it is possible that UTC-01167 is an incorrect transcription of some other character, or that the character vanished completely from medieval and modern usage, so that no modern transcription exists.

If possible, the original source on which this transcribed character was based should be provided, so that the transcription can be verified.

Nevertheless, given the historical significance of Grammata Serica Recensa, unless the original oracle/bronze/seal sources show clear evidence that this character has been badly mis-transcribed, the character in its current form is still worth encoding.

Action Item
Postpone or Keep

01416 UTC-02632

WS2015 v.3 Discussion Record
unified by U+2BF4A (GZFY-00688) for IVS, irg47.

Henry's Comment
unified with U+632C for IVS, irg47. NOT unified with U+2BF4A (GZFY-00688).

UK Response
Disagree. It is a non-unifiable variant of U+632C, and should be encoded separately. As reported in IRGN2108Andrew_WG2N4682.pdf, the glyph form of U+2BF4A is incorrect, and should be corrected to ⿰扌学, so UTC-02632 cannot be unified with U+2BF4A.

Henry's Additional Comment
According to my meeting notes, the resolution was unification with U+632C for IVS instead of U+2BF4A (GZFY-00688).

--
Additional Info:

Unification of 取 and ⿰耳𡿨

Henry's Comment
取 and ⿰耳𡿨 should be unified because they were a common systematic variation in the past.

Affected Characters in WS2015:

03991 GHZR63859.13

Action: Unify to 䝒 (U+4752)

04966 GHZR84879.02

Action: Unify to 驟 (U+9A5F) (i.e. confirming the suggestion made at IRG#46)

00346 G_Z1841301

The phonetic is 几 (U+51E0), not 𠘧 (U+20627), so 00346 cannot be changed to use 𠘧 (U+20627); it should use 几 (with hook).
The presence or absence of the hook is not a positional variant issue. Refer to U+28972, where the hook is present even when the component is situated at the top:

05197 UTC-02557

Henry's Comment
Does not match PRC conventions: the right-hand side should be normalized to 恒; unify to 05198.

UK Response
Disagree. Hanyu Dazidian has separate entries for both characters.

Henry's Response
恒 and 恆 are unifiable.

05541 UTC-02614

Henry's Comment
IVD to U+2B733 (冉 ~ 冄)? PRC conventions prefer 冉 over 冄.

UK's Comment
Disagree. We do not think that 冄 and 冉 are unifiable components, and 《漢語大字典》 has separate entries for both U+4DB2 䶲 and U+2A6AE 𪚮 and the two corresponding simplified forms.

Henry's Comment
冉 and 冄 should be considered unifiable via IVD. Many exact equivalents are already encoded in the URO, and many more could be encoded.

Examples of exact equivalents:
U+5189 冉 = U+5184 冄
U+67DF 柟 = U+678F 枏
U+8043 聃 = U+803C 耼
U+86BA 蚺 = U+86A6 蚦
U+88A1 袡 = U+887B 衻
U+9AEF 髯 = U+9AE5 髥
U+59CC 姌 = U+36A9 㚩
U+8211 舑 = U+4459 䑙
U+82D2 苒 = U+44A3 䒣
U+279A6 𧦦 = U+46C1 䛁
U+294FF 𩓿 = U+4AC7 䫇
U+5465 呥 = U+20BCD 𠯍
[...]
U+722F 爯 = U+2DDAA 𭶪
U+7A31 稱 = U+2E0CE 𮃎
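Lists like the one above pair each character with a U+ label, and a mislabelled entry is easy to miss. The Python sketch below parses lines of the form "U+XXXX char = U+YYYY char" and reports whether both labels match their characters. The line format is inferred from this list, and the checker is an illustrative helper, not an existing IRG tool.

```python
import re

# Sketch: verify that each "U+XXXX char" label matches the character it labels,
# for lines formatted like "U+5189 冉 = U+5184 冄".
# Assumption: the line format is taken from the list above; lines that do not
# match it (such as "[...]") are reported as None.

PAIR = re.compile(
    r"U\+([0-9A-Fa-f]{4,6})\s+(\S)\s*=\s*U\+([0-9A-Fa-f]{4,6})\s+(\S)"
)


def check_line(line: str):
    """Return None if the line is not a pair, True if both labels match, else False."""
    m = PAIR.search(line)
    if m is None:
        return None
    cp1, ch1, cp2, ch2 = m.groups()
    return chr(int(cp1, 16)) == ch1 and chr(int(cp2, 16)) == ch2


if __name__ == "__main__":
    for line in ("U+5189 冉 = U+5184 冄", "U+67DF 柟 = U+678F 枏", "[...]"):
        print(line, "->", check_line(line))
```

Because Python strings index by code point, supplementary-plane characters such as those in Extension B match the single `\S` just like URO characters.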

Action Item
Unify via IVD.
