irg-ws2015's Issues
00765 UTC-01217, 00777 UTC-01219
Henry’s Comment:
00765 UTC-01217 UNIFY WITH 00777 UTC-01219 (keep 00777).
UTC’s Comment:
IDSes are ⿰土⿱𥃭木 and ⿰土⿱直木. The UTC does not agree.
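The two IDSes differ in a single component. As a rough illustration (not an IRG tool), the sketch below parses a prefix-notation IDS into a tree and reports where two trees disagree. The function names are hypothetical, and only the twelve classic description characters U+2FF0..U+2FFB are handled.

```python
# Ideographic Description Characters and how many operands each takes.
# Only the twelve classic operators (U+2FF0..U+2FFB) are handled here.
BINARY_IDCS = set("⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻")
TERNARY_IDCS = set("⿲⿳")


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into a nested tuple tree (hypothetical helper)."""
    chars = list(ids)  # Python iterates by code point, so a supplementary character is one item
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(chars):
            raise ValueError("IDS ended before all operands were read")
        ch = chars[pos]
        pos += 1
        if ch in BINARY_IDCS:
            return (ch, parse_node(), parse_node())
        if ch in TERNARY_IDCS:
            return (ch, parse_node(), parse_node(), parse_node())
        return ch  # a plain component

    tree = parse_node()
    if pos != len(chars):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def differing_components(a, b, path=()):
    """Yield (path, left, right) wherever two IDS trees disagree."""
    if isinstance(a, str) or isinstance(b, str) or a[0] != b[0] or len(a) != len(b):
        if a != b:
            yield (path, a, b)
        return
    for i, (x, y) in enumerate(zip(a[1:], b[1:])):
        yield from differing_components(x, y, path + (i,))


ids_01217 = parse_ids("⿰土⿱𥃭木")
ids_01219 = parse_ids("⿰土⿱直木")
diffs = list(differing_components(ids_01217, ids_01219))
print(diffs)
```

The path in each result is the sequence of operand positions from the root, so a single entry means the two submissions differ in exactly one component. Whether that component pair is unifiable is the question under discussion and is not something the code can decide.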
Henry’s Additional Comments:
The sources provided by UTC are as follows:
Given the evidence provided by the UTC, it is rather obvious from the context that they should be referring to the same person, and thus be the same character.
A Wikipedia article shows that according to 《明實錄》, the name of the person is 志㙞. If this can be confirmed by any expert familiar with the members of the Ming dynasty imperial family, then the two corrupted forms (UTC-01217, UTC-01219) should be unified to 㙞 (U+365E) (via IVD).
《明實錄》 can be found on **研究所 歷史語言研究所 明實錄、朝鮮王朝實錄、清實錄資料庫:
Action Item
Postpone or Withdraw
00069 UTC-02765
Henry's Comment
= 𠘧 (U+20627) (variant; protrusion of strokes).
UK's Comment
Disagree. Non-cognate, and stroke variation is significant.
Henry's Comments
Below is UK's evidence of 00069:
Below is HYDZD evidence of 𠘧 (U+20627):
First, the pronunciations are the same.
Second, the second meaning of 00069 is the same as the strict meaning of U+20627.
Third, the first meaning of 00069 is as a grammatical suffix indicating extent. Very likely, the character was borrowed for its sound.
Lastly, the shape of the top-left part of 00069 matches the Shuowen shape of U+20627. 00069 is likely just another transcription of the same character.
It is likely they are the same character.
Action Item
Unify or Postpone.
02315 T13-2D5B
Unification of 夸 and 𡗢
𡗢 is a common historical variant form of 夸.
Existing Disunified Strict Semantic Variants:
夸 U+5938 = 𡗢 U+215E2
洿 U+6D3F = 𣴰 U+23D30
誇 U+8A87 = 𧧳 U+279F3
跨 U+8DE8 = 𨀗 U+28017
䠸 U+4838 = 𨉀 U+28240
鮬 U+9BAC = 𩶮 U+29DAE
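Lists like the one above are easy to mistype, so a quick consistency check can help. The following is a minimal sketch, assuming each line has the form "character U+XXXX = character U+XXXX" as above; `check_line` is a hypothetical helper, not part of any IRG tooling.

```python
import re

# Lines copied from the list above: "<char> U+XXXX = <char> U+XXXX".
PAIRS = """\
夸 U+5938 = 𡗢 U+215E2
洿 U+6D3F = 𣴰 U+23D30
誇 U+8A87 = 𧧳 U+279F3
跨 U+8DE8 = 𨀗 U+28017
䠸 U+4838 = 𨉀 U+28240
鮬 U+9BAC = 𩶮 U+29DAE
"""

# One character, whitespace, then its stated code point.
ENTRY = re.compile(r"(\S)\s+U\+([0-9A-F]{4,6})")


def check_line(line):
    """Return (char, stated code point, matches?) for each entry on a line."""
    return [(ch, f"U+{cp}", ord(ch) == int(cp, 16)) for ch, cp in ENTRY.findall(line)]


for line in PAIRS.splitlines():
    for ch, stated, ok in check_line(line):
        if not ok:
            print(f"mismatch: {ch} is U+{ord(ch):04X}, listed as {stated}")
```

The script only reports entries whose character and stated code point disagree; it says nothing about whether the two sides of each pair are really variants.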
Affected Characters:
- 03426 USAT06768
Action Item: Unify (IVD) to 胯 U+80EF (ref: issue #23)
03798 UTC-01941
Henry's Comment
03798 UTC-01941 UNIFY to 褝 (U+891D)
UTC, HK Comment
Unify with U+891D 褝
UK's Comment
Disagree. We think that the unification of the components 単 单 for U+7985 禅 was a mistake, and causes problems for users when a default font shows an unacceptable glyph form. The G-source glyph for U+891D has 単 on the right, so font developers will follow this glyph form when designing fonts for PRC, but the G-source glyph form for U+891D is unacceptable as the simplified form of U+891D 襌.
Therefore we strongly think it will serve users best to encode UTC-01941 (⿰衤单) as a separate character.
Henry's Comment
The same problem also arises for U+20219 𠈙 and U+2548E 𥒎, even though they are in Extension B. However, China has agreed to correct multiple erroneous glyphs in GE standard in IRGN 2170 (involving U+8669 and U+3B9D). Therefore, the refusal to correct U+891D in line with China's normalized glyphs should not be accepted.
Action Item
Unify
Multiple Withdrawn Characters/Glyphs
Multiple characters/glyphs were withdrawn in the WS 2015 v2 IRGN2155 UK Review, but the withdrawals were not reflected in the Working Set:
- 00123 UTC-01423: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00130 UTC-01318: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00138 UTC-01391: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00296 UTC-01326: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00345 UTC-01329: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00348 UTC-01330: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00475 UTC-01337: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00524 UTC-01421: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00542 UTC-01338: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00560 UTC-01339: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00561 UTC-01369: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00662 UTC-01342: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00814 UTC-01345: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00827 UTC-01372: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00857 UTC-01353: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00863 UTC-01355: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00866 UTC-01441: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00871 UTC-01354: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00874 UTC-01356: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00898 UTC-01358: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00933 UTC-01437: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00938 UTC-01362: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00953 UTC-01367: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00966 UTC-01368: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00969 UTC-01363: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00970 UTC-01366: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01152 UTC-01428: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01186 UTC-01349: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01194 UTC-01371: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01319 UTC-01314: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01320 UTC-01373: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01331 UTC-01374: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01336 UTC-01387: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01493 UTC-01390: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01520 UTC-01377: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01533 UTC-01341: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01703 UTC-01442: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01721 UTC-01379: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 01850 UTC-01384: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02187 UTC-01378: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02193 UTC-01386: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02333 UTC-01388: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02387 UTC-01480: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02446 UTC-01400: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02541 UTC-01392: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02563 UTC-01457: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02641 UTC-01399: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02684 UTC-01393: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02843 UTC-01396: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 02936 UTC-01397: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03026 UTC-01398: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review; SC - 21.
- 03332 UTC-01401: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03349 UTC-01402: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03355 UTC-01403: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03384 UTC-01405: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03487 UTC-01406: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03670 UTC-01404: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03679 UTC-01411: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03774 UTC-01412: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03775 UTC-01413: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03835 UTC-01415: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03969 UTC-01416: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 03981 UTC-01418: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 04685 UTC-01424: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 05061 UTC-01425: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 05094 UTC-01408: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 05222 UTC-01350: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
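A check of this kind could be automated. The sketch below, under stated assumptions, parses entries in the format used above and reports which withdrawn serial numbers still appear in a working set. The working-set contents shown are made up for illustration, and both function names are hypothetical.

```python
import re

# Matches "- 00123 UTC-01423: Withdrawn ..." and captures serial number and source reference.
WITHDRAWN_LINE = re.compile(r"^-\s*(\d{5})\s+(\S+):\s*Withdrawn\b")


def parse_withdrawn(text):
    """Map serial number -> source reference for each withdrawn entry."""
    found = {}
    for line in text.splitlines():
        m = WITHDRAWN_LINE.match(line.strip())
        if m:
            found[m.group(1)] = m.group(2)
    return found


def still_listed(withdrawn, working_set_sns):
    """Serial numbers that are withdrawn but still appear in the working set."""
    return sorted(sn for sn in withdrawn if sn in working_set_sns)


sample = """\
- 00123 UTC-01423: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
- 00130 UTC-01318: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
"""
withdrawn = parse_withdrawn(sample)

# Hypothetical working-set serial numbers, for illustration only.
working_set = {"00123", "00124", "00130"}
print(still_listed(withdrawn, working_set))
```

Feeding the full list above into `parse_withdrawn` and the real working-set serial numbers into `still_listed` would give the entries that still need to be removed.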
Unification of 耎 and 䎡
䎡 is a common variant of 耎.
Encoded Characters with IDS containing 䎡:
U+43A1 䎡 = U+800E 耎
U+24322 𤌢 = U+7157 煗
U+24B81 𤮁 = U+3F32 㼲
U+25C47 𥱇 = U+25BEC 𥯬
U+273E6 𧏦 = U+8761 蝡
U+28AB3 𨪳 = U+28A30 𨨰
U+28EE2 𨻢 = U+967E 陾
U+29C4A 𩱊 = U+29C44 𩱄
U+2C8BD 𬢽 (UNKNOWN ORIGIN - JK-65739)
Affected Characters:
03800 UTC-01942
UTC, HK Comments
Unify with U+2B304 𫌄
UK Comment
Agree.
Henry's Comment
Disagree.
The pronunciation of U+2B304 𫌄 is given as tươm, 叁 as tam, and 參 as tham/sam/sâm/khươm on the Nom Foundation Nom Lookup Tool. It is highly probable that the phonetic of U+2B304 𫌄 is 叁 instead of 參.
Before the phonetic of U+2B304 𫌄 can be truly confirmed, U+2B304 and 03800 should not be unified.
Action Item
Postpone or Disunify.
Unification of 幸, 㚔 and 羍
In UCS, there are numerous examples of 幸, 㚔 and 羍 disunified.
There are three different etymologies for characters that contain 幸.
(1) U+3694 㚔 (niè, handcuffs).
Examples of characters include (usually as a semantic component):
- U+57F7 執 / U+2163A 𡘺 / U+21655 𡙕 / U+2065C 𠙜 / U+26383 𦎃
- U+5831 報 / U+21648 𡙈
- U+776A 睪 / U+251E1 𥇡
- U+25216 𥈖
- U+23582 𣖂
- U+20DBF 𠶿
- U+2676F 𦝯
- U+260A1 𦂡
- U+26051 𦁑
(etc)
(2) U+21D18 𡴘 (xìng, fortune)
Examples of characters include (usually as a phonetic component):
- U+200B7 𠂷 (alternative transcription of 𡴘)
- U+5548 啈 / U+20D43 𠵃
- U+6DAC 涬 / U+23DDF 𣷟
- U+46ED 䛭 / U+27A2B 𧨫
- U+7DC8 緈 / U+2609C 𦂜 / U+260C9 𦃉
(etc)
(3) U+7F8D 羍 (dá, small sheep)
Examples of characters include (usually as a phonetic component):
- U+5548 啈 / U+20D43 𠵃
- U+9054 達 / U+9039 逹
The above examples show that the shape difference between 幸, 㚔 and 羍 does not generally represent a systematic semantic difference in Kaishu. In the case of 啈/𠵃/𠶿, the dictionary meaning/pronunciation of the characters is actually the opposite of the normative meaning of their phonetic/semantic component.
In fact, these three forms were never really distinguished in handwriting; they were distinguished by context. There is no need to encode multiple variants of the same character. Trying to map every single variant in a dictionary into UCS would only cause confusion about the exact semantic meaning. IVD can be used to preserve the exact shape.
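To make the IVD point concrete: an ideographic variation sequence is a base character followed by a variation selector in the range U+E0100..U+E01EF. The sketch below is illustrative only; the selectors chosen are arbitrary and do not refer to any particular registered sequence, and the helper names are hypothetical.

```python
IVS_FIRST = 0xE0100  # VARIATION SELECTOR-17
IVS_LAST = 0xE01EF   # VARIATION SELECTOR-256


def make_ivs(base: str, selector_index: int) -> str:
    """Return base followed by the selector at IVS_FIRST + selector_index."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    selector = IVS_FIRST + selector_index
    if not IVS_FIRST <= selector <= IVS_LAST:
        raise ValueError("selector outside U+E0100..U+E01EF")
    return base + chr(selector)


def strip_selectors(text: str) -> str:
    """Drop variation selectors so that only unified base characters remain."""
    return "".join(ch for ch in text if not IVS_FIRST <= ord(ch) <= IVS_LAST)


base = chr(0x57F7)  # U+57F7, listed under etymology (1) above

# Two hypothetical glyph variants of the same base character.
variant_a = make_ivs(base, 0)
variant_b = make_ivs(base, 1)

print(variant_a == variant_b)  # the sequences differ, so exact shape can be recorded
print(strip_selectors(variant_a) == strip_selectors(variant_b) == base)  # same character underneath
```

This is the trade-off argued for above: text that ignores selectors sees one character, while text that keeps them can still request a specific shape.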
The following characters are semantically equivalent to the corresponding encoded ideographs, and thus should be unified in WS2015:
- 00682 UTC-01451
Semantic Origin: 㚔 - handcuffs
UNIFY TO U+5709 圉
- 00195 USAT09927
Semantic Origin: 𡴘 (phonetic)
UNIFY TO U+5016 倖
- 00882 USAT09928
Semantic Origin: 𡴘 (phonetic)
UNIFY TO U+5A5E 婞
- 03375 GHZR52968.20
Semantic Origin: 㚔 (semantic of phonetic)
UNIFY TO U+26525 𦔥
- 03469 GHZR42270.04
Semantic Origin: 㚔 (semantic of phonetic)
UNIFY TO U+443E 䐾
- 02677 T13-2E70
Semantic Origin: 㚔 (semantic)
UNIFY TO U+24FF9 𤿹
Zhuang Character Normalization Issues
There are multiple normalization issues with the Zhuang characters submitted by Guangxi University. For example, 橫 (horizontal stroke) should always be changed to 提 (rising stroke) on the left side, but this has not been done in the submitted Zhuang characters. In many cases, the evidence submitted is in the correct normalized form, but the font provided by Guangxi University is not.
Once the characters are encoded, it is very troublesome to change the representative glyph. Therefore, it is suggested that Guangxi University normalize the Zhuang characters properly before submission.
01594 G_Z3561201: left side 星 does not follow PRC conventions - should be a 提 not 橫
01618 G_Z3551104: left side 星 does not follow PRC conventions - should be a 提 not 橫
04883 G_Z0721301: Does not match PRC conventions. Compare with 養.
02445 G_Z1402302: Does not match PRC conventions. Second stroke of 馬 should not be joined with the 6th stroke.
05239 G_Z0211201: Does not match PRC conventions, last stroke of left component should be 點
03354 G_Z2231201: Does not match PRC conventions, last stroke of left component should be 點
00315 G_Z3842301: Does not match PRC Conventions, last stroke of left component should be 點
04621 G_Z2382304: Does not match PRC Conventions: The third stroke of 犬 should be 點, not 捺.
00065 G_Z1652501: Does not match PRC Conventions: The last stroke of 及 should be 點, not 捺; or the structure should be changed to enclosure.
00264 G_Z4291302: Does not match PRC Conventions: Right hand side should be 尨 (⿷尤彡).
00523 G_Z2042303: Does not match PRC Conventions: last stroke of top left component should be 點.
00629 G_Z2302202: Does not match PRC Conventions: last stroke of top left component should be 點.
00534 G_Z1592101: Does not match PRC Conventions: last stroke of left component should be 點.
00536 G_Z0811201: Does not match PRC Conventions: last stroke of left component should be 提.
00659 G_Z1831401: Does not match PRC Conventions: last stroke of top left component should be 點.
01147 G_PGLG2017 doesn't match PRC conventions, last stroke of left component should be 點.
03527 G_Z2181407 doesn't match PRC conventions; last stroke of left component should be 提.
01149 G_Z3112502 doesn't match PRC conventions, last stroke of left component should be 點.
01150 G_Z0431401 doesn't match PRC conventions, last stroke of left component should be 點.
03665 G_Z1501101 doesn't match PRC conventions, last stroke of left component should be 提.
03805 G_Z1202503 doesn't match PRC conventions, last stroke of left component should be 點.
03961 G_Z2582201 doesn't match PRC conventions, fourth stroke of left component should be 點.
03974 G_Z1412404 doesn't match PRC conventions, last stroke of left component should be 提.
05311 G_PGLG3052 doesn't match PRC conventions, fifth stroke of left component should be 點.
02577 G_Z1432204 doesn't match PRC conventions, last stroke of left component should be 點.
02581 G_Z2782104 doesn't match PRC conventions, last stroke of left component should be 點.
02328 G_Z1602601 doesn't match PRC conventions, last stroke of left component should be 點.
03144 G_Z3651201 doesn't match PRC conventions, last stroke of left component should be 點.
02179 UTC-02651
Henry's Comment
02179 UTC-02651 = 𤇆 (U+241C6) / 烟 (因 ~ 囙 -- 第一批异体字整理表)
Japan's Comment
Unify with U+241C6 𤇆
UK's Comment
Disagree. We do not believe that 回 and 囙 are unifiable components.
Henry's Comment
It is common for the middle of 回 to be written as 囙 in print, such as:
although the reverse is rather uncommon.
In the evidence provided, it is given that 02179 is a variant of 烟:
To also quote from MOE Dictionary (http://dict2.variants.moe.edu.tw/variants/rbt/word_attribute.rbt?quote_code=QTAyNDIwLTAwMQ):
𤇆 is a variant of 烟 and that the pair 因 / 囙 is included in 《第一批异体字整理表》. Therefore, the equivalence relationship between 02179 and 𤇆 is beyond reasonable doubt.
Suggested Action Item
Unify/IVD
00063 UTC-01316
IRGN2155CommentsToIRGN2107 (Chen Zhixiang)’s Comment
Henry’s Comment
- WITHDRAW, reference IRGN2155CommentsToIRGN2107, OR
- unify to U+2B85C
UTC’s Comment:
Disagree. Character is attested in two separate sources, and the right-hand side components are not unifiable.
Henry’s Additional Comment:
The value of encoding erroneous transcriptions that Chinese experts have already identified needs to be justified, for example by showing that “A Concordance to Fascicle Three of the Inscriptions from the Yin Ruins” is so academically significant that its erroneous forms should be encoded as is, in the same way as those of the Kangxi Dictionary and/or Hanyu Dazidian.
"One-off Corruptions"
Owing to IRGN2211 Section B Item 3 “One-off corruptions found on tombstone carvings”, the following characters should be rejected (or unified):
SN / Source / Treatment / Reason
02854 T13-2F48 IVD 碑 碑別字新編
02286 T13-2D55 IVD 燦 碑別字新編
02270 TE-6F6B IVD 瞧 廣碑別字
02246 T13-2D4B IVD 照 偏類碑別字
02821 T13-2F42 IVD 穎 碑別字新編
02812 T13-2F3F IVD 智 碑別字新編
02804 T13-2F3D IVD 矢 碑別字新編
02734 T13-2F29 IVD 旹 廣碑別字
02750 T13-2F2F IVD 督 碑別字新編
02154 T13-2D32 IVD 灮 廣碑別字
02088 T13-2D27 IVD 澡 碑別字新編
02086 T13-2D24 IVD 㴱 碑別字新編
04289 T13-3138 IVD 溯 廣碑別字
04299 T13-313C IVD 逮 廣碑別字
02073 T13-2D25 IVD 淄 廣碑別字
02574 T13-2E49 IVD 暴 偏類碑別字
02061 T13-2D21 IVD 潰 偏類碑別字
01948 T13-2C54 IVD 步 碑別字新編
02564 T13-2E48 IVD 當 廣碑別字
02557 T13-2E42 IVD 星 碑別字新編
01931 T13-2C4C IVD 氤 碑別字新編
01933 T13-2C4D IVD 氤 碑別字新編
01934 T13-2C4E IVD 氤 碑別字新編
01929 T13-2C4A IVD 氣 偏類碑別字
02448 T13-2D79 IVD 玉 偏類碑別字
02453 T13-2D7A IVD 珍 偏類碑別字
02437 T13-2D76 IVD 敵 碑別字新編
04003 T13-3128 IVD 敗 偏類碑別字
03968 T13-3124 IVD 短 碑別字新編
03803 T13-3075 IVD 福 碑別字新編
03777 T13-3074 IVD 礼 碑別字新編
03490 T13-304F IVD 歸 偏類碑別字
03594 T13-3063 IVD 𤼲 廣碑別字
03593 T13-3062 IVD 𤼲 偏類碑別字
03367 T13-3044 IVD 孝 廣碑別字
03522 T13-3054 IVD 朝 偏類碑別字
01949 T13-2C53 IVD 沉 廣碑別字
02642 T13-2E5F IVD 癸 碑別字新編, missing stroke
02644 T13-2E60 IVD 癸 碑別字新編, protruding stroke
03610 T13-3068 IVD 發/彂 金石文字辨異
02036 T13-2C71 IVD 淑 金石文字辨異
04000 T13-3129 IVD 真 金石文字辨異
02845 T13-2F47 IVD 碑 金石文字辨異
02225 T13-2D45 IVD 婆 金石文字辨異
02097 T13-2D2B IVD 演 金石文字辨異
02598 T13-2E51 IVD 病 金石文字辨異
Unification of various forms of 丘
It is suggested that all historical variants should be unified to the corresponding normalized form (closest variant form?).
The normalized form should be defined as follows:
- 丘 (丠/㐀)
- 虛/虚 (𧆳/虗)
Affected Characters:
- 04681 GHZR74469.18
IVD 𨼋
- 04932 GHZR84846.02
IVD 駈 / 𩢩
- 04704 GHZR74481.06
IVD 04703 GHZR74481.05
00597 UTC-02810
Unification of 犮 and 叐
Henry’s Comment:
00698 UTC-01204 UNIFY TO 坺 U+577A
00861 USAT90292 UNIFY TO 妭 U+59AD
UTC’s Comment:
IDS is ⿰土叐. The UTC does not agree.
Henry’s Supplementary Information:
叐 and 犮 are variant forms of the same component*. In another version of 弇山堂別集卷三十三, the character 坺 (U+577A) is used instead:
Source: http://ctext.org/library.pl?if=gb&file=58118&page=103, which is 《欽定四庫全書》本·史部五·雜史類。
*: Referencing variants of 拔 (U+62D4) from the MOE Dictionary:
The source quoted by MOE Dictionary has been omitted for brevity.
Coded characters containing 叐 are listed as follows:
- U+53D0 叐: Kangxi / Hanyu Dazidian: variant of 犮
- U+39DE 㧞: variant of 拔 based on evidence of 拔 in MOE Dictionary
- U+47E6 䟦: variant of 跋 based on evidence of 跋 in MOE Dictionary
- U+2209B 𢂛: unverifiable; G4K-sourced character; original sources could not be found
- U+2342A 𣐪: TF-sourced character; pronunciation same as 柭
- U+24923 𤤣: TF-sourced character; pronunciation same as 𤤒
- U+2595C 𥥜: variant of 突 according to Hanyu Dazidian
- U+25FC8 𥿈: variant of 𥿈 @ MOE Dictionary, HYD and Koseki
- U+26B5E 𦭞: variant of 𦭞 @ HYD
- U+296BF 𩚿: variant of 飫 @ MOE Dictionary, HYD and Koseki
- U+2989A 𩢚: variant of 䮂 @ KX, MOE, HYD, Koseki
- U+2C4B2 : TC-sourced character; pronunciation same as 祓
- U+2CAC6 : TD-sourced character; pronunciation same as 鈸
- U+2CF74 : variant of 伏 according to SAT Database
- U+2D71E : used in name of person in SAT database or as a phonetic transcription; meaning could not be verified.
- U+2D805: variant of 戾 according to Mojikiban
- U+2DF7D: variant of 鉢 according to SAT Database, which is variant of 盋 according to PRC 第一批异体字整理表, Koseki & HYD
- U+2E0BA: variant of 秡 according to Mojikiban “読み・字形による類推” with remark “地名外字”
- U+2E2DF: variant of 黻 according to footnote in SAT Database.
- U+2E3DF: variant of 沷 according to footnote in SAT Database. The next character was 若, so a possible phenomenon of a “類化”
To summarize, in the majority of cases the 叐 component is a strict variant of 犮. In 2 cases it is equivalent to 犬, a property also shared by 犮 itself. In 2 cases the source of the character could not be verified, and in 1 case multiple sources show that it was a “corruption” of another similarly shaped component.
Therefore I think there is enough evidence to regard 叐 as unifiable with 犮.
In fact, this rule is regarded as a normalization by ROK in IRGN2154:
Conclusion:
Unification of 肉 and ⺼
There are a large number of characters where U+2EBC has been replaced with the full 肉 radical. IRG should consider allowing these characters to be encoded via IVD, as there are 500+ characters with U+2EBC. Variants using 肉 on the left are sufficiently rare in modern usage.
It is suggested to use IVD only if 肉 is on the left, and to disunify if 肉 is at the bottom. Zhuang characters in particular should be studied on a case-by-case basis because 肉 may play a phonetic rather than a semantic role. Those with 肉 at the bottom have been included for completeness.
00282 USAT08988 IVD 䏌: 肉 vs U+2EBC
02178 USAT08962 IVD 炙
02670 T13-2E6E IVD 䏢
03390 USAT06462 IVD 肊: text indicates 乙 as phonetic; 肉 vs U+2EBC
03392 USAT09914 IVD 肋: 肉 vs U+2EBC
03393 USAT06266 IVD 肙: 肉 vs U+2EBC
03395 USAT08303 IVD 𦘺: 肉 vs U+2EBC
03397 USAT10232 IVD 肧: 肉 vs U+2EBC
03403 USAT05646 IVD 肴: 肉 vs U+2EBC, top 爻 (ref: 希~𢁫 // 04335)
03405 USAT90295 IVD 股/肢: 肉 vs U+2EBC
03407 USAT08919 IVD 育: 肉 vs U+2EBC
03411 USAT06030 IVD 背: 肉 vs U+2EBC
03414 USAT08746 IVD 䏣: 肉 vs U+2EBC
03416 USAT90297 IVD 胜: 肉 vs U+2EBC
03419 USAT06375 IVD 肺: 肉 vs U+2EBC; IDS: ⿰肉巿
03420 USAT10231 IVD 胎: 肉 vs U+2EBC
03422 USAT06361 IVD 胷: 肉 vs U+2EBC
03423 USAT08968 IVD 脃: 肉 vs U+2EBC
03426 USAT06768 IVD 胯: 肉 vs U+2EBC; (𡗢 ~ 夸 // 誇 ~ 𧧳 // 跨 ~ 𨀗 // 䠸 ~ 𨉀 // 鮬 ~ 𩶮 // 洿 ~ 𣴰);SC - 6
03437 USAT07202 IVD 腴: 肉 vs U+2EBC; SC = 9
03463 GHZR10093.03 IVD 𦠒
03470 USAT90298 IVD 臗
Zhuang Character Normalization Issue (Components)
The following submitted Zhuang characters do not use components that are considered standard by PRC. PRC may wish to say that Zhuang characters are not expected to be normalized. However, ideographs used by Han languages and dialects may turn out to be the same characters; if such evidence is discovered, those characters will likely be unified (or IVD) with the Zhuang characters. The existing Zhuang glyphs would then need to be modified to use the PRC normalized form, creating many problems for existing fonts.
Therefore, it is wise to normalize the glyphs to the PRC normalized form first. If the exact form in the dictionary is to be preserved, an IVD sequence should be registered after the normalized form is encoded.
- 00980 G_Z3951603
Consider normalizing the right-hand side to 兒 (U+5152) instead of 𫤘 (U+2B918).
- 00224 G_Z2281101
Consider normalizing to 觉, as 覚 is not a normalized form.
- 02271 G_Z2302301
Consider normalizing to 觉, as 覚 is not a normalized form.
- 04111 G_Z1231301
The phonetic of the character is 忝 (tim1). It should be normalized to 忝.
The confusion of 氺 and 心 for 忝 is common:
- 03110 G_Z4442201
斉 is not the PRC normative form. Consider normalizing to 齐.
- 03187 G_Z0671501
埀 is not the PRC normative form. Consider normalizing to 垂.
- 05030 G_Z3621104
悪 is not the normative form. Consider normalizing to 惡.
悪 and 惡 are unifiable (IRG#47).
- 03008 G_Z3271501
宻 is not the PRC normative form. Consider normalizing to 密.
Please also refer to IRGN2154 ROK Normalization Rule 5-1:
- 03140 G_Z4491201
𪮫 is not the PRC normative form. Consider normalizing to 撒.
Unification of 冊 and 𠕋
There are various ways in which 冊 can be written that do not reflect any etymological difference.
The shape difference between 冊 and 𠕋 could be considered rather "large" for inter-locale unification. Thus, this unification could be restricted to IVD only.
03555 UTC-01950
WS2015v3.0 Discussion Record
IRGN2179PostponedV3.0
pending for solutions (not unified by U+82B2, G source of U+82B2 may be changed), irg47.
unified by U+82B2, irg46.
Henry's Comment
- U+82B2 is used in 第一批异体字整理表 and 《通用規範漢字表》異體字 to mean 花; keep G-source of U+82B2 unchanged.
- 03555 should be separately encoded.
UK Response
Strongly agree!
Action Item
Suggested to annotate the Code Charts with the correct semantics
00198 UTC-01573
Henry’s Comment:
00198 UTC-01573 = 𠈇/𠉦, corrupted form of 𠈇 U+20207 / 𠉦 U+20266
UK’s Comment:
Disagree. UTC-01573 is a variant form of U+5BBF 宿, but U+20207 𠈇 is a variant form of U+5919 夙, so they are different characters and cannot be unified.
Henry’s Further Comments:
夙 is often exchanged with 宿 in old Hanzi. Furthermore, the phonetic of 宿 is 𠈇 (stricter transcription: 𠉦). The fact that the source says 00198 should be read as “U+5BBF 宿” does not necessarily contradict the claim that 00198 can be unified with 𠈇 U+20207 / 𠉦 U+20266.
Refer to the following source by Chinese University of Hong Kong (http://humanum.arts.cuhk.edu.hk/Lexis/lexi-mf/search.php?word=%E5%AE%BF):
Please note that the presence of the 宀 does not affect its meaning; forms with and without it are found in Oracle Bone evidence and also in Jianbo Wenzi.
Given that,
(1) 𠈇 U+20207 and 𠉦 U+20266 are variants of U+5BBF 宿 (per CUHK source)
(2) U+5BBF 宿 is a variant of U+5919 夙 (per CUHK source)
(3) U+5919 夙 is a variant of 𠈇 U+20207 and 𠉦 U+20266 (provided by UK)
(4) 00198 is a variant of U+5BBF 宿 (provided by UK)
(5) 00198 is virtually indistinguishable from U+20266 𠉦 (and very likely refers to the same Oracle Bone/Old Hanzi glyph)
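The chain of relations (1) to (4) can be sketched as a small union-find structure. This is only an illustration: it assumes that "is a variant of" can be treated as symmetric and transitive, which is a simplification, and it does not model the shape argument in (5). The class name and the label "WS2015-00198" are hypothetical.

```python
class VariantClasses:
    """Union-find over character labels; each set is one group of variants."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same_class(self, a, b):
        return self.find(a) == self.find(b)


classes = VariantClasses()
classes.link("U+20207", "U+5BBF")       # (1)
classes.link("U+20266", "U+5BBF")       # (1)
classes.link("U+5BBF", "U+5919")        # (2)
classes.link("U+5919", "U+20207")       # (3)
classes.link("U+5919", "U+20266")       # (3)
classes.link("WS2015-00198", "U+5BBF")  # (4)

print(classes.same_class("WS2015-00198", "U+20266"))
```

Under these assumptions 00198 and U+20266 land in the same group, which is the semantic half of the argument; the choice of U+20266 over the other members rests on glyph similarity.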
Action Item
Unify with U+20266 𠉦.
Character Stroke Count for 丽
丽 is present in the Kangxi Dictionary with SC = 7. The radical is Dot (丶). Therefore, the total SC should be 8.
It is also counted as total SC = 8 when it appears on top of 鹿.
However, U+4E3D in the Code Charts has SC = 6 (total SC = 7).
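The arithmetic behind the discrepancy can be written out explicitly. The sketch assumes a one-stroke radical in both sources, as the totals quoted above imply; the record type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrokeRecord:
    source: str
    radical_strokes: int
    residual_strokes: int  # the "SC" value quoted in the text

    @property
    def total(self) -> int:
        return self.radical_strokes + self.residual_strokes


# Assumption: the radical counts as one stroke in both sources.
ONE_STROKE_RADICAL = 1

kangxi = StrokeRecord("Kangxi Dictionary", ONE_STROKE_RADICAL, 7)
code_chart = StrokeRecord("Code Charts (U+4E3D)", ONE_STROKE_RADICAL, 6)

for rec in (kangxi, code_chart):
    print(f"{rec.source}: {rec.radical_strokes} + {rec.residual_strokes} = {rec.total}")
print("difference:", kangxi.total - code_chart.total)
```

The one-stroke difference is what propagates into every character that contains 丽 as a component.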
Affected Characters as follows:
Action Item
Add 丽 to IRGN954AR with total SC of 8.
04924 T13-314F
The glyph image and the IDS/SC submitted by TCA are not an accurate transcription of the evidence:
艺 is a character invented as a simplified form of 藝. The right-hand side is not 艺 but 𠃟.
Thus, it should be transcribed as ⿰馬𠃟.
𠃟 is an alternative transcription of 也:
Action Item
- Correct glyph to ⿰馬𠃟.
- Unify (IVD) to U+99B3 馳.
04035 UTC-01167
Henry’s Comment:
04035 UTC-01167: more evidence? (賔=賓) no ⿰貝賓 exists yet.
UTC’s Comment:
The UTC does not agree. The supplied evidence is clear.
Henry’s Additional Comments:
賔 is a strict transcription of the character that is more commonly written as “賓” in modern times. The evidence from Grammata Serica Recensa shows only the transcribed form and not the original Oracle Bone, Bronze or Seal Script:
This transcription caught my attention because no character composed of ⿰貝賓 exists yet, so it is possible that UTC-01167 is an incorrect transcription of some character, or that the character has completely vanished from medieval and modern usage so that no modern transcription exists.
If possible, the original source on which this transcribed character was based should be provided, so that the transcription can be verified.
Nevertheless, given the historical significance of Grammata Serica Recensa, unless the original oracle/bronze/seal sources show clear evidence that this character has been badly mistranscribed, the character in its current form is still worth encoding.
Action Item
Postpone or Keep
01416 UTC-02632
WS2015 v.3 Discussion Record
unified by U+2BF4A (GZFY-00688) for IVS, irg47.
Henry's Comment
unified with U+632C for IVS, irg47. NOT unified with U+2BF4A (GZFY-00688).
UK Response
Disagree. It is a non-unifiable variant of U+632C, and should be encoded separately. As reported in IRGN2108Andrew_WG2N4682.pdf, the glyph form of U+2BF4A is incorrect, and should be corrected to ⿰扌学, so UTC-02632 cannot be unified with U+2BF4A.
Henry's Additional Comment
According to my meeting notes, the resolution was unification with U+632C for IVS instead of U+2BF4A (GZFY-00688).
--
Additional Info:
Unification of 取 and ⿰耳𡿨
00346 G_Z1841301
The 几 (U+51E0) is the phonetic, not 𠘧 (U+20627). The character 00346 cannot be changed to use 𠘧 (U+20627). It should use 几 (with hook).
The presence or absence of the hook is not a matter of positional variation. Refer to U+28972, where the hook is present even though the component is at the top:
05197 UTC-02557
05541 UTC-02614
Henry's Comment
IVD to U+2B733 (冉 ~ 冄)? PRC Conventions prefer 冉 > 冄.
UK's Comment
Disagree. We do not think that 冄 and 冉 are unifiable components, and 《漢語大字典》 has separate entries for both U+4DB2 䶲 and U+2A6AE 𪚮 and the two corresponding simplified forms.
Henry's Comment
冉 and 冄 should be considered unifiable via IVD. There are already many exact equivalents encoded in URO, and many more could be encoded.
Examples of exact equivalents:
U+5189 冉 = U+5184 冄
U+67DF 柟 = U+678F 枏
U+8043 聃 = U+803C 耼
U+86BA 蚺 = U+86A6 蚦
U+88A1 袡 = U+887B 衻
U+9AEF 髯 = U+9AE5 髥
U+59CC 姌 = U+36A9 㚩
U+8211 舑 = U+4459 䑙
U+82D2 苒 = U+44A3 䒣
U+279A6 𧦦 = U+46C1 䛁
U+294FF 𩓿 = U+4AC7 䫇
U+5465 呥 = U+20BCD 𠯍
[...]
U+722F 爯 = U+2DDAA 𭶪
U+7A31 稱 = U+2E0CE 𮃎
Action Item
Unify via IVD.