L2/99-316

=======================================================================
Since the e-mail below was written, new versions of the JCS documents
have been posted on the web as L2/99-310 through L2/99-313 and have been
distributed in the October 8 mailing.

Arnold
=======================================================================

kenw@sybase.com (Kenneth Whistler) on 09/13/99 05:17:02 PM

Please respond to unicore@unicode.org

To:      Multiple Recipients of Unicore
cc:      kenw@sybase.com
Subject: JCS proposal

UTC members, Han character experts,

Please take a look at L2/99-239 and L2/99-240, hard copies of which have
been distributed in Arnold's latest mailing.

These are the left-over characters added to JIS X0213 that the JCS
proposal is suggesting to add to the BMP:

   314 kanji to be treated as unified ideographs on the BMP.
    56 kanji to be treated as compatibility ideographs on the BMP.

Implicit in this proposal is a rather significant attempt to do two
things:

   A. Disunify a large number of characters that are currently unified
      (in order to encode specific glyph variants as characters).

   B. Break the agreement between IRG and WG2 (and the UTC) not to
      encode any more ideographs on the BMP.

In particular, the 56 "compatibility ideographs" are acknowledged by JCS
to be unifiable variants, which they nevertheless require to be separate
"characters" for legal reasons in Japan. All 56 are thus acknowledged
disunifications. They are comparable to the 20 of the 32 IBM ideographs
(FA0E..FA2D) that are non-unique. Their inclusion would be somewhat
odious, but not without precedent. There is a definite level of
confusion which could set in, however, since many of these Japanese
forms are old-style variants that already appear in the K column of the
10646 printing of the unified characters, for example.

The 314 "unified ideographs" are more problematical. There is a certain
proportion of them which are just valid but rare characters not included
in the URO or Vertical Extension A.
The question for those is whether they are already part of Vertical
Extension B. That question should be determined by the IRG in any case,
so it would be premature to try to encode these as a separate set.

But also included among the 314 are a bunch of alternate radical forms,
including the infamous one-dot walk radical and two alternate forms of
the grass radical. There are others mixed in here as well. These radical
forms all just got encoded as part of the CJK Radicals Supplement, as
*radicals*. But the JCS proposal is now claiming them as "UNIFIED
IDEOGRAPHS" -- which is so bogus as to be laughable. Were the JCS
proposal for these characters to be taken as is, the Unicode Standard
would end up with 9 characters for the grass radical:

   2EBE  CJK RADICAL GRASS ONE
   2EBF  CJK RADICAL GRASS TWO
   2EC0  CJK RADICAL GRASS THREE
   2F8B  KANGXI RADICAL GRASS
   4491  CJK UNIFIED IDEOGRAPH-4491 (variant of 2EBE)
   8278  CJK UNIFIED IDEOGRAPH-8278 ("grass", = 2F8B)
   8279  CJK UNIFIED IDEOGRAPH-8279 ("grass, radical form", = 2EBE)
   AB73  JCS: CJK "UNIFIED" IDEOGRAPH-AB73 (= 2EC0, w/ style difference)
   AB74  JCS: CJK "UNIFIED" IDEOGRAPH-AB74 (= 2EBF, w/ style difference)

Have these people no shame?

This is what happens when a computing tradition that has never been able
to move off ground-zero in associating 1 character to 1 glyph keeps
grinding through the endless lists of variants, mistakes, rare,
obsolete, nonce, idiosyncratic, and novel ideographs available through
the millennia in East Asia.

--Ken