Date: Sun Aug 29 2010 - 08:17:00 CDT
On Sun, 29 Aug 2010 14:07:35 +0200
Uriah Eisenstein <email@example.com> wrote:
>UAX #38 (Unihan) defines the kIRG_USource field as a reference into the
>U-source ideograph database described in UTR #45, having the form "UTC
>nnnnn". However, several CJK Compatibility Ideographs are mapped to their
>own code point values, e.g. "U+FA0C kIRG_USource U+FA0C". The formal
>syntax of kIRG_USource allows this, but I've found no explanation as to the
>meaning of such a mapping; there is also no such mapping from a code point
>to another code point.
I think this is a good point. U+FA0C was originally
introduced for round-trip conversion between ISO/IEC
10646 and Big5, but this background is somewhat
difficult to infer from the properties in the current
Unihan.txt.
U+FA0C is still the easier example to understand,
because its kDefinition mentions this. U+FA0D was also
introduced for compatibility with Big5, but its
properties do not say so.
Recently, CJK compatibility ideographs have been
proposed to assign code points to "characters" whose
shapes differ from existing characters only in ways
that would normally be unified. U+F900 - U+FA0B (for
KS X 1001:1998 compatibility) and U+FA0C - U+FA0D (for
Big5 compatibility) are exceptional, because their
glyph shapes show no difference at all from the
existing characters. Some people expect such
information to be recorded.
For compatibility characters with subtle differences
in shape, I'm not sure whether the historical
background is needed or not. The compatibility
ideographs introduced for IBM Kanji for the Japanese
market had subtle differences from the exemplification
glyphs in the Japanese Industrial Standards current
when IBM developed them. Later, however, newer Japanese
Industrial Standards recognized that some of them were
reasonable to encode at separate code points.
Therefore, Unihan.txt lists:
U+FA0F kIBMJapan FA9B
U+FA0F kIRG_JSource 3-2F4B
U+FA0F kIRG_USource U+FA0F
U+FA0F kJIS0213 1,15,43
U+FA0F kRSAdobe_Japan1_6 C+8421+32.3.7 C+8421+150.7.3
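To make the self-mapping Uriah describes easy to spot
mechanically, here is a minimal Python sketch. It assumes
the Unihan.txt layout of one property per line (code
point, field name, value, separated by tabs, with "#"
comment lines); the function names parse_unihan and
self_referencing_usource are hypothetical, and the sample
data is just the U+FA0F record quoted above.

```python
from collections import defaultdict

def parse_unihan(lines):
    """Group 'U+XXXX<TAB>kField<TAB>value' lines into {code_point: {field: value}}.

    Assumes the tab-separated Unihan.txt layout; blank lines, '#' comments
    and lines without three fields are skipped.
    """
    records = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)  # keep any spaces inside the value
        if len(parts) != 3:
            continue
        cp, field, value = parts
        records[cp][field] = value
    return dict(records)

def self_referencing_usource(records):
    """Return code points whose kIRG_USource value is their own code point."""
    return sorted(cp for cp, fields in records.items()
                  if fields.get("kIRG_USource") == cp)

# The U+FA0F record quoted above, written with tab separators.
sample = """\
U+FA0F\tkIBMJapan\tFA9B
U+FA0F\tkIRG_JSource\t3-2F4B
U+FA0F\tkIRG_USource\tU+FA0F
U+FA0F\tkJIS0213\t1,15,43
U+FA0F\tkRSAdobe_Japan1_6\tC+8421+32.3.7 C+8421+150.7.3
""".splitlines()

records = parse_unihan(sample)
print(self_referencing_usource(records))  # ['U+FA0F']
```

Such a check only finds the self-mappings; it says
nothing about why each one exists, which is the
historical information in question.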
I'm not sure whether all possible variants in JIS X 0213
can be recognized as "compatible with IBMJapan".
# I neglected to check who provided the font used to print
# the characters introduced for IBM Kanji in ISO/IEC 10646.
Uriah, do you think historical background information
about each compatibility ideograph should be noted in
Unihan.txt?