Re: What does Z variant mean for Han?

From: Jungshik Shin (jshin@mailaps.org)
Date: Tue Jul 09 2002 - 21:31:08 EDT


On Tue, 9 Jul 2002, Eric Muller wrote:

> Let's take the concrete example of U+5516 and U+555E, with kZVariant
> entries pointing at each other in Unihan.txt.

  It seems those two would have been unified had it not been
for the source separation rule (the JIS, GB, and CNS character set
standards encoded them separately).

> Does Z variant mean that all the glyphs which are acceptable to
> represent U+5516 are also acceptable to represent U+555E, and
> conversely? Of course, some shapes may be more appropriate in some

 My answer would be yes, but I'm not as sure of this as I am of
the next yes.

> Does it go as far as making folding from one character into the other a
> useful operation, assuming one is not interested in perfect
> round-tripping with other character standards?

  I think the answer to this is definitely yes. There are occasions
when this operation becomes very useful. For instance, Korean fonts
usually do not have a separate glyph for U+5516, but virtually every Korean
(who can read Chinese characters) knows that U+5516 and U+555E mean the
same and are variants of each other. Therefore, when rendering text
containing U+5516 with a Korean font that lacks a separate glyph for it,
it'd be better to represent U+5516 with the glyph for U+555E than to render
it as '?', unless keeping the distinction between them is required for
some reason.
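
  A minimal sketch of that fallback, in Python. The variant table here
lists only the pair from this thread (a real one would be derived from the
kZVariant entries in Unihan.txt), and the font coverage set and the names
glyph_source and render_string are hypothetical, for illustration only.

```python
# Hypothetical variant table holding only the pair discussed in this thread.
Z_VARIANTS = {
    "\u5516": ["\u555e"],
    "\u555e": ["\u5516"],
}


def glyph_source(ch, coverage, variants=Z_VARIANTS):
    """Return the character whose glyph should be drawn for ch.

    coverage is the set of characters the font has glyphs for
    (an assumed input; reading it from a real font is not shown).
    """
    if ch in coverage:
        return ch                      # the font has its own glyph
    for alt in variants.get(ch, []):
        if alt in coverage:
            return alt                 # borrow the variant's glyph
    return "?"                         # no glyph and no covered variant


def render_string(text, coverage):
    """Map every character of text to the character actually drawn."""
    return "".join(glyph_source(ch, coverage) for ch in text)


# Pretend coverage of a Korean font that has U+555E but not U+5516.
korean_font = {"\u555e"}
drawn = render_string("\u5516", korean_font)
print(" ".join(f"U+{ord(c):04X}" for c in drawn))
```

  If the distinction must be kept, the caller can pass an empty variant
table so that the missing glyph stays visible as '?'.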

 Another case where the operation is useful is web search.
Suppose a Korean wants to search the net for '東北亞' (Northeast Asia)
without knowing that in Japan 亜 (U+4E9C) is used instead of 亞 (U+4E9E)
for Asia (the last character in the word). (S)he would miss most
documents in Japanese because (s)he uses 亞 (U+4E9E) in place of
亜 (U+4E9C).
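
  A minimal sketch of folding for search, in Python. It folds both the
query and the documents onto one member of each variant pair before
matching. The table holds only the kZVariant pair from the start of this
thread, the folding direction is an arbitrary choice, and the names fold
and search are hypothetical.

```python
# Fold U+5516 onto U+555E. Which member is the target does not matter,
# as long as query and documents are folded the same way.
FOLD = {"\u5516": "\u555e"}
FOLD_TABLE = str.maketrans(FOLD)


def fold(text):
    """Replace every variant character with its folding target."""
    return text.translate(FOLD_TABLE)


def search(query, documents):
    """Return the documents that contain query after folding both sides."""
    key = fold(query)
    return [doc for doc in documents if key in fold(doc)]


docs = ["x\u5516y", "x\u555ey", "xyz"]
hits = search("\u555e", docs)
print(len(hits))  # both variant spellings match
```

  The original text is left untouched; only the comparison keys are folded,
so round-tripping with other character standards is not affected.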

  The same would happen for 韓國 (with 國, U+570B) and 韓国 (with 国,
U+56FD). Although U+570B and U+56FD are not Z-variants but
simplified/traditional variants, an intelligent web search engine would
(at the user's request) relieve users of the chore of logically ORing the
two spellings, 韓國 and 韓国.
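
  A minimal sketch of such query expansion, in Python. It generates every
spelling of a term from a variant table and joins them with OR. The table
holds only the U+570B/U+56FD pair mentioned above, the quoted "OR" syntax
is an assumption about the search engine, and the names expand_query and
or_query are hypothetical.

```python
from itertools import product

# Hypothetical variant table with the single pair from the text above.
VARIANTS = {
    "\u570b": ["\u56fd"],
    "\u56fd": ["\u570b"],
}


def expand_query(term, variants=VARIANTS):
    """Return every spelling of term, the original spelling first."""
    # For each character: the character itself, then its known variants.
    choices = [[ch] + variants.get(ch, []) for ch in term]
    return ["".join(combo) for combo in product(*choices)]


def or_query(term):
    """Join all spellings into one OR query (assumed engine syntax)."""
    return " OR ".join(f'"{t}"' for t in expand_query(term))


spellings = expand_query("\u97d3\u570b")  # the two-character word for Korea
print(len(spellings))
```

  The number of spellings grows with the product of the variant counts per
character, so a real engine would more likely fold at index time than
expand long queries this way.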

  This kind of operation would help get the content through
(working around potential problems arising from the IRG not having gone
as far in Han unification in Ext. B/C as some people wished)
when getting it through is more important than preserving the sometimes
minuscule (semantically irrelevant) differences among variants.

  For the record, I'm not saying U+570B and U+56FD should have been unified
by any means. They certainly deserve to be encoded separately.

  Jungshik Shin


