Re: unified or over-unified (was Unicode CJK Language Myth)

From: Kenichi Handa (
Date: Tue Jun 04 1996 - 21:38:57 EDT

... writes:
> Ken'ichi Handa writes :
>> 1) Such a character does not exist in Japan (i.e. it is not used as a
>> variant of the Japanese glyph `choku' in Japan), as far as I know, of
>> course.

> It exists.
> This GB-looking `choku' (i.e. without the vertical stroke at the left) was
> fairly common in Japan before the Meiji era (1868-). Even in the Meiji era,
> there existed typefaces such as `GYOSHO-TAI' (hand-written style) in the
> `TSUKIJI-KATSUJI' movable type set (TSUKIJI-KATSUJI was one of the major
> printing offices in Tokyo). They were used in printing Japanese.
> ...
Thank you very much for this information, which was quite surprising
(at least to me). I'm ashamed of not knowing this fact, and I apologize
to those who claimed that Unicode doesn't make "choku" case

>> 2) Even if the glyph was ever used for Japanese somewhere in Japan,
>> it can't be derived from the base glyph for Japanese `choku' by
>> the generalization criterion.

> The difference may be viewed as a difference in style (SHO-TAI), not
> in JITAI. (The `TSUKIJI-KATSUJI' case above is exactly so. The other two
> cases are manuscript vs. print, or between manuscripts.)

So the actual case for "choku" is not:
        (1) Japanese people regard them as having different JITAI
        (abstract shape), and therefore think they don't have
        a character with the Chinese JITAI, while
        Chinese people regard them as having the same JITAI;
but:
        (2) Both peoples regard them as having the same JITAI.

But I still have the following question for Unicode:
        Can all the unified characters in Unicode be regarded as
        case (2)? Were all characters unified only after checking this?

If Unicode guarantees that all unified characters fall under case (2),
then I'll be satisfied that Unicode is not worse than

But, as far as I know, the book The Unicode Standard (Vol. 1 & 2) does
not mention such a principle. The book says only that a difference
in the "treatment of a source character set" leads to code
separation. How thoroughly this "treatment" was actually considered
is still doubtful.

For instance, Unicode assigns different code points to U+6384 and
U+6451. Those are unified in JISX0208, but JISX0212 contains U+6451.
How can the "source set separation" rule handle this kind of situation?
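To make the question concrete, here is a minimal Python sketch of a
simplified source separation check, under the assumption that the rule
is applied to each source set independently. The function name
`must_stay_separate` and the reduction of each standard to a set of
code points are illustrative inventions, not the Unicode Consortium's
actual procedure; which of the two code points JIS X 0208's single
unified character corresponds to is also assumed here.

```python
from typing import Dict, Set

def must_stay_separate(a: int, b: int, sources: Dict[str, Set[int]]) -> bool:
    """Return True if any single source set encodes both a and b as
    distinct characters (a simplified source separation check).
    Hypothetical helper, for illustration only."""
    return any(a in chars and b in chars for chars in sources.values())

# Assumption: JIS X 0208's one unified character is taken to map to
# U+6384; JIS X 0212 supplies U+6451 (as stated in the post).
jis_x0208 = {0x6384}
jis_x0212 = {0x6451}

# Each standard treated as its own source set.
separately = {"JISX0208": jis_x0208, "JISX0212": jis_x0212}
# Both standards treated together as one Japanese source.
combined = {"JISX0208+JISX0212": jis_x0208 | jis_x0212}

print(must_stay_separate(0x6384, 0x6451, separately))  # False
print(must_stay_separate(0x6384, 0x6451, combined))    # True
```

Under this toy model the answer depends entirely on whether JIS X 0208
and JIS X 0212 count as one source or two, which is exactly the
ambiguity the question above points at.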

Likewise, Unicode unifies two characters at U+80B2: one of them is in
CNS11643-1 (4B3F) and the other is in CNS11643-6 (2D69). I'd like to
ask Taiwanese people how they will use Unicode (or ISO10646)
together with the CNS series.
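The practical consequence for CNS users can be sketched as a round-trip
check. This minimal Python example assumes a toy CNS 11643 to Unicode
table containing only the two entries mentioned above, both mapped to
U+80B2 as the post states; the (plane, code) table layout and
`find_collisions` are hypothetical. Any target code point reached from
more than one source code cannot be converted back to CNS 11643
without extra information.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A CNS 11643 code, represented as (plane, code) -- hypothetical layout.
CnsCode = Tuple[int, int]

# Toy fragment of a CNS 11643 -> Unicode table (only the entries cited).
cns_to_ucs: Dict[CnsCode, int] = {
    (1, 0x4B3F): 0x80B2,
    (6, 0x2D69): 0x80B2,
}

def find_collisions(table: Dict[CnsCode, int]) -> Dict[int, List[CnsCode]]:
    """Group source codes by target code point and return only targets
    reached from more than one source code (not round-trippable)."""
    by_target: Dict[int, List[CnsCode]] = defaultdict(list)
    for src, ucs in table.items():
        by_target[ucs].append(src)
    return {ucs: srcs for ucs, srcs in by_target.items() if len(srcs) > 1}

for ucs, srcs in find_collisions(cns_to_ucs).items():
    codes = ", ".join(f"plane {p} 0x{c:04X}" for p, c in sorted(srcs))
    print(f"U+{ucs:04X} <- {codes}")
# prints: U+80B2 <- plane 1 0x4B3F, plane 6 0x2D69
```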

Ken'ichi HANDA

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT