Re: Yet another Unihan Q (was Re: Comments on <draft...)

From: jenkins
Date: Fri Jun 06 1997 - 13:09:35 EDT

On 6/5/97 7:24 PM Adrian Havill wrote:

>Jenkins wrote:
>> FACT. It is true that some Unihan characters are typically written
>> differently within the Japanese, Taiwanese, Korean, and Mainland Chinese
>> typographic traditions.
>I'm 90% sure I follow you on this, but I'd like to confirm. Forgive me
>for using Japanese terms (and a non Unicode character set! (^_^)), but
>it's all I know/have!
>Are you referring to 新字体 (Shinjitai) {new character shapes--
>character shapes changed in Japan sometime around 1945 due to national
>language reforms} 旧字体 (Kyuujitai) {old character shapes-- what the
>"new" characters were before they were changed} 簡体字 (Kantaiji)
>{simplified character shapes-- usually refers to PRC China language
>reformed characters} and 繁体字 (Hantaiji) {"luxurious"/complicated
>character shapes-- usually refers to the "unsimplified shapes"-- often
>with Taiwan, Hong Kong, Korea, etc. in mind}

No, not really. The differences I'm referring to are not so much the
conscious result of formal language reform as the product of divergent
typographic traditions. Unihan does not unify simplifications.

>> E.g., the official "Taiwanese" glyph for U+8349 ("grass") per ISO/IEC
>> 10646 uses four strokes for the "grass" radical, whereas the PRC,
>> Japanese, and Korean glyphs use three. As it happens, Apple's LiSung
>> Light font for Big Five (which follows the "Taiwanese" typographic
>> tradition) uses three strokes.
>> (This is easily confirmed by accessing
>Not so easily confirmed (;_;)-- took me a while to get the CGI program
>to deliver. Unicode's server seems popular/busy.

We got 21000 hits yesterday (and have had 18000 so far today). The
actual problem lies with the CGI program -- or, rather, how it's
accessed. It happens that only one other person was accessing the page
at the same time as you yesterday.

>Also, if ISO/IEC 10646
>uses four strokes, why does the Unicode version use three (according to
>the CGI script)? I was under the impression that they should be the
>same.

They are.

Unihan characters are defined by their *mappings*, not their *shapes*.
U+8349 is defined to be equivalent to codepoint 1861 in GB 2312, 1-5777
in CNS 11643, 3380 in JIS X 0208, and 8514 in KS C 5601 -- whatever they
are and whatever they look like.
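
As a rough illustration of "defined by mappings", here is a minimal
Python sketch. It assumes that Python's bundled gb2312, euc_jp, and
euc_kr codecs are acceptable stand-ins for GB 2312, JIS X 0208, and
KS C 5601; the helper name row_cell is made up for this example. It
recovers the row and cell numbers from the EUC bytes and checks that
each legacy code leads back to the single code point U+8349.

```python
def row_cell(ch, codec):
    """Return the (row, cell) position of ch in a 94x94 national set.

    Assumes the codec wraps the set in EUC form, where each byte is the
    row or cell number plus 0xA0.
    """
    raw = ch.encode(codec)
    if len(raw) != 2:
        raise ValueError(f"{ch!r} is not a two-byte character in {codec}")
    return raw[0] - 0xA0, raw[1] - 0xA0


GRASS = "\u8349"

# Codec names are stand-ins for the source standards (an assumption).
for codec in ("gb2312", "euc_jp", "euc_kr"):
    row, cell = row_cell(GRASS, codec)
    # Whatever the legacy code is, it decodes to the same code point.
    assert GRASS.encode(codec).decode(codec) == GRASS
    print(f"{codec:7s} row {row:02d} cell {cell:02d} -> U+{ord(GRASS):04X}")
```

Nothing in the sketch looks at a glyph: the identity of the character
comes entirely from the mapping tables.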

There are no normative shapes for CJKV ideographs in Unihan -- either in
Unicode or in 10646.

Unicode uses a single glyph when we print the character simply to save
paper. 10646 uses four glyphs to emphasize that the typical glyph for
the same character can differ among the G, T, J, and K sources.

>Referring to the "Three-Dimensional Conceptual Model" (TUS 2.0, Figure
>6-25), and the rules listed in Table 6-24, does the four stroke "grass"
>(radical #140) versus the three stroke version cause this character to
>be not unified? In other words, while TUS 2.0 has only the 3-stroke
>radical #140, would the characters that use the 4-stroke version be
>added to TUS later? Or would the duplicating of all the characters which
>use this particular radical to a 4-stroke version add too many
>characters and not be justified (as modern Japanese, etc., uses the
>three stroke version)?

Firstly, the three- and four-stroke variations of the grass radical are
universally seen as just a font/Z-variation.

Secondly, deunifying the mappings would mean changing a normative part of
the standard (namely the one that says that the various mappings
indicated have the same target), which cannot be done.

>> FACT. Han unification allows for the possibility that a Japanese user
>> might be required to use a Chinese font to display some Japanese text
>> (e.g., if it uses a rare kanji).
>I hate to be obtuse, but I'm confused. By "[using] a Chinese font" to
>"[use] a rare kanji", do you mean:
>- use another font to get a rare kanji, but having to accept the
>typeface difference (the Z-axis in the 3-D model) that would cause the
>characters to stand out from the surrounding characters. (A rough
>analogy being to having the letter "g" and "d" and "j" in "jackdaws love
>my big sphinx of quartz." * in Arial but the rest of the sentence in
>Helvetica, where the "g", "d", and "j" are the 'rare kanji')

I mean this.

One shouldn't fall into the trap of assuming that the only ideographs in
Unihan used by Japanese to write Japanese are those derived from JIS X
0208 and JIS X 0212.

>Note that just like the English example, substitute characters from a
>different font slapped in the middle of another font works, (Navigator
>3.0 did this for its Unicode Java fonts) but looks awful. Can't wait
>for Bitstream/Dynalab to perfect their uniform full CJK Han fonts
>(Cyberbit and Co.) to solve this problem.

We have traditionally discouraged the creation of "a Unihan font" for
actual use (although we used one for the book) because it glosses over
the different typographic expectations of users in China, Taiwan, Japan,
and Korea. The uniform appearance you get may be the wrong one for all
but one of those regions.

You could still do it, of course -- in TrueType, for instance, you could
have multiple cmaps and allow the user to select between them. This way
you could have a different glyph for the Japanese version of U+8349 than
the Taiwanese one.
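
To make the multiple-cmap idea concrete, here is a small Python sketch.
It only models the concept with dictionaries and does not read or write
a real TrueType file; the locale tags, glyph names, and the glyph_for
helper are all hypothetical.

```python
# Hypothetical font: one shared set of glyphs, several cmaps keyed by a
# locale tag. Real TrueType cmap subtables are selected differently;
# this only models the selection idea.
CMAPS = {
    "zh-CN": {0x8349: "grass.3stroke"},
    "ja":    {0x8349: "grass.3stroke"},
    "ko":    {0x8349: "grass.3stroke"},
    "zh-TW": {0x8349: "grass.4stroke"},
}


def glyph_for(code_point, locale, default_locale="ja"):
    """Look up the glyph name for code_point in the cmap chosen by locale."""
    cmap = CMAPS.get(locale, CMAPS[default_locale])
    # ".notdef" is the conventional name for the missing-glyph glyph.
    return cmap.get(code_point, ".notdef")


print(glyph_for(0x8349, "ja"))
print(glyph_for(0x8349, "zh-TW"))
```

The text stays the same plain U+8349 in both cases; only the cmap the
user selected changes which glyph is drawn.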

>In other words, are you saying that even if the user mixes Chinese fonts
>occasionally with Japanese fonts for a Japanese document encoded in
>plain text Unicode, he/she should expect a change on the Z-axis but
>-should usually expect- (depending on how common/used the character is)
>be able to control/select the abstract shape (the Y-axis) of the
>character-- with certain exceptions such as unavailability of the CNS
>11643 "Y-axis variant" of U+8349?

No. The user cannot select Y-variants by switching fonts. The two
shapes for U+8349 are considered Z-variations.

What the user should expect is that an occasional rare kanji can only
be displayed by a font other than the one they want to use.
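
That expectation can be sketched as simple font fallback. The Python
below is an illustration only: the font names and their coverage sets
are invented, and U+9FA5 merely stands in for "a rare kanji" that the
preferred font lacks.

```python
from itertools import groupby


def pick_font(ch, fonts):
    """Return the name of the first font whose coverage includes ch."""
    for name, coverage in fonts:
        if ch in coverage:
            return name
    return None  # no listed font can display this character


def font_runs(text, fonts):
    """Split text into (font_name, substring) runs in reading order."""
    return [
        (name, "".join(chars))
        for name, chars in groupby(text, key=lambda ch: pick_font(ch, fonts))
    ]


# Hypothetical coverage: the preferred font first, then the fallback.
FONTS = [
    ("JapaneseFont", set("日本語草")),
    ("ChineseFont", set("日本語草") | {"\u9FA5"}),
]

for font, run in font_runs("草\u9FA5日", FONTS):
    print(font, run)
```

The rare character ends up in its own run set in the fallback font,
which is exactly the visible typeface change described above.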

John H. Jenkins
