Re: Unicode CJK Language Myth

From: Kenichi Handa (handa@etl.go.jp)
Date: Tue May 28 1996 - 02:19:16 EDT


cherlin@snowcrest.net writes:
> Very well. Under what circumstances would a Japanese see the character
> "choku" rendered incorrectly when using Unicode?

This question shows that you are not familiar with a multilingual
environment. Anyway, I've already given one example in my previous
mail, quoted below:

>> Do you mean that one should set up his environment for his locale? How
>> can a novice user set up appropriate fonts when he borrows a terminal
>> from his friend?

> A Japanese person using Unicode-enabled Japanese-language software with
> Unicode-encoded Japanese fonts will continue to see the expected rendering.

Yes, so I do agree that Unicode can be used at least for localization.

> Anybody using Chinese and Japanese fonts together might see either the
> Chinese or Japanese rendering. Presumably someone who uses Chinese fonts
> has a specific reason for doing so--perhaps the person is Chinese, or is a
> scholar of Chinese, or is doing business in China--but whatever the reason,
> that person has to accept that Japanese and Chinese fonts are different,
> and learn the significant differences.

Why should I have to write again and again that the difference goes
beyond what is acceptable as a font variation?

> So who is left, who could have a valid objection to unifying the glyphs
> into one character?

I have no idea why mine can't be recognized as a valid objection.

> It is an important principle of Unicode that glyphs distinguished in any
> national character set must be assigned distinct characters in Unicode.

I don't think that this principle is that important (as I wrote
before), but I don't object to it for the moment because it's far
less harmful than the following logic:

> Since no Japanese or Chinese character set standard distinguishes these two
> glyphs, the conclusion is that nobody feels the need to make the
> distinction at the character level.

Is this truly what all Unicoders have in mind?

You are in effect saying that, because ASCII does not contain some
Greek character, ASCII does not distinguish the character `a' from
that Greek character.

It is nonsense to say that a character set does or does not
distinguish two characters when one of them is not in the set at all.
If we must say anything, a character set distinguishes the characters
it contains from all characters it does not contain.

And no Japanese character set contains a character that allows the
Chinese `choku' variant. In this sense, a character that allows the
Chinese variant is different from the Japanese character that does
not allow it.

Unicode decided its unifications based on a survey of existing
character sets. The timing was very bad, because many plans for new
Han character sets were still in progress (e.g. CNS plane 3 and
above, and GB-???? traditional-form character sets corresponding to
GB7589 and GB7590, just as GB12345 corresponds to GB2312). These
character sets may assign different code points to a character that
Unicode unified. For instance, U-80B2 unifies two characters which
are included in CNS 11643 Plane 1 and Plane 6. This fact shows that
it is quite nonsensical to conclude that a single character set does
not distinguish A and B just because it contains A but not B.
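
To make the point concrete, here is a rough illustration in Python.
The CNS code point values below are placeholders (I am not quoting
the real ones); only U-80B2 is taken from the example above. Once
two nationally distinct characters are mapped to one Unicode point,
the reverse mapping cannot tell them apart:

    # Placeholder illustration: two distinct code points in a national
    # character set (stand-ins for the CNS 11643 Plane 1 and Plane 6
    # characters mentioned above; these CNS values are NOT the real ones)
    # both map to the single unified point U-80B2.
    to_unicode = {
        ("cns11643-plane1", 0x1111): 0x80B2,   # placeholder code point
        ("cns11643-plane6", 0x2222): 0x80B2,   # placeholder code point
    }

    # Building the reverse table shows the information loss: one Unicode
    # point maps back to two different national characters.
    from_unicode = {}
    for national, ucs in to_unicode.items():
        from_unicode.setdefault(ucs, []).append(national)

    print(from_unicode[0x80B2])   # two candidates -- the distinction is gone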

> How do the objectors to Unicode handle the problem of rendering "choku"
> now? Either it doesn't present a real problem, or they use separate
> Japanese and Chinese fonts with incompatible codings. Is this an advantage
> to anyone?

Very simple. Just use the two character sets, Japanese JIS X 0208
and Chinese GB2312 (and/or CNS 11643), concurrently. There is no
incompatibility as long as we use an internationalized encoding
method (ISO-2022-INT and X's Compound Text are examples) and an
internationalized internal character representation (Mule's method
and X11R5's Xsi method are examples).
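
To show concretely what I mean by "no incompatibility", here is a
rough sketch in Python. It is not the actual data structure of
Compound Text, Mule, or Xsi; it only shows the idea that each run of
text carries the character set it was written in, so the Japanese
and Chinese forms of `choku' never collapse into one code:

    # Rough sketch of charset-tagged text, in the spirit of Compound Text
    # and Mule's internal representation (not their actual structures).
    from dataclasses import dataclass

    @dataclass
    class Run:
        charset: str   # e.g. "jisx0208" or "gb2312"; the tag travels with the text
        data: bytes    # the double-byte codes in that character set

        def decode(self) -> str:
            codec = {"jisx0208": "iso2022_jp", "gb2312": "gb2312"}[self.charset]
            return self.data.decode(codec)

    # A document mixing Japanese and Chinese text.  (The runs are built
    # from a Unicode literal here only for convenience of the example.)
    doc = [
        Run("jisx0208", "\u76f4".encode("iso2022_jp")),  # Japanese `choku'
        Run("gb2312",   "\u76f4".encode("gb2312")),      # Chinese `zhi'
    ]

    for run in doc:
        # A renderer picks a Japanese font for jisx0208 runs and a Chinese
        # font for gb2312 runs; the charset tag makes the choice unambiguous.
        print(run.charset, run.data, run.decode())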

> Have I missed something? Is this style of explanation satisfactory to
> Japanese computer users?

You missed too many things to satisfy us.

I hope you have already understood that the following paragraph is
nonsense.

> Unicode is essential for global software development, and significant even
> for monolingual products and applications where good typesetting or math
> are of any importance. ASCII and the various 8-bit extensions to ASCII are
> all entirely inadequate for English or any other Latin alphabet language.
> Double-byte encodings of CJKV languages are worse, from a technical point
> of view. They are difficult to process correctly on a computer, are
> incompatible with each other, and do not offer enough character code points
> for scholarly applications. They do nothing to help developers or users to
> display "choku" correctly.

We (at least Mule) have no technical difficulty in handling multiple
double-byte character sets with a more-than-16-bit character code
internally.
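
For instance, here is a rough sketch in Python, not Mule's real
representation, of what a more-than-16-bit internal character code
means: a charset identifier is packed together with the two code
bytes, so characters from JIS X 0208, GB2312, CNS 11643 and so on
coexist without colliding and without being unified:

    # Rough sketch, not Mule's actual encoding: 8 bits of charset id plus
    # 16 bits of code point gives a 24-bit internal character code.
    CHARSET_IDS = {"ascii": 0, "jisx0208": 1, "gb2312": 2, "cns11643-1": 3}

    def make_char(charset: str, code: int) -> int:
        return (CHARSET_IDS[charset] << 16) | code

    def jis_code(ch: str) -> int:
        # Take the two 7-bit JIS X 0208 bytes out of the ISO-2022-JP
        # encoding (the first three bytes are the ESC $ B designation).
        b = ch.encode("iso2022_jp")[3:5]
        return (b[0] << 8) | b[1]

    def gb_code(ch: str) -> int:
        # The GB2312 codec gives the EUC form; strip the high bits.
        b = ch.encode("gb2312")
        return ((b[0] & 0x7F) << 8) | (b[1] & 0x7F)

    jp_choku = make_char("jisx0208", jis_code("\u76f4"))
    cn_zhi   = make_char("gb2312",   gb_code("\u76f4"))

    # Internally the two national characters stay distinct, even though
    # Unicode gives both glyphs the single code point U+76F4.
    assert jp_choku != cn_zhi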

---
Ken'ichi HANDA
handa@etl.go.jp


