Re: Problems/Issues with CJK and Unicode

From: John Jenkins
Date: Fri Apr 07 2000 - 18:56:05 EDT

on 4/7/00 2:02 PM, at wrote:

> 3) Because the elements of the script (the graphemes or the
> hemigrams) were not encoded as the 'characters' of Chinese,
> the majority (only in terms of quantity, not frequency of
> use) of Chinese lexemes cannot be represented by Unicode
> without recourse to the private use area and even then, there
> will still be thousands left out.

Well, whether these individual hemigrams are appropriate units for encoding
would in any event be a matter of considerable controversy. People certainly
use them and are aware of them, but they do not traditionally form the basis
for organizing data about the units of writing in Chinese, which are usually
taken to be the ideographs.

There's also the huge problem of trying to determine a systematic way of
spelling ideographs using hemigrams. The Ideographic Description Sequences
represent a first approximation for that, but the problems involved in using
them for formal encoding are incalculable, not least because a single
ideograph can often be described by more than one sequence. This is the main
reason why they have been defined to be *descriptors*, not *encoders*, and
why equivalence is defined only for identical sequences.
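
To make the "identical sequences" point concrete, here is a minimal Python
sketch. The two decompositions of U+6E58 and the helper name ids_equivalent
are assumptions chosen for illustration; they are not taken from any
normative source. The sketch only shows that two plausible descriptions of
the same ideograph compare as unequal when equivalence means an exact
code point match.

```python
# Ideographic Description Characters used in this sketch.
IDC_LEFT_RIGHT = "\u2FF0"         # left to right
IDC_LEFT_MIDDLE_RIGHT = "\u2FF2"  # left to middle and right


def ids_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two sequences are equivalent only if they
    are identical, code point for code point."""
    return a == b


# Two illustrative descriptions of the same ideograph, U+6E58.
# Coarse: water radical (U+6C35) beside U+76F8.
coarse = IDC_LEFT_RIGHT + "\u6C35" + "\u76F8"
# Fine: water radical, then U+6728 (tree), then U+76EE (eye).
fine = IDC_LEFT_MIDDLE_RIGHT + "\u6C35" + "\u6728" + "\u76EE"

same_description = ids_equivalent(coarse, coarse)
different_descriptions = ids_equivalent(coarse, fine)
```

Here same_description is true and different_descriptions is false, even
though a reader would take both sequences to describe one ideograph. Nothing
in the comparison knows that U+76F8 is itself made of U+6728 and U+76EE,
which is the sort of knowledge a real encoding scheme would have to
standardize.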

