Re: Problems/Issues with CJK and Unicode

From: jon@kanji.com
Date: Fri Apr 07 2000 - 16:54:02 EDT


My problems with CJK Unicode are:

1) that often a single Han 'character' is mapped to two, three, or
   more code points; in other words, CJK unification didn't go far
   enough. (A brief sketch after this list illustrates the kind of
   duplication I have in mind.)

2) that a Han 'character', i.e. a lexicographic unit (a lexeme, a
   dictionary entry), is confused with a 'character' of the Latin
   script, or of a syllabic script such as kana. A character of the
   Latin script or of a syllabic script goes to make up a
   lexicographic unit (usually a word). The corresponding animal
   for Chinese would be the graphemes that go to make up the
   lexicographic unit. (The best choice for these might be the
   2000 or so hemigrams (half graphs) that either stand alone as a
   'lexeme' or combine with each other to compose all Chinese
   'lexemes'.)

3) that, because the elements of the script (the graphemes or the
   hemigrams) were not encoded as the 'characters' of Chinese, the
   majority of Chinese lexemes (a majority in sheer quantity, not
   in frequency of use) cannot be represented in Unicode without
   recourse to the Private Use Area, and even then thousands will
   still be left out. (The second sketch below suggests how
   components can at least describe such units.)
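
As a concrete illustration of point 1, here is a minimal sketch in
Python, using only the standard unicodedata module. The examples are
my own picks, not ones drawn from any inventory: the 戶 / 户 / 戸
triple is three code points for what is conventionally one character
('door'), kept apart, as I understand it, for round-trip compatibility
with the source standards, and the compatibility ideograph U+F900 is
a second encoding of U+8C48 that NFC normalization simply folds away.

    import unicodedata

    # Three separately encoded forms of one character, 'door' (hu):
    # U+6236 (traditional), U+6237 (simplified/PRC), U+6238 (Japanese).
    for cp in (0x6236, 0x6237, 0x6238):
        ch = chr(cp)
        print("U+%04X %s %s" % (cp, ch,
              unicodedata.name(ch, "CJK UNIFIED IDEOGRAPH-%04X" % cp)))

    # A CJK Compatibility Ideograph, encoded a second time only for
    # round-trip compatibility with a legacy character set; NFC
    # normalization maps it back onto the unified code point U+8C48.
    compat = "\uF900"
    print("U+F900 normalizes to U+%04X"
          % ord(unicodedata.normalize("NFC", compat)))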

I realize there are practical reasons for this state of affairs in
Unicode's CJK repertoire, but the problems above remain.
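
On points 2 and 3, one thing Unicode 3.0 does offer in this direction
is the set of Ideographic Description Characters (U+2FF0..U+2FFB).
They describe a character's composition rather than encode it, but a
small sketch (again Python; the two characters below are simply
examples I chose, and both happen to be encoded already) shows the
idea of writing a lexeme as a sequence of its component graphs:

    # Ideographic Description Character 'left to right' (Unicode 3.0)
    LEFT_TO_RIGHT = "\u2FF0"

    # 好 (hao, 'good')    = 女 (woman) + 子 (child), side by side
    # 明 (ming, 'bright') = 日 (sun)   + 月 (moon),  side by side
    sequences = {
        "\u597D": LEFT_TO_RIGHT + "\u5973\u5B50",   # describes 好
        "\u660E": LEFT_TO_RIGHT + "\u65E5\u6708",   # describes 明
    }

    for precomposed, ids in sequences.items():
        print("U+%04X %s  described by  %s"
              % (ord(precomposed), precomposed, ids))

A lexeme that has no precomposed code point could in principle be
written down the same way, though a description sequence is of course
not the same thing as an encoded character.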

On a lighter note, I, for one, am extremely pleased that the
Unicode Han index was arranged according to the Kangxi system of
214 classifiers. This is the one system that is shared by all
regions throughout the kanji culture realm and was the proper
choice.

Jon

-- 
Jon Babcock <jon@kanji.com>


