WG: CJK conversion problem

From: Oliver Steinau (oliver.steinau@STencode.de)
Date: Thu Jul 06 2000 - 17:45:16 EDT


Thanks for the quick answer... but still -- how do I treat these characters
*IF* I encounter them (given that I don't need a round-trip-safe conversion)?
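
One common option when round-tripping is not required is to substitute U+FFFD REPLACEMENT CHARACTER for every source code that has no Unicode mapping, and to keep a list of what was substituted. The following is only a minimal sketch under stated assumptions: the `convert` function, the (plane, row, cell) key format, and the one-entry toy table are illustrative and do not come from this thread or from any particular library.

```python
# Sketch: lossy (non-round-trip) conversion with a fallback character.
# Assumption: `table` maps (plane, row, cell) tuples to Unicode code points.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def convert(codes, table):
    """Map source codes to UTF-8 bytes; return (utf8_bytes, unmapped_codes)."""
    out = []
    unmapped = []
    for code in codes:
        code_point = table.get(code)
        if code_point is None:
            # No mapping known: substitute and remember the code for reporting.
            out.append(REPLACEMENT)
            unmapped.append(code)
        else:
            out.append(chr(code_point))
    return "".join(out).encode("utf-8"), unmapped


# Toy table with a single entry (CNS plane 1, row 36, cell 1 -> U+4E00).
toy_table = {(1, 36, 1): 0x4E00}
data, missing = convert([(1, 36, 1), (5, 1, 1)], toy_table)
```

Here `missing` ends up holding the plane 5 code, so the caller can report which characters were lost instead of dropping them silently.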


        Those characters are not defined in Unicode 3.0; they may be
defined in Unicode 4.0 or later.

Also, those characters are seldom used and are not supported by almost any
existing environment.

So, don't worry about them :).


>I need to write a program to convert sort of all kinds of CJK encodings to
>Unicode (UTF-8, to be precise). I got the Unihan3.0 file from
>and made a quick analysis of the CNS characters in there. I used the
>and kIRG_TSource tags, converted the codes to r/c and printed the result.
>I must admit that I don't understand the results I get:
>There are entries for plane 3, rows > 66 (which, according to Ken Lunde's
>are not defined; plane 3 stops at row 66); OTOH, I found quite some
>missing from plane 4, and almost all from planes 5, 6, 7, and 15.
>My questions are: Why are so many characters missing?
>And: what am I supposed to do if I encounter a text that uses these
>AFAIK, there's no other way than to do a mapping from e.g. CNS to Unicode, and
>convert the resulting code points to UTF-8.
>Any help?
>Best regards,
>Oliver Steinau
>ST encode GmbH
>E.-L.-Kirchner-Str. 9; D-67227 Frankenthal
>Phone: +49 (6233) 480 800; Fax: -801
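
The "converted the codes to r/c" step in the quoted message can be sketched as follows. This is a hedged illustration, not the poster's actual program: it assumes a source value has the form "plane-XXXX" with a hexadecimal plane and a four-digit hexadecimal code (an optional leading "T" is tolerated), and it uses the usual ISO 2022 style arithmetic in which row and cell are the two bytes minus 0x20. The names `cns_to_row_cell` and `rows_by_plane` are made up for this example.

```python
# Sketch: turn CNS 11643 source values such as "1-4421" into (plane, row, cell).
# Assumption: plane is hexadecimal, code is two bytes in the range 0x21..0x7E.

def cns_to_row_cell(value):
    plane_text, code_text = value.strip().split("-")
    plane = int(plane_text.lstrip("T"), 16)  # tolerate an optional "T" prefix
    code = int(code_text, 16)
    high, low = code >> 8, code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("not a 94x94 code: " + value)
    return plane, high - 0x20, low - 0x20


def rows_by_plane(values):
    """Collect the set of rows seen in each plane, for a quick coverage check."""
    seen = {}
    for value in values:
        plane, row, _cell = cns_to_row_cell(value)
        seen.setdefault(plane, set()).add(row)
    return seen


coverage = rows_by_plane(["1-4421", "3-6347", "3-2121"])
```

A tally like `coverage` makes it easy to see which rows of each plane actually occur in the data, for example a row above 66 in plane 3.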

