I need to write a program to convert more or less all kinds of CJK encodings to
Unicode (UTF-8, to be precise). I got the Unihan3.0 file from
and made a quick analysis of the CNS characters in there. I used the
and kIRG_TSource tags, converted the codes to row/cell (r/c) and printed the result.
I must admit that I don't understand the results I get:
There are entries for plane 3, rows > 66 (which, according to Ken Lunde's
are not defined; plane 3 stops at row 66); OTOH, I found quite some
missing from plane 4, and almost all from planes 5, 6, 7, and 15.
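
For reference, the row/cell conversion described above could be sketched roughly as follows. This is only an illustration, not the program actually used for the analysis: the function name parse_t_source is made up, and it assumes that a kIRG_TSource value has the form "plane-XXXX" (hex plane digit, four hex digits of code, optionally prefixed with "T") and that each code byte lies in the 94x94 range 0x21..0x7E.

```python
def parse_t_source(value):
    """Split a kIRG_TSource value into (plane, row, cell).

    Assumed input form: "1-4421" or "T1-4421" (hypothetical helper,
    not part of any Unihan tooling).
    """
    value = value.strip().upper()
    if value.startswith("T"):
        # Tolerate a "T" source prefix if one is present.
        value = value[1:]
    plane_str, code_str = value.split("-")
    plane = int(plane_str, 16)       # plane 15 is written as "F"
    code = int(code_str, 16)
    high, low = code >> 8, code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("code outside the 94x94 range: " + code_str)
    # Row and cell are the two code bytes minus 0x20, so both run 1..94.
    return plane, high - 0x20, low - 0x20


if __name__ == "__main__":
    for sample in ("1-4421", "3-6321", "F-2121"):
        print(sample, parse_t_source(sample))
```

With this arithmetic, a plane 3 code whose first byte is 0x63 or higher lands in row 67 or above, which is the range in question.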
My questions are: Why are so many characters missing?
And: what am I supposed to do if I encounter a text that uses these
AFAIK, there's no other way than to map from, e.g., CNS to Unicode and then
convert the resulting code points to UTF-8.
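
A minimal sketch of that two-step approach, under some assumptions: the table is built from tab-separated Unihan-style lines ("U+XXXX", tag, value), the names load_table and cns_to_utf8 are invented for this example, the single sample line is only illustrative, and substituting U+FFFD for unmapped characters is just one possible policy, not a recommendation from any standard.

```python
SAMPLE_LINES = [
    "# illustrative Unihan-style line; a real table would hold thousands",
    "U+4E00\tkIRG_TSource\t1-4421",
]


def load_table(lines):
    """Build a {(plane, code): code point} mapping from kIRG_TSource lines."""
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != "kIRG_TSource":
            continue  # skip comments, blank lines, and other tags
        code_point = int(fields[0][2:], 16)          # drop the "U+" prefix
        plane_str, code_str = fields[2].strip().lstrip("T").split("-")
        table[(int(plane_str, 16), int(code_str, 16))] = code_point
    return table


def cns_to_utf8(chars, table, fallback=0xFFFD):
    """Map (plane, code) pairs to Unicode, then encode the result as UTF-8.

    Characters missing from the table become U+FFFD (an assumed policy).
    """
    text = "".join(chr(table.get(key, fallback)) for key in chars)
    return text.encode("utf-8")


if __name__ == "__main__":
    table = load_table(SAMPLE_LINES)
    print(cns_to_utf8([(1, 0x4421), (7, 0x2121)], table))
```

The second pair in the usage example has no table entry, so it comes out as the replacement character; a real converter might prefer to report such characters instead of silently replacing them.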
ST encode GmbH
E.-L.-Kirchner-Str. 9; D-67227 Frankenthal
Phone: +49 (6233) 480 800; Fax: -801