CJK conversion problem

From: Oliver Steinau (oliver.steinau@STencode.de)
Date: Thu Jul 06 2000 - 16:57:58 EDT

All,

I need to write a program that converts more or less all CJK encodings to
Unicode (UTF-8, to be precise). I got the Unihan3.0 file from ftp.unicode.org
and made a quick analysis of the CNS characters in it. I used the kCNS1992
and kIRG_TSource tags, converted the codes to row/cell, and printed the result.
I must admit that I don't understand the results:
There are entries for plane 3, rows > 66 (which, according to Ken Lunde's book,
are not defined; plane 3 stops at row 66); OTOH, quite a few characters are
missing from plane 4, and almost all from planes 5, 6, 7, and 15.
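
For reference, the row/cell conversion described above can be sketched roughly
as follows. This is only an illustration: it assumes the tag value has the form
"P-XXXX" (a plane digit in hex, then the four hex digits of the 7-bit code),
which may not match every Unihan release, and the function name
cns_to_row_cell is made up for this example.

```python
def cns_to_row_cell(value):
    """Split an assumed 'P-XXXX' CNS value into (plane, row, cell)."""
    plane_text, code_text = value.strip().split("-")
    plane = int(plane_text, 16)   # hex, so "F" would mean plane 15 (assumption)
    code = int(code_text, 16)
    high, low = code >> 8, code & 0xFF
    # Each byte of a 94x94 set lies in 0x21..0x7E; anything else is malformed.
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("not a 94x94 code: %r" % value)
    # Row and cell are 1-based offsets from 0x20.
    return plane, high - 0x20, low - 0x20


print(cns_to_row_cell("1-4421"))
```

With this arithmetic, any code whose high byte is above 0x62 in plane 3 comes
out as a row greater than 66.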
My questions are: Why are so many characters missing?
And: what am I supposed to do if I encounter a text that uses these characters?
AFAIK, there's no other way than to do a mapping from e.g. CNS to Unicode, and
convert the resulting code points to UTF-8.
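
A minimal sketch of that table-driven approach, under several assumptions: the
Unihan file is taken to be tab-separated lines of the form
"U+XXXX<TAB>tag<TAB>value", the value is taken to look like "P-XXXX", and
characters without a mapping are replaced by U+FFFD purely as a placeholder
policy. The names build_cns_table and cns_to_utf8 are hypothetical, and the
input is assumed to be already decoded into (plane, code) pairs.

```python
REPLACEMENT = "\uFFFD"  # placeholder for unmapped characters (a policy choice)


def build_cns_table(unihan_lines, tag="kCNS1992"):
    """Build {(plane, code): character} from Unihan-style lines."""
    table = {}
    for line in unihan_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != tag:
            continue
        code_point = int(fields[0][2:], 16)  # "U+4E00" -> 0x4E00
        plane_text, code_text = fields[2].split("-")
        table[(int(plane_text, 16), int(code_text, 16))] = chr(code_point)
    return table


def cns_to_utf8(codes, table):
    """Map (plane, code) pairs to Unicode, then encode the result as UTF-8."""
    text = "".join(table.get(code, REPLACEMENT) for code in codes)
    return text.encode("utf-8")


sample = ["U+4E00\tkCNS1992\t1-4421\n"]  # illustrative single entry
table = build_cns_table(sample)
print(cns_to_utf8([(1, 0x4421), (7, 0x2121)], table))
```

The second pair in the usage line has no entry in the sample table, so it comes
out as the replacement character; a real converter would need to decide whether
to substitute, skip, or report such characters.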

Any help?

Best regards,

/oliver

Oliver Steinau
ST encode GmbH
E.-L.-Kirchner-Str. 9; D-67227 Frankenthal
Phone: +49 (6233) 480 800; Fax: -801

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT