Re: character set conversion routines

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Mon Dec 22 1997 - 03:49:32 EST


On Thu, 18 Dec 1997, Victor Tse wrote:

>
> I am looking for source code (free or commercially available) that can
> do the following character set conversions:
>
> UCS2 -> Microsoft Asian Code Pages (SJIS, BIG5, GB2312 and KSC5601);
> UNIX Asian Code Pages (Japanese, Traditional Chinese, Simplified Chinese
> and Korean EUC).
>
> Shift-JIS <-> EUC for Japanese (I got this one from
> Ken Lunde's book)

  His jconv is also available on-line at
   <url:http://www.ora.com/people/authors/lunde/>

> BIG5 <-> EUC for Traditional Chinese
> GB2312 <-> EUC for Simplified Chinese
> KSC5601 <-> EUC for Korean

  I can't help wondering what you meant by KS C 5601.

  EUC for Korean is one of several CESs (Character Encoding Scheme; I'm
using the terms adopted in RFC 2130) for two CCSs (Coded Character Set),
namely US-ASCII/ISO-646/KS C 5636 and KS C 5601-1987. Of course, EUC
for Korean is the most widely used CES for those two CCSs. Granted, no
one can give you a converter between two things belonging to completely
different categories, such as a CCS and a CES (see
http://pantheon.yale.edu/~jshin/faq/qa8.html and references therein).
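
  To make the CCS/CES distinction concrete, here is a minimal Python
sketch. It assumes the usual description of the two schemes: EUC-KR
stores a KS C 5601 character at row r, cell c as the bytes 0xA0+r,
0xA0+c, while a 7-bit scheme such as ISO-2022-KR uses 0x20+r, 0x20+c
(its escape and shift sequences are left out here). The function names
are mine and purely illustrative.

```python
def ksc5601_to_euc_kr(row: int, cell: int) -> bytes:
    """Encode a KS C 5601 position (row, cell in 1..94) as EUC-KR bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # EUC-KR sets the high bit: each byte is 0xA0 plus the row/cell number.
    return bytes([0xA0 + row, 0xA0 + cell])


def ksc5601_to_7bit(row: int, cell: int) -> bytes:
    """Encode the same position as the 7-bit byte pair (no shift codes)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # 7-bit schemes use 0x20 plus the row/cell number instead.
    return bytes([0x20 + row, 0x20 + cell])


# The syllable HAN (U+D55C) sits at row 39, cell 49 of KS C 5601.
euc = ksc5601_to_euc_kr(39, 49)
seven_bit = ksc5601_to_7bit(39, 49)
print(euc.hex(), seven_bit.hex())
```

  The character set position (39, 49) is the same in both cases; only
the encoding scheme differs. That is why "KS C 5601 <-> EUC" is not a
conversion between two encodings.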

  In case you meant EUC for Korean by KSC5601 (as a lot of
people have mistakenly done in the past and unfortunately
are still doing), obviously you don't need any conversion
program at all because they're the same thing.

  If you meant by KSC5601 the UniHan (Unified Hangul Encoding) used in the
Korean version of MS-Windows 95/NT, you were very much mistaken. UniHan has
NOTHING to do with the Korean national standard (it's just a proprietary
encoding of Microsoft's) except that it's upward compatible with EUC for
Korean (EUC-KR as the MIME charset parameter value). As I posted before to
this mailing list, UniHan contains a lot of Hangul syllables
(11172 - 2350 = 8822) not covered by KS C 5601-1987 (and accordingly not
covered by EUC-KR/EUC for Korean). Thus, it's impossible to convert to
EUC-KR/EUC for Korean those characters that are in the repertoire of
UniHan but not in KS C 5601-1987. However, NO conversion whatsoever is
necessary if your text contains only characters covered by both UniHan
and EUC-KR, as they share common code points.
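
  Whether a given UniHan text needs no conversion can be checked at the
byte level. The sketch below is only that, a sketch: it assumes that
every two-byte EUC-KR character has both bytes in 0xA1..0xFE and that
the Microsoft extension uses lead or trail bytes outside that range. It
checks byte ranges only, not whether each code point is actually
assigned, and the function name is hypothetical.

```python
# Syllable counts quoted above: 19 initials x 21 medials x 28 finals
# (27 final consonants plus "no final") gives all modern syllables.
ALL_SYLLABLES = 19 * 21 * 28          # 11172
KSC5601_SYLLABLES = 2350
EXTENSION_SYLLABLES = ALL_SYLLABLES - KSC5601_SYLLABLES  # 8822


def is_pure_euc_kr(data: bytes) -> bool:
    """Return True if data uses only ASCII and EUC-KR byte ranges."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Single-byte US-ASCII / KS C 5636 character.
            i += 1
            continue
        if i + 1 >= len(data):
            return False  # truncated two-byte character
        trail = data[i + 1]
        # Both bytes must fall in the EUC-KR range (assumed 0xA1..0xFE).
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            return False
        i += 2
    return True


print(is_pure_euc_kr(b"\xc7\xd1\xb1\xdb"))  # bytes shared by both encodings
print(is_pure_euc_kr(b"\x81\x41"))          # bytes in the extension area
```

  If the function returns True, the bytes can be labelled EUC-KR as they
are. If it returns False, the text contains characters that have no
two-byte EUC-KR code point at all.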

   Hope this helps,

      Jungshik Shin

   


