Re: mbstowcs and UNICODE

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Fri Jan 16 1998 - 10:02:59 EST


On Thu, 4 Dec 1997, Ienup Sung wrote:

  Thank you for the info on iconv() in Solaris 2.6.

> Sun supports various code conversions among:
>
> UCS-2, UCS-4, UTF-8, UTF-7, UTF-16, SJIS, Japanese EUC, Korean EUC,
> Traditional Chinese EUC, BIG5, Simplified EUC,
> ISO 8859-1 ~ 10, KOI8-R, ISO-2022-KR, ISO-2022-CN, ISO-2022-TW, ...
>
> as like attached picture in PostScript file since Solaris 2.6. All of them are
> available through iconv(1) utility and also iconv(3) APIs. We are also
> adding more code conversions like TIS620 and so on at the Solaris 2.7.

  Taking another look at the attached PostScript file, I found a small
glitch I overlooked before. The parentheses following EUC-KR list only
one of the two character sets covered by the encoding, KS C 5601-1992,
leaving out KS C 5636-1993/US-ASCII, while those following EUC-JP contain
all three character sets covered by that encoding: JIS X 0201, JIS X
0208, and JIS X 0212. As you know well, EUC-KR (an encoding) is not
equivalent to KS C 5601-1992 (a character set), and I'm concerned this
diagram gives the false impression that they're equivalent. The same is
true of EUC-CN and EUC-TW. EUC-CN is not equivalent to GB 2312-80; it is
an EUC encoding of US-ASCII/GB 1988-80 and GB 2312-80. By the same token,
EUC-TW covers not only CNS 11643-1992 but also US-ASCII.
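
  To make the point concrete, here is a rough sketch in Python (not
anything from the Solaris tools). It assumes Python's "euc_kr" and
"gb2312" codecs are acceptable stand-ins for EUC-KR and EUC-CN, and
the function name classify() is made up for illustration. Note that
these codecs treat the single-byte range as plain US-ASCII rather
than KS C 5636 or GB 1988. The sketch only shows that one encoding
carries characters from two different coded character sets:

```python
def classify(text, codec="euc_kr"):
    """Label each character by the byte range it occupies in an EUC encoding.

    "single-byte" is the US-ASCII (or national ISO 646 variant) set;
    "double-byte" is the 94x94 set such as KS C 5601 or GB 2312.
    """
    labels = []
    for ch in text:
        raw = ch.encode(codec)
        if len(raw) == 1 and raw[0] < 0x80:
            labels.append((ch, "single-byte"))
        elif len(raw) == 2 and all(0xA1 <= b <= 0xFE for b in raw):
            labels.append((ch, "double-byte"))
        else:
            labels.append((ch, "other"))
    return labels


if __name__ == "__main__":
    # "A" plus HANGUL SYLLABLE HAN (U+D55C), encoded as EUC-KR
    print(classify("A\ud55c", "euc_kr"))
    # "A" plus the CJK ideograph U+4E2D, encoded as EUC-CN
    print(classify("A\u4e2d", "gb2312"))
```

  A string mixing Latin letters and Hangul comes back with both
labels, which is exactly why naming only KS C 5601-1992 next to
EUC-KR is incomplete.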

  Considering how many people confuse an encoding with a character
set/character repertoire (in RFC 2130 terms, a CES with a CCS), I think
it's important to make this point as clear as possible to avoid
perpetuating this unfortunate mix-up.

   Regards,

    Jungshik Shin


