RE: japanese xml

From: Jungshik Shin (jshin@mailaps.org)
Date: Thu Aug 30 2001 - 14:13:37 EDT


On Wed, 29 Aug 2001, Marco Cimarosti wrote:

> "euc-jp" means the Japanese character set (JIS) serialized in EUC ("Extended
> Unix Code").

  I'm afraid this is slightly misleading, because EUC-JP encodes NOT
a *single* coded character set BUT *three* coded character sets:
US-ASCII/JIS X 0201, JIS X 0208 and JIS X 0212. Moreover, referring to
Japanese character sets simply as "JIS", however common the practice
might be, is not strictly right. As you know too well, JIS just stands
for Japanese Industrial Standards, which include numerous standards
other than coded character sets.
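
  To make the point concrete, here is a minimal sketch (in Python) that
labels each character of an EUC-JP byte string with the coded character
set it comes from, judging only by the lead byte. The function name
split_euc_jp and the label strings are my own, purely for illustration.
Note that the single-shift byte 0x8E selects the katakana half of
JIS X 0201, so I label it separately. Trail bytes are not validated.

```python
def split_euc_jp(data: bytes):
    """Label each character in an EUC-JP byte string with its code set.

    Hypothetical helper for illustration; it looks only at lead bytes
    and does not check that trail bytes are in the 0xA1-0xFE range.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            label, n = "US-ASCII/JIS X 0201 Roman", 1
        elif b == 0x8E:                      # single shift 2
            label, n = "JIS X 0201 Katakana", 2
        elif b == 0x8F:                      # single shift 3
            label, n = "JIS X 0212", 3
        elif 0xA1 <= b <= 0xFE:
            label, n = "JIS X 0208", 2
        else:
            raise ValueError(f"unexpected lead byte 0x{b:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated character at offset {i}")
        out.append((label, data[i:i + n]))
        i += n
    return out


sample = b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1"
for label, raw in split_euc_jp(sample):
    print(f"{raw.hex():<8} {label}")
```

  A single byte stream thus mixes characters from several standards,
which is why naming the encoding after any one of them is inaccurate.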

> EUC is what Unicoders would call a "transformation format", and
> it is very popular with the three main CJK character sets (JIS=Japan,
> GB=China, KCS=Korea).

  Again, EUC-KR encodes NOT one coded character set BUT TWO coded
character sets: US-ASCII/KS X 1003 and KS X 1001 (formerly known
as KS C 5601; in the mid-90's KS began to fill a new series 'X' with
IT-related standards). The same is true of EUC-CN, which encodes
US-ASCII and GB 2312-80. That's why it's WRONG to refer to EUC-KR as
ksc_5601-1987/KSC5601, as some vendors do in their products.
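
  As a rough illustration, the following Python sketch uses the
standard "euc_kr" and "gb2312" codecs (the latter being Python's name
for EUC-CN) to show both halves of each encoding at work: ASCII bytes
pass through unchanged, while the same two high bytes map into KS X 1001
under one encoding and into GB 2312 under the other. The variable names
are mine.

```python
# The byte pair 0xB0 0xA1 is row 16, cell 1 of the 94x94 set in both encodings.
pair = b"\xb0\xa1"

# Python's "gb2312" codec implements EUC-CN.
decoded = {name: pair.decode(name) for name in ("euc_kr", "gb2312")}

# The single-byte (ASCII) half is shared and is left untouched by both.
ascii_unchanged = all("A".encode(name) == b"A" for name in decoded)

# Mixed text uses both coded character sets in one byte stream.
mixed = "A\uac00".encode("euc_kr")   # "A" followed by HANGUL SYLLABLE GA

for name, ch in decoded.items():
    print(f"{name}: U+{ord(ch):04X}")
print("ASCII unchanged:", ascii_unchanged)
print("mixed EUC-KR bytes:", mixed.hex())
```

  The label on the byte stream therefore has to name the encoding, not
just one of the character sets it carries.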

  Jungshik Shin


