Re: Properties of multibyte encodings

From: Jungshik Shin
Date: Mon Dec 01 1997 - 13:20:50 EST

On Tue, 25 Nov 1997, Nitsan Seniak wrote:

> I'm currently working on the internationalization of a product
> for asian countries, especially Japan. For implementation reasons,
> I'm considering only supporting multibyte encodings with the
> following properties:
> 1. They are a superset of ASCII, which means that a character starting
> with a byte in the range [0x00, 0x7F] is a one-byte ASCII
> character;
> 2. They don't use shift states (ie, a multibyte character can always
> be interpreted independently of the ones which precede it.)
> Does anybody know if these restrictions are reasonable? I know that EUC
> and SJIS are OK, and that JIS isn't; will not supporting JIS cut a big
> part of the market? Thanks for any advice.

  Just for your information, EUC-KR is "stateless" and perfectly
satisfies both of your criteria. It has a one-byte range (94 characters:
US-ASCII/KS C 5636/ISO 646) and a two-byte range (94x94 characters, with
the MSB of each byte set to 1: KS C 5601-1987). In ISO 2022 terms, C0/G0
is US-ASCII and G1 is KS C 5601-1987, with C0/G0 invoked on CL/GL and G1
invoked on GR. Even though MS, IBM, and Mac use different names for
EUC-KR, it's used and supported by all major platforms (MS-DOS,
MS-Windows, MacOS, OS/2, and Unix). KS C 5636 differs from US-ASCII/ISO
646 at one code point (backslash is replaced with the Korean currency
symbol, Won), but as was mentioned about the Japanese Yen sign, that is
better dealt with at the glyph rendering level. Besides, it doesn't
pose much of a problem because KS C 5601-1987 defines the Korean currency
sign, too. There is a stateful encoding (ISO-2022-KR) for
US-ASCII/KS C 5636/ISO 646 and KS C 5601-1987, but it's only used for
Internet mail exchange (RFC 1557) and you won't lose much by not
supporting it. You may wish to take a peek at
<url:> and links therein for
Korean (and CJ) issues.
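
  To make the two criteria concrete, here is a minimal sketch in Python
of how EUC-KR text can be split into characters by looking only at each
lead byte. The function name split_euc_kr is hypothetical, the sketch
accepts only the plain 94x94 two-byte range (0xA1..0xFE for both bytes)
and none of the vendor extensions, and it relies on the "euc_kr" and
"iso2022_kr" codecs shipped with Python's standard library. Take it as an
illustration, not as a complete validator.

```python
def split_euc_kr(data: bytes) -> list[bytes]:
    """Split EUC-KR bytes into per-character chunks without keeping any state."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            # Criterion 1: a byte in [0x00, 0x7F] is a complete one-byte character.
            chars.append(data[i:i + 1])
            i += 1
        elif (0xA1 <= lead <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # Two-byte KS C 5601-1987 character: both bytes have the MSB set.
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid EUC-KR byte 0x{lead:02X} at offset {i}")
    return chars


text = "A\ud55c\uae00"  # "A" followed by the two syllables of "Hangul"
euc = text.encode("euc_kr")
chunks = split_euc_kr(euc)

# Criterion 2: each chunk decodes on its own, with no knowledge of earlier bytes.
decoded = [c.decode("euc_kr") for c in chunks]

# ISO-2022-KR carries the same characters in 7-bit bytes, but it needs a
# designator escape sequence and shift bytes, so a byte's meaning depends
# on what came before it.
iso = text.encode("iso2022_kr")

print([len(c) for c in chunks], decoded == list(text))
print(b"\x1b$)C" in iso, max(iso) < 0x80)
```

  The point of the comparison is that the EUC-KR chunks can be handed to
the decoder one at a time, whereas the ISO-2022-KR bytes cannot be
interpreted without tracking the escape sequence and the current shift
state.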

   Jungshik Shin
