character (set) encoding (scheme) (was ..RE: U+xxxx, U-xxxxxx, and the basics)

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Wed Mar 08 2000 - 15:37:42 EST

On Wed, 8 Mar 2000, Mike Brown wrote:

> I wrote:
> > > The mapping of abstract characters from a character
> > > repertoire to integers in a code space is called a
> > > "coded character set". Other names for such
> > > mappings are "character encoding" [...]
>
> Keld wrote:
> > "character encoding" also includes a transformation
> > format or coded character set shifting techniques,
> > such as ISO 2022. So delete that term here.

I agree on this point.

> My intent is to clarify issues relating to the creation of XML documents, so
> I am mainly concerned with what certain terms mean in the realm of "ISO/IEC
> 10646-1993 ... (plus amendments AM 1 through AM 7)" as referenced by the XML
> 1.0 Recommendation.
>
> However, if, outside of the Unicode realm, "character encoding" means
> something other than what I stated here, please post examples of that term
> being used (maybe quote from ISO 2022?) so these can be considered. Both the
> UTR #17 and Unicode 3.0 book make an attempt to mention what alternative and
> conflicting terms for these same concepts exist in the rest of the
> information industry.

ISO 2022 is available online as ECMA-35 at http://www.ecma.ch, so you
can take a look at it yourself. ISO-2022-JP and EUC-JP are two different
character set encoding schemes for the coded character sets JIS X 0201
(and/or ISO 646/US-ASCII), JIS X 0208 and JIS X 0212. Likewise, EUC-KR and
ISO-2022-KR are two different character set encoding schemes for the coded
character sets KS X 1003 (or ISO 646/US-ASCII) and KS X 1001. All four
of these character (set) encoding (schemes) are compliant with ISO 2022. In
addition, IETF RFC 2130 and RFC 2278 are worth looking at. At least in
the MIME context, character set (coded character set) and character (set)
encoding (scheme) should never be mixed up with each other, as was done
by some employees of MS (Korea). Ken Lunde's 'CJKV Information
Processing' is another good reference in this regard.
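
To make the distinction concrete, here is a small sketch in Python (my
choice of language; the codec names "iso2022_jp", "euc_jp", "iso2022_kr"
and "euc_kr" are the ones Python's standard library uses, and the helper
name encode_both is made up for illustration). It encodes the same
characters with two different encoding schemes and shows that the byte
sequences differ while the characters themselves do not.

```python
# Same coded characters, two different character (set) encoding schemes.
# Assumption: the sample strings are arbitrary examples, not from the thread.
text_ja = "日本語"   # characters found in JIS X 0208
text_ko = "한국어"   # characters found in KS X 1001


def encode_both(text, iso2022_codec, euc_codec):
    """Return the bytes produced by an ISO-2022 style scheme and an EUC scheme."""
    return text.encode(iso2022_codec), text.encode(euc_codec)


jp_2022, jp_euc = encode_both(text_ja, "iso2022_jp", "euc_jp")
kr_2022, kr_euc = encode_both(text_ko, "iso2022_kr", "euc_kr")

for label, data in [
    ("ISO-2022-JP", jp_2022),
    ("EUC-JP", jp_euc),
    ("ISO-2022-KR", kr_2022),
    ("EUC-KR", kr_euc),
]:
    # Print each byte sequence in hex so the difference is visible.
    print(f"{label:12s} {data.hex(' ')}")

# Decoding with the matching scheme recovers the same characters,
# which is why the scheme, not just the character set, must be labeled.
assert jp_2022.decode("iso2022_jp") == jp_euc.decode("euc_jp") == text_ja
assert kr_2022.decode("iso2022_kr") == kr_euc.decode("euc_kr") == text_ko
```

The ISO-2022 variants stay within 7 bits and switch character sets with
escape or shift sequences, whereas the EUC variants set the high bit on
each byte of a double-byte character. Decoding bytes with the wrong one
of the pair is exactly the kind of mix-up I am warning about.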

Jungshik Shin
