Re: 8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)

From: Doug Ewell ([email protected])
Date: Fri May 16 2003 - 02:57:46 EDT

Next message: Doug Ewell: "Re: 'code unit' and 'code point' meaning check"

Previous message: Doug Ewell: "Re: Unicode conformant character encodings and us-ascii"
In reply to: Philippe Verdy: "Re: 8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)"
Next in thread: [email protected]: "Re: 8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)"

Philippe Verdy <verdy_p at wanadoo dot fr> wrote:

> Don't forget EBCDIC, and also some Unicode-conforming encodings based
> on basic EBCDIC, where unused code units have been used to encode
> Unicode in a way similar to the UTF-8 encoding (with a simple
> reordering of bytes, so that ASCII characters are left on their
> equivalent EBCDIC positions, as well as the extended EBCDIC controls
> such as NEL which are also assigned in ISO8859-* according to ISO6429
> in range 0x80 to 0x9F)...

I can think of only one such encoding, UTF-EBCDIC:

http://www.unicode.org/reports/tr16/

> Don't forget too VISCII (for Vietnamese) which uses some rarely used
> ASCII controls to map some Vietnamese characters with double accents,
> as the ISO6429 standard does not offer enough free positions in the
> range 0xA0 to 0xFF to map all Vietnamese characters. (Not conforming
> to Unicode, as there's no way to fully encode it with full roundtrip
> capability).

Of course there is. Each of the 256 VISCII code points maps to one and
only one Unicode character. 0x02 in VISCII can only be U+1EB2 LATIN
CAPITAL LETTER A WITH BREVE AND HOOK ABOVE, never U+0002 START OF TEXT.
If you'd like, I can provide a mapping table.
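
To illustrate the round-trip argument, here is a minimal Python sketch. The table is deliberately partial and hypothetical: only the 0x02 entry comes from this message, the two ASCII entries are added for illustration, and the names VISCII_TO_UNICODE, UNICODE_TO_VISCII, decode, and encode are made up for this example. The point is only that a one-to-one table can be inverted, which is what makes the round trip lossless.

```python
# Partial, illustrative table. Only the 0x02 entry is taken from the
# message; the full 256-entry VISCII table is not reproduced here.
VISCII_TO_UNICODE = {
    0x02: "\u1EB2",  # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE
    0x41: "A",       # ASCII letters keep their usual positions
    0x42: "B",
}

# Because each byte maps to exactly one character, the table inverts cleanly.
UNICODE_TO_VISCII = {ch: byte for byte, ch in VISCII_TO_UNICODE.items()}
assert len(UNICODE_TO_VISCII) == len(VISCII_TO_UNICODE)


def decode(data: bytes) -> str:
    """Map each byte to its single Unicode character."""
    return "".join(VISCII_TO_UNICODE[b] for b in data)


def encode(text: str) -> bytes:
    """Map each Unicode character back to its single byte."""
    return bytes(UNICODE_TO_VISCII[ch] for ch in text)


sample = bytes([0x41, 0x02, 0x42])
assert encode(decode(sample)) == sample  # bytes survive the round trip
```

With a complete table the same inversion would cover all 256 code points.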

> Finally don't forget all the DOS/OEM codepages which assign visible
> characters in ASCII control code units and in extended ISO6429
> position... However all these are not conforming to Unicode (no way
> to fully encode it with full roundtrip capability).

Now this is true, because the controls have double meanings (e.g. 0x0D
is both a carriage return and an eighth note).
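
A small Python sketch of the double meaning, under stated assumptions: Python's built-in "cp437" codec is used for the control-code reading, and GLYPH_READING is a hypothetical one-entry table standing in for the displayed-glyph reading mentioned above. It is not a full code page table.

```python
# Control-code reading: Python's built-in cp437 codec decodes 0x0D
# as the carriage return control character.
as_control = bytes([0x0D]).decode("cp437")

# Glyph reading: the same byte shown on screen as an eighth note.
# GLYPH_READING is a hypothetical one-entry table for illustration.
GLYPH_READING = {0x0D: "\u266A"}  # EIGHTH NOTE
as_glyph = GLYPH_READING[0x0D]

# Two different Unicode characters claim the same byte, so a decoder
# has to pick one and the other meaning cannot be recovered.
candidates = {as_control, as_glyph}
print(sorted(f"U+{ord(c):04X}" for c in candidates))
```

Whichever reading a converter picks, text that relied on the other reading will not survive a trip through Unicode and back.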

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Fri May 16 2003 - 03:40:22 EDT