Re: 8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)

From: Doug Ewell
Date: Fri May 16 2003 - 02:57:46 EDT

    Philippe Verdy <verdy_p at wanadoo dot fr> wrote:

    > Don't forget EBCDIC, and also some Unicode-conforming encodings based
    > on basic EBCDIC, where unused code units have been used to encode
    > Unicode in a way similar to the UTF-8 encoding (with a simple
    > reordering of bytes, so that ASCII characters are left on their
    > equivalent EBCDIC positions, as well as the extended EBCDIC controls
    > such as NEL which are also assigned in ISO8859-* according to ISO6429
    > in range 0x80 to 0x9F)...

    I can think of only one such encoding, UTF-EBCDIC.

    > Don't forget too VISCII (for Vietnamese) which uses some rarely used
    > ASCII controls to map some Vietnamese characters with double accents,
    > as the ISO6429 standard does not offer enough free positions in the
    > range 0xA0 to 0xFF to map all Vietnamese characters. (Not conforming
    > to Unicode, as there's no way to fully encode it with full roundtrip
    > capability).

    Of course there is. Each of the 256 VISCII code points maps to one and
    only one Unicode character. 0x02 in VISCII can only be U+1EB2 LATIN
    CAPITAL LETTER A WITH BREVE AND HOOK ABOVE. If you'd like, I can provide
    a mapping table.
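
The round-trip argument can be sketched in a few lines of Python. This is only an illustration, not a full VISCII codec: the table name VISCII_EXCERPT and the helper functions are hypothetical, and the table holds just three entries (0x02 from the example above, plus two ASCII letters that VISCII leaves in place). The point is that a table with one distinct code point per byte can always be inverted.

```python
# Hypothetical three-entry excerpt of a VISCII-to-Unicode table.
# A real table would have 256 entries; this one is for illustration only.
VISCII_EXCERPT = {
    0x02: 0x1EB2,  # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE
    0x41: 0x0041,  # LATIN CAPITAL LETTER A
    0x61: 0x0061,  # LATIN SMALL LETTER A
}


def is_roundtrip_safe(table):
    """Return True when no two bytes share a code point (table is invertible)."""
    return len(set(table.values())) == len(table)


def decode(data, table):
    """Map each byte to its single Unicode character."""
    return "".join(chr(table[b]) for b in data)


def encode(text, table):
    """Invert the table and map each character back to its byte."""
    inverse = {cp: b for b, cp in table.items()}
    return bytes(inverse[ord(ch)] for ch in text)


sample = bytes([0x02, 0x41, 0x61])
text = decode(sample, VISCII_EXCERPT)
restored = encode(text, VISCII_EXCERPT)
```

Here `restored` equals `sample`, which is all that "full roundtrip capability" asks of a one-to-one table.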

    > Finally don't forget all the DOS/OEM codepages which assign visible
    > characters in ASCII control code units and in extended ISO6429
    > position... However all these are not conforming to Unicode (no way
    > to fully encode it with full roundtrip capability).

    Now this is true, because the controls have double meanings (e.g. 0x0D
    is both a carriage return and an eighth note).
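
The double meaning shows up in practice with Python's bundled cp437 codec, used here only as one example of how a converter has to pick a side. That codec takes the control reading of 0x0D, so the graphic reading (U+266A EIGHTH NOTE) has no byte to go back to. The helper name can_encode is hypothetical.

```python
EIGHTH_NOTE = "\u266A"  # the glyph that code page 437 displays at 0x0D

# Python's cp437 codec decodes 0x0D as the control character, not the glyph.
as_control = bytes([0x0D]).decode("cp437")


def can_encode(ch, codec="cp437"):
    """Return True if the codec has a byte for this character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


control_ok = can_encode("\r")         # carriage return maps back to 0x0D
glyph_ok = can_encode(EIGHTH_NOTE)    # the note has no byte in this codec
```

A codec that chose the glyph reading instead would have the opposite problem: the carriage return would be the character left without a byte.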

    -Doug Ewell
     Fullerton, California

    This archive was generated by hypermail 2.1.5 : Fri May 16 2003 - 03:40:22 EDT