Re: 8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu May 15 2003 - 13:48:13 EDT


    From: "Otto Stolz" <Otto.Stolz@uni-konstanz.de>
    > Yael.Aharon@nokia.com wrote:
    >
    > > I actually meant to ask about the various iso (e.g. 8859 variants) and windows
    > > character encodings.
    > > thanks
    > Cf. <http://czyborra.com/charsets/iso8859.html>
    > and <http://czyborra.com/charsets/codepages.html>.

    Also look at the extensive charsets database maintained by IBM; many of its charsets have been contributed to IBM's now open-source ICU project...

    > > Or can anyone give me an example of a conformant character
    > > encoding that does not reserve these bytes to us-ascii?
    > Cf. <http://czyborra.com/charsets/cyrillic.html#KOI>
    > and <http://czyborra.com/charsets/codepages.html#CJK>.

    Don't forget EBCDIC, and also some Unicode-conforming encodings based on basic EBCDIC, in which the unused code units encode Unicode in a way similar to UTF-8 (with a simple reordering of bytes, so that the ASCII characters stay at their equivalent EBCDIC positions, as do the extended EBCDIC controls such as NEL, which ISO 8859-* also assigns, following ISO 6429, in the range 0x80 to 0x9F)...
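
    As a rough illustration of the EBCDIC point, here is a minimal Python sketch that checks whether a codec keeps every ASCII character on the byte of the same value. The function name ascii_positions_preserved is hypothetical, and cp037 (one of the EBCDIC code pages shipped with Python) is only assumed here as a representative EBCDIC variant:

```python
def ascii_positions_preserved(codec: str) -> bool:
    """Return True if each character U+0000..U+007F encodes, in the given
    codec, to exactly one byte whose value equals its ASCII code."""
    for cp in range(0x80):
        try:
            encoded = chr(cp).encode(codec)
        except UnicodeEncodeError:
            # The codec cannot represent this ASCII character at all.
            return False
        if encoded != bytes([cp]):
            return False
    return True


if __name__ == "__main__":
    # latin-1 (ISO 8859-1) is an ASCII superset; cp037 is an EBCDIC code page.
    for name in ("latin-1", "cp037"):
        print(name, ascii_positions_preserved(name))
    # In cp037 the letter 'A' does not sit on the ASCII byte 0x41.
    print(hex("A".encode("cp037")[0]))
```

    The check is deliberately strict: a codec fails as soon as a single ASCII character lands on a different byte, which is the situation asked about in the original question.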

    Also don't forget VISCII (for Vietnamese), which reuses some rarely used ASCII control positions for Vietnamese characters with double accents, as the range 0xA0 to 0xFF left free by ISO 6429 does not offer enough positions to map all Vietnamese characters. (It does not conform to Unicode, as there is no way to fully encode Unicode in it with full roundtrip capability.)

    Finally, don't forget all the DOS/OEM codepages, which assign visible characters to ASCII control code units and to the extended ISO 6429 control positions... However, none of these conforms to Unicode (no way to fully encode it with full roundtrip capability).
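
    The roundtrip remark can be sketched in a few lines of Python. The helper name roundtrips is hypothetical, and cp437 (the original IBM PC OEM codepage, as implemented by Python's codec) is assumed only as one example of a legacy codepage; the sketch shows that such a charset cannot carry arbitrary Unicode text, whereas UTF-8 can:

```python
def roundtrips(text: str, codec: str) -> bool:
    """Return True if text survives encoding to the codec and decoding back."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        # Raised when the codec has no byte sequence for some character.
        return False


if __name__ == "__main__":
    # U+1EC7 is a Vietnamese letter carrying two diacritics.
    for sample in ("\u00e9", "\u1ec7"):
        print(repr(sample), roundtrips(sample, "cp437"), roundtrips(sample, "utf-8"))
```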

    Note that recent versions of the JIS and GB encodings have been updated to allow full roundtrip conversion to and from Unicode. But note that JIS is not strictly compatible with ASCII (look at the position of the backslash, which is replaced by a yen character, the backslash being encoded elsewhere...).


