From: Doug Ewell (email@example.com)
Date: Fri May 16 2003 - 02:57:46 EDT
Philippe Verdy <verdy_p at wanadoo dot fr> wrote:
> Don't forget EBCDIC, and also some Unicode-conforming encodings based
> on basic EBCDIC, where unused code units have been used to encode
> Unicode in a way similar to the UTF-8 encoding (with a simple
> reordering of bytes, so that ASCII characters are left on their
> equivalent EBCDIC positions, as well as the extended EBCDIC controls
> such as NEL which are also assigned in ISO8859-* according to ISO6429
> in range 0x80 to 0x9F)...
I can think of only one such encoding, UTF-EBCDIC:
> Also don't forget VISCII (for Vietnamese), which uses some rarely used
> ASCII controls to map some Vietnamese characters with double accents,
> as the ISO6429 standard does not offer enough free positions in the
> range 0xA0 to 0xFF to map all Vietnamese characters. (Not conforming
> to Unicode, as there's no way to fully encode it with full roundtrip
Of course there is. Each of the 256 VISCII code points maps to one and
only one Unicode character. 0x02 in VISCII can only be U+1EB2 LATIN
CAPITAL LETTER A WITH BREVE AND HOOK ABOVE, never U+0002 START OF TEXT.
If you'd like, I can provide a mapping table.
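
To make the round-trip argument concrete, here is a minimal sketch in Python. The table is a hypothetical two-entry excerpt, not the full 256-entry mapping: only the 0x02 entry comes from the text above, and the 0x41 entry assumes VISCII leaves the ASCII letters in place. The helper names are mine, chosen for illustration.

```python
# Hypothetical excerpt of a VISCII-to-Unicode table (NOT the full mapping).
VISCII_TO_UNICODE = {
    0x02: "\u1eb2",  # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE
    0x41: "A",       # assumption: ASCII letters keep their positions
}

# Because every byte maps to a distinct character, the table can be inverted.
UNICODE_TO_VISCII = {ch: byte for byte, ch in VISCII_TO_UNICODE.items()}


def is_one_to_one(table):
    """True if no two bytes share the same Unicode character."""
    return len(set(table.values())) == len(table)


def decode(data):
    """Map each byte to its single Unicode character."""
    return "".join(VISCII_TO_UNICODE[b] for b in data)


def encode(text):
    """Map each character back to its single byte."""
    return bytes(UNICODE_TO_VISCII[ch] for ch in text)


sample = bytes([0x02, 0x41])
assert is_one_to_one(VISCII_TO_UNICODE)
assert encode(decode(sample)) == sample  # bytes survive the round trip
```

The point is only that a one-to-one table can always be inverted, so nothing is lost going to Unicode and back. Here 0x02 never decodes to U+0002.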
> Finally don't forget all the DOS/OEM codepages which assign visible
> characters in ASCII control code units and in extended ISO6429
> position... However all these are not conforming to Unicode (no way
> to fully encode it with full roundtrip capability).
Now this is true, because the controls have double meanings (e.g. 0x0D
is both a carriage return and an eighth note).
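
As a rough illustration of the double meaning, the sketch below uses Python's built-in cp437 codec as one example of a DOS/OEM code page table. That codec takes the control reading of 0x0D, so the glyph reading (U+266A EIGHTH NOTE) has no byte to map back to. The variable names are mine.

```python
# Python's built-in cp437 codec takes the "control" reading of byte 0x0D.
cr = b"\x0d".decode("cp437")
assert cr == "\r"  # U+000D CARRIAGE RETURN, not the eighth note

# The "glyph" reading, U+266A EIGHTH NOTE, is absent from that table,
# so text containing it cannot be encoded back into the code page.
try:
    "\u266a".encode("cp437")
    note_encodable = True
except UnicodeEncodeError:
    note_encodable = False

assert note_encodable is False
```

A table that chose the glyph reading instead would have the opposite problem: the carriage return would be the one that could not be represented.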