Re: Fwd: Wired 4.09 p. 130: Lost in Translation

From: Mark Leisher (
Date: Wed Aug 28 1996 - 11:04:23 EDT

    David> Interesting 16-bit vs. 32-bit issue for characters. (I guess
    David> nobody seriously considered 24-bit characters?)

    David> Anyway, I have an even more radical idea. Could Unicode support
    David> variable-length characters, so that one or more Unicode values
    David> would mean "shift"? This would allow quite a number of Chinese
    David> (etc.) characters to be represented in the second Unicode
    David> byte-pair.

Literally speaking, the UTF-8 form of Unicode (for the range 0x0000-0xFFFF) is
a variable-length encoding of up to 24 bits (one to three bytes per
character), but it does not exhibit the "shift" property in the sense you
intended.
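
To make the "variable length, but no shift" point concrete, here is a minimal
Python sketch of UTF-8 encoding for the 0x0000-0xFFFF range. The function name
utf8_encode_bmp is made up for illustration, and this is a hand-rolled sketch
rather than anyone's reference implementation. The point to notice is that the
length of each character is decided by its own leading byte, not by a mode set
by some earlier "shift" value in the stream.

```python
def utf8_encode_bmp(cp):
    """Encode one code point in 0x0000-0xFFFF as UTF-8 (illustrative sketch).

    The name and the error handling are assumptions made for this example.
    """
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code point outside 0x0000-0xFFFF")
    if 0xD800 <= cp <= 0xDFFF:
        # These values are reserved for UTF-16 surrogates, not characters.
        raise ValueError("surrogate values are not encodable characters")
    if cp < 0x80:
        # 7 bits of payload: one byte, 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 11 bits of payload: two bytes, 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 16 bits of payload: three bytes (24 bits), 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for cp in (0x41, 0xE9, 0x4E2D):
    encoded = utf8_encode_bmp(cp)
    print("U+%04X -> %d byte(s): %s" % (cp, len(encoded), encoded.hex()))
```

Because every byte announces its own role (lead byte or continuation byte), a
decoder can start in the middle of a stream and resynchronize, which a stateful
"shift" scheme generally cannot do.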

I can say from experience that handling variable length encodings is as much
of a pain as handling multiple character sets. Maintenance and debugging are
annoyingly involved, not to mention other problems like font mapping and
database issues.

Unicode's answer to the space limitation is UTF-16, which uses pairs of
reserved 16-bit values (surrogates) to provide an "escape" into a much larger
code space.
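
The "escape" in UTF-16 works through surrogate pairs: two reserved 16-bit
values that together name one character beyond 0xFFFF. The Python sketch below
shows the arithmetic, assuming the usual ranges (0xD800-0xDBFF for the first
value, 0xDC00-0xDFFF for the second). The function names are hypothetical and
chosen only for this example.

```python
def to_surrogate_pair(cp):
    """Split a code point above 0xFFFF into two 16-bit surrogate values."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = cp - 0x10000             # 20 bits of payload
    high = 0xD800 | (offset >> 10)    # top 10 bits go in the first value
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits go in the second value
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a surrogate pair into the code point it stands for."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print("U+1F600 -> %04X %04X" % (high, low))
```

Unlike a classic shift code, the pair affects only the one character it
encodes; nothing after it is reinterpreted.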

    David> Or am I being way too whimsical?

Reasonable questions, I think. The last 10-15 years have seen numerous
"shift" and "escape" schemes attempting to solve some of the representation
problems. Few have survived. Those that survived will be used until a
clearly superior successor appears on the scene. My personal opinion is that
Unicode is on the right path to becoming that "clearly superior successor."

Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"A designer knows he has achieved perfection not when there is nothing left
to add, but when there is nothing left to take away."
  -- Antoine de Saint-Exupéry
