Re: UTF-9

From: John Cowan (cowan@mercury.ccil.org)
Date: Fri Oct 31 2003 - 19:26:34 CST


Mark Crispin wrote:

> [Read: "crazy old farts who still care about
> obsolete processors and have the temerity to think about implementing
> Unicode on them in native form."]

And their epigones.

> I thought about UTF-18, but I couldn't think of a good way to represent
> Unicode in 18 bits without surrogates. On the other hand, the idea to cover
> 0/1/2/14 (BMP/SMP/SIP/SSP) in a UTF-18 is interesting.

I agree, and think it makes sense.
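Here is a minimal sketch of what covering planes 0/1/2/14 with one 18-bit word per character might look like. An 18-bit word has room for exactly four 64K planes, so planes 0 through 2 fit unchanged. Shifting plane 14 down into the fourth slot (0x30000-0x3FFFF) is an assumption made for illustration and is not part of any agreed proposal. The function names are hypothetical.

```python
# Hypothetical UTF-18 sketch: one 18-bit word per character, covering only
# planes 0, 1, 2 and 14 (BMP, SMP, SIP, SSP). The remapping of plane 14
# into the fourth 64K slot of the 18-bit space is an assumption.

PLANE_14_BASE = 0xE0000  # first code point of plane 14 (SSP)
SLOT_3_BASE = 0x30000    # start of the fourth 64K slot in an 18-bit word


def utf18_encode_char(cp: int) -> int:
    """Map a code point in planes 0-2 or 14 to a single 18-bit word."""
    plane = cp >> 16
    if plane in (0, 1, 2):
        return cp  # planes 0-2 already fit in 18 bits unchanged
    if plane == 14:
        return cp - PLANE_14_BASE + SLOT_3_BASE  # move SSP down into slot 3
    raise ValueError(f"U+{cp:04X} is outside planes 0/1/2/14")


def utf18_decode_word(word: int) -> int:
    """Inverse of utf18_encode_char for a single 18-bit word."""
    if not 0 <= word < 1 << 18:
        raise ValueError("not an 18-bit word")
    if word >= SLOT_3_BASE:
        return word - SLOT_3_BASE + PLANE_14_BASE  # slot 3 is plane 14
    return word
```

For example, U+E0001 (LANGUAGE TAG) encodes as 0x30001 under this mapping. Every code point outside those four planes is rejected in this sketch.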

> It would still need surrogates though. Are the D800-DFFF codepoints
> reserved in all planes or just in the BMP? I wonder if there is some
> way we could do all of ISO 10646 in UTF-18.

Only in the BMP. Planes 2. through 13., and 15. and 16., can be expressed
by surrogates. Planes above 16. have been definitively abandoned by both
ISO 10646 and Unicode, and need not be encoded.
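For reference, this is how surrogates work in UTF-16. For a code point in planes 1 through 16, subtract 0x10000. That leaves a 20-bit value. Add its top 10 bits to 0xD800 and its bottom 10 bits to 0xDC00. Those 20 bits span exactly the 16 planes above the BMP, which is why the code space ends at U+10FFFF. This text does not say how surrogates would be adapted to an 18-bit encoding. The sketch below shows only the standard UTF-16 arithmetic, and its function names are illustrative.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("surrogates only encode planes 1 through 16")
    v = cp - 0x10000              # 20-bit offset into planes 1-16
    high = 0xD800 + (v >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a UTF-16 surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For instance, to_surrogate_pair(0x1F600) returns (0xD83D, 0xDE00). These are the same two 16-bit units that Python's own UTF-16 encoder produces for U+1F600.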

To these proposals I would add UTF-8H and UTF-24 for the 12-bit LINC,
PDP-5, PDP-8, LINC-8, and PDP-12 architectures. UTF-8H is identical to
UTF-8 except that the most significant bit of each octet is inverted,
to match the convention on these architectures of storing ASCII in
octets with the high bit set. UTF-24 represents each Unicode scalar
value in two consecutive 12-bit words, high-order word first; 24 bits
are ample, since no scalar value needs more than 21.
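Here is a minimal Python sketch of both encodings as described above. It treats UTF-8H octets as Python bytes and UTF-24 words as a list of ints in 0..0xFFF. How the octets and words would actually be packed into PDP-8 memory is left out. The function names are hypothetical.

```python
def utf8h_encode(text: str) -> bytes:
    """UTF-8H: ordinary UTF-8 with the high bit of every octet inverted."""
    return bytes(b ^ 0x80 for b in text.encode("utf-8"))


def utf8h_decode(data: bytes) -> str:
    # Inverting the high bit a second time restores plain UTF-8.
    return bytes(b ^ 0x80 for b in data).decode("utf-8")


def utf24_encode(text: str) -> list[int]:
    """UTF-24: each scalar value as two 12-bit words, high-order word first."""
    words = []
    for ch in text:
        sv = ord(ch)
        if 0xD800 <= sv <= 0xDFFF:
            raise ValueError("surrogate code points are not scalar values")
        words.append(sv >> 12)    # high-order 12 bits (at most 0x10F)
        words.append(sv & 0xFFF)  # low-order 12 bits
    return words


def utf24_decode(words: list[int]) -> str:
    """Rebuild text from (high, low) pairs of 12-bit words."""
    if len(words) % 2 or any(not 0 <= w <= 0xFFF for w in words):
        raise ValueError("UTF-24 needs an even number of 12-bit words")
    chars = []
    for hi, lo in zip(words[0::2], words[1::2]):
        sv = (hi << 12) | lo
        if sv > 0x10FFFF or 0xD800 <= sv <= 0xDFFF:
            raise ValueError(f"{sv:#x} is not a Unicode scalar value")
        chars.append(chr(sv))
    return "".join(chars)
```

With this sketch, utf8h_encode("A") gives b"\xc1", which is ASCII "A" with its high bit set, as the PDP-8 convention expects. utf24_encode("\u20ac") gives [0x002, 0x0AC].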

-- 
"But the next day there came no dawn,           John Cowan
and the Grey Company passed on into the         jcowan@reutershealth.com
darkness of the Storm of Mordor and were        http://www.ccil.org/~cowan
lost to mortal sight; but the Dead              http://reutershealth.com
followed them."         --"The Passing of the Grey Company"