Well, I would like to see the same limitation in both Unicode and ISO 10646.
I am trying to approach this from an implementer's view, and I see some
merit in having only 20-bit code points; that is more of a motivation for
me than proposing this exact encoding.
Maybe it is just my gut feeling for "elegance" in encodings running wild:
- Bit field storage would not need a rarely used 21st bit.
- An extension of the HTML and Java escape codes like "\u20ac" could be
  defined to use 5 hexadecimal digits instead of 6, and that would be
  sufficient: the 6th hexadecimal digit can only be 0 or 1, and a 1 is
  rarely used in this position. (I am not a big friend of the assumption
  made in the existing escape codes, namely that the target encoding is
  UTF-16, where a surrogate pair forms a valid non-BMP character. Having to
  code non-BMP characters as pairs of escape codes is counter-intuitive and
  forces humans to learn the surrogate pair mechanism.)
- An encoding like the proposed "UTF-20" would become possible.
- "Aesthetics" - I am looking more at the scalar values than at the default
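To make the escape-code and bit-field points concrete, here is a small Python sketch. It is only an illustration, since "UTF-20" was a proposal and never a standard. It contrasts the existing Java-style escape, where a non-BMP character becomes a UTF-16 surrogate pair, with a hypothetical 5-digit escape, and it packs scalar values into consecutive 20-bit fields. The function names, the `\u{XXXXX}` notation and the big-endian, end-padded packing layout are all assumptions made for this example.

```python
# Illustrative sketch only: "UTF-20" was a proposal, not a standard.
# All function names and the \u{XXXXX} escape notation are hypothetical.

MAX_UTF20 = 0xFFFFF  # highest scalar value in planes 0-15 (fits in 20 bits)

def surrogate_escape(cp: int) -> str:
    """Java-style escape: a non-BMP character becomes a UTF-16 surrogate pair."""
    if cp < 0x10000:
        return f"\\u{cp:04x}"
    v = cp - 0x10000                # 20-bit offset above the BMP
    high = 0xD800 + (v >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)      # bottom 10 bits -> low surrogate
    return f"\\u{high:04x}\\u{low:04x}"

def five_digit_escape(cp: int) -> str:
    """Hypothetical escape: one scalar value, always exactly 5 hex digits."""
    if not 0 <= cp <= MAX_UTF20:
        raise ValueError(f"U+{cp:X} is outside planes 0-15")
    return f"\\u{{{cp:05x}}}"

def pack_utf20(cps: list[int]) -> bytes:
    """Pack scalar values into consecutive big-endian 20-bit fields,
    zero-padding the last byte when the bit count is not a multiple of 8."""
    acc = 0
    for cp in cps:
        if not 0 <= cp <= MAX_UTF20:
            raise ValueError(f"U+{cp:X} is outside planes 0-15")
        acc = (acc << 20) | cp
    nbits = 20 * len(cps)
    nbytes = (nbits + 7) // 8
    acc <<= nbytes * 8 - nbits      # move the padding bits to the end
    return acc.to_bytes(nbytes, "big")

def unpack_utf20(data: bytes, count: int) -> list[int]:
    """Inverse of pack_utf20; the caller supplies the number of values."""
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - 20 * count)
    return [(acc >> (20 * (count - 1 - i))) & MAX_UTF20 for i in range(count)]

gclef = 0x1D11E  # MUSICAL SYMBOL G CLEF, a plane 1 character
print(surrogate_escape(gclef))            # \ud834\udd1e
print(five_digit_escape(gclef))           # \u{1d11e}
print(pack_utf20([0x20AC, gclef]).hex())  # 020ac1d11e
```

In this sketch, padding is applied once at the end of the whole sequence rather than per value, so the count of values has to be known separately; a real format would need to record it somewhere.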
The only reason for the use of 17 planes instead of 16 seems to be "because
it is what UTF-16 can reach" (I like the UTF-16 encoding very much!).
Only 6 of them are assigned to anything; the other 11 planes, with 704k code
points, are reserved. My proposal would replace one reserved plane with a
private use plane and re-reserve private use plane 16, or simply
discourage its use; of course, it would still be available through other
encodings that offer the higher private use planes and groups.
My impression is that the joint Unicode/ISO-10646 effort has already gone
through a phase of compromise between the wish for a linear 16-bit encoding
for everything and the realization that there are more than 64k characters
to encode. It arrived at a code point range that actually spans about 20.1
bits and therefore needs 21 bits.
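The plane arithmetic behind these figures can be checked with a few lines of Python. This sketch uses only the numbers stated in the message (planes of 65,536 code points, 17 planes in total, 6 of them assigned) and reads "704k" as 704 × 1024; the helper name `code_space` is hypothetical.

```python
import math

PLANE_SIZE = 0x10000  # 65,536 code points per plane

def code_space(planes: int) -> tuple[int, float, int]:
    """Return (number of code points, exact bits, bits needed to store the
    largest scalar value) for a code space of `planes` planes."""
    total = planes * PLANE_SIZE
    return total, math.log2(total), (total - 1).bit_length()

# Planes 0-16: what UTF-16 can reach.
total17, exact17, stored17 = code_space(17)
# Planes 0-15: the 20-bit limit argued for in this message.
total16, exact16, stored16 = code_space(16)

# 11 reserved planes (17 minus the 6 assigned), counted in units of 1024.
RESERVED_K = (17 - 6) * PLANE_SIZE // 1024

print(f"17 planes: {total17:#x} code points, {exact17:.1f} bits, stored in {stored17} bits")
print(f"16 planes: {total16:#x} code points, {exact16:.1f} bits, stored in {stored16} bits")
print(f"reserved: {RESERVED_K}k code points")
```

Running it shows 0x110000 code points needing about 20.1 bits (21 bits of storage) for 17 planes, against exactly 20 bits for 16 planes, and 704k code points in the 11 reserved planes.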
I think implementers would be happier with a limitation to 20 bits.
I have high respect for everything(!) that has been done to create this
Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
Michael Everson <firstname.lastname@example.org> on 99-01-25 12:20:23
Subject: Re: proposal: UTF-20
At 08:09 -0800 1999-01-25, email@example.com wrote:
>I propose the UTF-20 encoding for Unicode/ISO-10646 as a compromise
>compactness and the use of scalar integers for all characters in planes 0
>to 15. Planes 16 and above are not accessible.
Then it doesn't have much to do with ISO/IEC 10646, does it.
-- Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Phone: +353 1 478-2597 ** Fax: +353 1 478-2597 (by arrangement) 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:44 EDT